Alerts – Working with Left Menu

What are Alerts

Cavisson products have built-in support for proactive alerting, through which users are notified whenever a KPI (Key Performance Indicator metric, such as CPU utilization, requests per second, or average response time) breaches the threshold configured in an alert rule. This gives users early notice of performance degradation, even before an actual issue occurs.

Types of Alerts

There are the following types of alerts:

Capacity Alerts

In this type of alert, alerts are generated based on a fixed threshold value. The user defines a rule and a condition with a threshold value. If the condition is met, an alert is generated.

Behavior Alerts

In the case of Behavior alerts, alerts are generated based on the % deviation from auto-generated baseline trends. The purpose of this dynamic threshold is to:

  • Provide adaptive threshold functionality for alerts.
  • Generate alerts based on a baseline identified for the current load at a specific hour of the day.
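
The difference between a capacity check and a behavior check can be sketched in Python. This is a minimal illustration, not Cavisson's alert engine: the function names, the strict `>` comparison for capacity alerts, and treating deviation in either direction as a breach are assumptions. The `min_baseline` parameter mirrors the 'Minimum Baseline value criteria for Behavior Alerts generation' setting described under Alert Settings.

```python
def capacity_alert(value: float, threshold: float) -> bool:
    """Capacity alert (hypothetical): fire when the KPI crosses a fixed threshold."""
    return value > threshold


def behavior_alert(value: float, baseline: float, deviation_pct: float,
                   min_baseline: float = 0.0) -> bool:
    """Behavior alert (hypothetical): fire when the KPI deviates from the
    baseline for this hour of the day by at least deviation_pct percent."""
    # Ignore baselines at or below the minimum; this also avoids dividing by zero.
    if baseline <= min_baseline or baseline <= 0:
        return False
    deviation = abs(value - baseline) / baseline * 100.0
    return deviation >= deviation_pct


# CPU utilization of 85% against a fixed 80% threshold.
print(capacity_alert(85.0, 80.0))            # True
# 150 ms against a 100 ms baseline is a 50% deviation.
print(behavior_alert(150.0, 100.0, 40.0))    # True
print(behavior_alert(120.0, 100.0, 40.0))    # False
```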

Load Index based Alerts

This is a highly advanced type of baseline alert, unique to the Cavisson NetDiagnostics alert framework, that works on the load on the system instead of a time-based trend. In this baseline, the alert engine learns the system behavior at all loads and then uses this learning to compare the current data at the current load with the baseline value for that load. Close monitoring of application behavior shows that the response time of an application is proportional to the load (PVS, page views per second) on the system. For example, if the response time is 100 ms at a load of 10 PVS, it may rise to about 200 ms when the load becomes 20 PVS, and so on. Therefore, if the response time reaches 500 ms at a load of 20 PVS, there is likely an issue and an alert should be generated.
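
The idea of a load-indexed baseline can be sketched as a table of average response times keyed by load rather than by hour. This is a hypothetical illustration only: the fixed-width PVS buckets, the `LoadIndexBaseline` class, and the 50% deviation default are assumptions, not the NetDiagnostics implementation.

```python
from collections import defaultdict
from typing import Optional


class LoadIndexBaseline:
    """Hypothetical load-indexed baseline: learns the average response time
    for each load bucket (in PVS) instead of for each hour of the day."""

    def __init__(self, bucket_size: float = 5.0):
        self.bucket_size = bucket_size        # assumed bucket width in PVS
        self._total = defaultdict(float)      # sum of response times per bucket
        self._count = defaultdict(int)        # number of samples per bucket

    def _bucket(self, pvs: float) -> int:
        return int(pvs // self.bucket_size)

    def learn(self, pvs: float, response_ms: float) -> None:
        bucket = self._bucket(pvs)
        self._total[bucket] += response_ms
        self._count[bucket] += 1

    def baseline(self, pvs: float) -> Optional[float]:
        bucket = self._bucket(pvs)
        count = self._count.get(bucket, 0)
        return self._total[bucket] / count if count else None

    def is_anomalous(self, pvs: float, response_ms: float,
                     deviation_pct: float = 50.0) -> bool:
        """True if the response time exceeds the baseline at this load by deviation_pct%."""
        base = self.baseline(pvs)
        if base is None or base <= 0:    # nothing usable learned at this load yet
            return False
        return (response_ms - base) / base * 100.0 >= deviation_pct


model = LoadIndexBaseline()
model.learn(10, 100.0)   # 10 PVS -> about 100 ms
model.learn(20, 200.0)   # 20 PVS -> about 200 ms
print(model.is_anomalous(20, 210.0))   # False: normal for 20 PVS
print(model.is_anomalous(20, 500.0))   # True: 500 ms at 20 PVS
print(model.is_anomalous(10, 200.0))   # True: 200 ms is too slow at 10 PVS
```

The last call shows why load matters: 200 ms is normal at 20 PVS but abnormal at 10 PVS, which a purely time-based baseline could miss.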

Alert Properties

The following alert properties are available:

Alert Actions

These are the actions associated with an alert, which are executed on a certain trigger. Alert actions are divided into three main categories:

  • Notification actions, such as sending e-mails, SMS messages, or SNMP traps. In addition, multiple extensions are available to send notifications to third-party programs such as ServiceNow, Cisco Spark, Slack, and PagerDuty.
  • Diagnostic actions, such as taking a TCP dump, thread dump, or heap dump.
  • Remediation actions, such as running custom scripts to restart a server/instance that is consuming high CPU/memory.

Alert Settings

Alert settings are used to configure the various parameters required for alerts.

Alert Policy

Alert policies enable a user to take different actions when an alert condition occurs, and they work as the link between alert rules and alert actions. The user can configure one or more policies with different actions. Policies are associated with selected or all configured alert rules.

Alert Maintenance Window

Alert maintenance window configuration is required to disable alert generation during a maintenance period or a scheduled downtime.

Alert Rules

An alert rule is the key element in which the user defines the metric and the associated threshold for generating an alert of a specific severity.

Alert History

Alert history provides insights into previously generated alerts. It is also used to understand how the severity of specific alerts has changed over a period of time. Each of these properties is described in detail in the subsequent sections.

Alert Actions

These are the actions associated with an alert, which are executed on a certain trigger. Alert actions are divided into three main categories:

  • Notification actions, such as sending e-mails, SMS messages, or SNMP traps. In addition, multiple extensions are available to send notifications to third-party programs such as ServiceNow, Cisco Spark, Slack, and PagerDuty.
  • Diagnostic actions, such as taking a TCP dump, thread dump, or heap dump.
  • Remediation actions, such as running custom scripts to restart a server/instance that is consuming high CPU/memory.

Create an Alert Action

To create an alert action, follow these steps:

  1. Go to Alert Actions, which is displayed in the Alert window.

  2. Click the Add button to add an alert action. The Add Action window is displayed.

  3. Specify the name of the action. The window contains several sections, such as Notification, Diagnostics, and Remediation.

Notification

  4. Under the Notification section, select the notification type (email, SMS, or SNMP) and specify the details.

Email

  • Turn on the Send an Email toggle key.
  • In the Email Receiver box, type the valid email ID of the receiver.
  • Specify the mail subject by clicking the Add icon. Clicking the Add icon displays a list of variables, from which the user can select any number.

  • Similarly, specify the Pre Text and Post Text. Pre Text is the text that is displayed before the email body, and Post Text is the text that is displayed after the email body.

  • Click the Advanced E-Mail Settings icon for more settings.
    • Turn on the Show charts in E-Mail toggle key. This is turned off by default.
    • Specify the maximum number of charts, the chart type (line, bar, or area), and the duration of each graph (in minutes).
    • Specify the criteria for selecting chart metrics. Only one of the following options can be chosen:
      • Use metrics on which alerts are generated: Turn on this toggle key to use the metrics on which alerts are generated. This is turned on by default.
      • Identify and show more relevant metrics where similar pattern is matched: Turn on this toggle key to apply pattern matching. The user needs to specify a catalogue, which contains the graphs on which pattern matching is applied. Pattern matching requires a baseline graph against which all the graphs in the catalogue are compared. The baseline graph is chosen from the graphs for which the alert is generated: it is the one that has the minimum or maximum value across all the indices of the condition for which the alert is generated. Once the baseline is selected, each sample of the baseline graph is compared with the corresponding sample of each catalogue graph, and the graphs that satisfy the threshold value given in the action are kept. All the resulting graphs are then sent in the mail body along with the alert information. Below is a sample of such an email:

    • Place all the metrics in separate charts: Turn on this toggle key to place all the metrics in separate charts. This is turned off by default, in which case all the metrics are merged together and displayed in a single chart.
    • Click Save to save the advanced email settings.
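
The pattern-matching selection can be sketched as follows. The guide does not state which similarity measure is used, so this sketch swaps in Pearson correlation as an assumed score and keeps catalogue graphs whose score meets the threshold; the function names and sample data are hypothetical. The `max_charts` default reflects the documented limit of 150 charts per email.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation over the common length of two sample series."""
    n = min(len(xs), len(ys))
    if n < 2:
        return 0.0
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0            # a flat series has no pattern to match
    return cov / (sx * sy)


def pick_baseline(alert_graphs: Dict[str, List[float]], use_max: bool = True) -> List[float]:
    """Take the alerting index whose graph holds the maximum (or minimum) value."""
    if use_max:
        name = max(alert_graphs, key=lambda k: max(alert_graphs[k]))
    else:
        name = min(alert_graphs, key=lambda k: min(alert_graphs[k]))
    return alert_graphs[name]


def matching_graphs(baseline: List[float], catalogue: Dict[str, List[float]],
                    threshold: float = 0.8, max_charts: int = 150) -> List[str]:
    """Catalogue graphs whose similarity to the baseline meets the threshold,
    best match first, capped at max_charts."""
    scored = [(name, pearson(baseline, samples)) for name, samples in catalogue.items()]
    hits = sorted((item for item in scored if item[1] >= threshold),
                  key=lambda item: item[1], reverse=True)
    return [name for name, _ in hits[:max_charts]]


alert_graphs = {"Tier1>Server1": [100, 120, 180, 260], "Tier1>Server2": [90, 95, 100, 110]}
catalogue = {
    "GC Pause Time": [10, 12, 19, 27],
    "Heap Used": [50, 49, 51, 50],
    "DB Wait Time": [40, 30, 20, 10],
}
baseline = pick_baseline(alert_graphs)
print(matching_graphs(baseline, catalogue))   # ['GC Pause Time']
```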

Now, the user can view charts for the metrics on which the alert has been generated, as well as the metrics that match the same pattern as the alert metric, which can give clues about the root cause. A maximum of 150 charts can be included in an alert email, and the graph duration can be up to 1440 minutes. If the user enters values beyond these limits, an alert message is displayed on saving and the values are not saved.

Whenever an alert is generated, a tabular report is sent in the alert email as a link and as an attachment. This reduces the manual processing time needed to analyze an alert. The report can also be generated by using a template.

Note: If a graph has NaN samples and the alert is configured for only one value, then when only one value is obtained, the graph is displayed in the alert chart instead of a blank graph.

SMS

  • Turn on the Send an SMS message toggle key.
  • In the SMS Receiver box, type the phone number to which the SMS message is to be sent.

SNMP

  • Turn on the Send SNMP Traps toggle key. The user can also enable or disable consolidated notification for a rule.
  • Specify the SNMP server and SNMP port.
  • Specify the SNMP version. There are three versions:
    • v1: On selecting this, specify the community to which the alert is to be sent.
    • v2c: On selecting this, specify the community to which the alert is to be sent.
    • v3: On selecting this, the SNMP v3 security level section is displayed. This section contains three options:
      • NO_AUTH_NO_PRIV: Neither an authentication protocol nor a privacy protocol is required. On selecting this, specify the username.
      • AUTH_NO_PRIV: An authentication protocol is required, but a privacy protocol is not. On selecting this, specify the username, select the authentication protocol, and provide the authentication password.
      • AUTH_PRIV: Both an authentication protocol and a privacy protocol are required. On selecting this, specify the username, select the authentication protocol, and provide the authentication password. Also, select the privacy protocol and provide the privacy password.
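
The fields each SNMP version and v3 security level require can be captured as a small validation table. This is a hypothetical model of the form, not a Cavisson API: the `SnmpTrapSettings` class and field names are assumptions, and 162 is used as the default only because it is the conventional SNMP trap port.

```python
from dataclasses import dataclass
from typing import List, Optional

# Fields each SNMP v3 security level needs, as listed in this section.
V3_REQUIRED = {
    "NO_AUTH_NO_PRIV": ("username",),
    "AUTH_NO_PRIV": ("username", "auth_protocol", "auth_password"),
    "AUTH_PRIV": ("username", "auth_protocol", "auth_password",
                  "priv_protocol", "priv_password"),
}


@dataclass
class SnmpTrapSettings:
    """Hypothetical model of the SNMP trap form."""
    server: str
    port: int = 162                       # conventional SNMP trap port
    version: str = "v2c"                  # "v1", "v2c" or "v3"
    community: Optional[str] = None       # v1 / v2c
    security_level: Optional[str] = None  # v3 only
    username: Optional[str] = None
    auth_protocol: Optional[str] = None
    auth_password: Optional[str] = None
    priv_protocol: Optional[str] = None
    priv_password: Optional[str] = None


def missing_fields(s: SnmpTrapSettings) -> List[str]:
    """Return the names of required fields that are still empty."""
    if s.version in ("v1", "v2c"):
        required = ("community",)
    elif s.version == "v3":
        if s.security_level not in V3_REQUIRED:
            return ["security_level"]
        required = V3_REQUIRED[s.security_level]
    else:
        return ["version"]
    return [name for name in required if not getattr(s, name)]


settings = SnmpTrapSettings("10.0.0.5", version="v3", security_level="AUTH_PRIV",
                            username="alerts", auth_protocol="SHA", auth_password="secret")
print(missing_fields(settings))   # ['priv_protocol', 'priv_password']
```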

Send Alert to Extension

Select the extension to which the alert is to be sent. Extensions are groups of packages and classes bundled in a single JAR and made available to the user when needed. Examples of extensions are ServiceNow, PagerDuty, Slack, and Cisco Spark.

Forward Alerts to a Central Dashboard

This feature forwards alerts from one Cavisson product to another, so that all the alerts are visible on one central dashboard. This is useful when the user wants to see the alerts of multiple products in a single dashboard. For example, suppose an environment has multiple Cavisson products configured, such as NetVision (NV) and NetDiagnostics (NDE), and all NV alerts must be sent automatically to NDE so that the alerts generated by both products are visible in one place. This feature enables:

  • Alert integration added for sending alerts to other Cavisson product(s).
  • Sending alerts from multiple products to a single Dashboard (many to one).

Below is an example of Slack integration.

Diagnostics

This section contains sub-sections for configuring thread dump, heap dump, CPU profiling, and TCP dump.

Thread Dump

  5. To take a thread dump, turn on the Take a Thread dump toggle key and specify the number of thread dumps along with the interval. Thread dumps are applicable to all applications except Node.js.

Heap Dump

  6. To take a heap dump, turn on the Take a Heap dump toggle key and specify the number of heap dumps along with the interval. Heap dumps are applicable to all types of applications.

CPU Profiling

  7. CPU profiling is used to identify how the virtual machine (VM) spends its time when executing methods. To enable CPU profiling, turn on the CPU Profiling toggle key and specify the duration.

TCP Dump

  8. To take a TCP dump, turn on the Take a TCP dump toggle key and specify the Interface, Max Duration, Size, Number of Packets, and Port. Specify additional attributes, if required.
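
The guide does not show how these fields reach the capture tool. As an illustration only, the sketch below assumes they map onto a standard `tcpdump` invocation wrapped in GNU `timeout` for the maximum duration: `-i` (interface), `-c` (packet count), `-w` (output file), and a `port` filter expression. The Size field is left out because its exact meaning (snapshot length or file size) is not stated, and the hypothetical `extra_args` parameter stands in for the additional attributes. The function only builds the argument list; it does not run anything.

```python
import shlex
from typing import List, Optional, Sequence


def build_tcpdump_cmd(interface: str, max_duration_s: int, packets: int,
                      port: Optional[int] = None,
                      extra_args: Sequence[str] = (),
                      out_file: str = "alert_capture.pcap") -> List[str]:
    """Hypothetical mapping of the TCP dump fields to an argv list."""
    cmd = ["timeout", str(max_duration_s),  # stop the capture after N seconds
           "tcpdump",
           "-i", interface,                 # interface to capture on
           "-c", str(packets),              # exit after this many packets
           "-w", out_file]                  # write raw packets to a pcap file
    cmd += list(extra_args)                 # additional attributes, if any
    if port is not None:
        cmd += ["port", str(port)]          # BPF filter expression goes last
    return cmd


print(shlex.join(build_tcpdump_cmd("eth0", 60, 1000, port=8080)))
# timeout 60 tcpdump -i eth0 -c 1000 -w alert_capture.pcap port 8080
```

On a host where `tcpdump` and `timeout` are installed and the user has capture privileges, the list could be passed to `subprocess.run`.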

Remediation

  9. To run a script or an .exe file on problematic nodes, turn on the Run a script or executable on problematic Nodes toggle key and type the script name. To run it on the server, turn on the Run on server toggle key.

  10. After the configuration, click Save.

Advanced Configuration

This feature is used for sending the location in Alert Mail, SNMP Trap, and Cisco Spark. A regex is used to extract locations from alert vectors. The user provides a pattern to fetch the Indices Identifier ($INDICES_ID), enters a sample string, and clicks the Test button to see the result of applying the pattern to that string. Specify the indices identifier in the mail subject along with the pre text and post text. The email is generated in the format shown below. When a user applies pattern matching, the generated Alert Mail contains the following information:

  • Pattern Matching Threshold
  • Catalogue Used
  • Maximum number of Graphs
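
Extracting an indices identifier with a regex and substituting it into the mail subject can be sketched with Python's `re` module. The vector format `<location>-Tier>Server>Instance` and the pattern below are hypothetical examples chosen for illustration; the actual vector names and patterns depend on the user's environment.

```python
import re
from typing import Optional

# Hypothetical vector format "<location>-Tier>Server>Instance"; the pattern
# captures the location prefix of the tier name as the indices identifier.
INDICES_ID_PATTERN = re.compile(r"^(?P<location>[A-Za-z]+)-Tier>")


def indices_id(vector: str) -> Optional[str]:
    """Apply the pattern to a vector string, as the Test button does."""
    match = INDICES_ID_PATTERN.search(vector)
    return match.group("location") if match else None


def render_subject(template: str, vector: str) -> str:
    """Substitute $INDICES_ID in a mail subject template."""
    return template.replace("$INDICES_ID", indices_id(vector) or "Unknown")


vector = "Mumbai-Tier>Server-01>Instance-A"
print(indices_id(vector))                                   # Mumbai
print(render_subject("High CPU at $INDICES_ID", vector))    # High CPU at Mumbai
```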

Note: In a tier-wise alert rule, for relational operators such as >=, <=, >, <, =, or !=, the 'Baseline Value' column is not displayed in the Alert Mail, and the 'Severity' column is not displayed for tier-wise alerts.

The Product Dashboard URL is sent with a preselected time frame in the alert message. Therefore, when a user opens a URL sent via alert email, SNMP trap, or alert extension, only the graphs for which the alert is generated are displayed. Earlier, this URL opened the default favorite, and the user had to plot the graphs where the alert was generated manually. This feature saves the user's time and displays the relevant graphs automatically.

URL Sent via Alert Email

Graph Where Alert is Generated

Import and Export

The Import and Export features, which were earlier limited to Alert Rule, Alert Maintenance, and Alert Settings, are now also provided for Alert Action. They are used to export alert actions and policies from one machine and import them into another, which reduces the effort of adding the same policies and actions on each machine.

  • Import: Imports data (policies and actions) from the server or the local machine. The changes can be viewed immediately after refreshing the UI.
  • Export: Exports data (policies and actions) to the local machine or to other machines, from where it can be imported into other machines.

Export an Alert Action

A user can export one or more alert actions to a server or to a local machine. To export an alert action, follow these steps:

  1. Select an alert action and click the Export button.

  2. The Export window is displayed, prompting the user to export to local or to server.

Export to Server

  3. To export the alert action to a server, select the Server option and click OK. This prompts the user for the server IP and path.

  • Server IP: Specify the IPv4 address of the machine to which the alert action is to be exported.
  • Path: Specify the path on that machine where the alert action is to be exported.

  4. Click OK. The data is exported to the specified server location and a confirmation message is displayed.

Export to Local

  5. To export the alert action to the local machine, select the Local option and click OK.

  6. The file is exported to the local machine and a confirmation message is displayed.

Import an Alert Action

A user can import one or more alert actions from a server or from a local machine. To import an alert action, follow these steps:

  1. Click the Import button. This prompts the user to import the alert action from a server or from a local machine.

Import from Server

  2. Select the From Server option and click Choose.

  3. Browse to the file on the server and click Select.

  4. The added alert action is displayed in the Import window.

  5. Click OK. A confirmation message is displayed for a successful import.

Import from Local

  6. Select the From Local option and click Choose.

  7. Select the file from the local machine. The added file is displayed in the Import window.

  8. Click Upload. A confirmation message is displayed for a successful file upload.

  9. Click OK. A confirmation message is displayed indicating that the alert action was imported successfully.

Note: Import and Export operations can also be performed in Alert Policy, Alert Rule, Alert Maintenance, and Alert Settings.

Alert Settings

Alert settings are used to configure the various parameters required for alerts.

  1. Click the Alert Settings sub-menu within the Alerts menu. The following window is displayed.

The Alert Settings window consists of the following sections:

Rule Triggering Alert Settings

  • Enable/Disable Alerts: Use the toggle button to enable or disable the capturing of alerts. Once alerts are enabled, the user can further enable capacity alerts. On disabling, the user can remove all active alerts.
  • Enable/Disable Prediction Alerts: Predictive alerting provides predictions of certain events or inputs. It is related to machine learning: the alert engine learns from the data it regularly processes and, based on that learning, makes predictions and generates alerts. Once an alert is generated, the alert engine performs actions based on the policy. Prediction alerts are based on trends. Use the toggle button to enable or disable prediction alerts. On disabling, the user can remove all prediction alerts.
  • Enable Alert Policy: The user can enable or disable alert notifications (e-mail, SMS, SNMP trap, and extension), diagnostics (thread dump, heap dump, TCP dump, CPU profiling, and Java Flight Recorder), and remediation (run script).
  • Enable Maintenance Window: Use the toggle button to enable or disable the alert maintenance window.
  • Enable Alert History Logging: Enables or disables saving alert history logs in the log file.
  • Minimum Baseline value criteria for Behavior Alerts generation: Restricts behavior alert generation. An alert is generated only when the baseline value is greater than this value.
  • Skip number of samples on session restart: Specify the number of samples to skip when a session restarts; no alerts are generated for those samples. At the rule level, samples are skipped for this duration only for newly scaled VMs.
  • Selection of graph time and view by: Specify the minimum graph time (in minutes) to display from the time the alert is generated, and use the 'view by' option to choose how the samples are displayed. This applies when the user uses the 'Show graphs' option within Active Alerts and Alert History.
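
The guide describes prediction alerts as trend-based but does not name the model. Purely as a stand-in for that learned model, the sketch below fits a least-squares straight line to recent samples and estimates how long it will take the trend to reach a threshold; the function name and parameters are hypothetical.

```python
from typing import List, Optional


def seconds_until_breach(samples: List[float], threshold: float,
                         interval_s: float) -> Optional[float]:
    """Fit a least-squares line to the samples and estimate how many seconds
    remain until the trend reaches the threshold. Returns 0.0 if the fitted
    trend is already at or above it, or None if the trend is not rising."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = (n - 1) / 2                       # sample indices are 0..n-1
    mean_y = sum(samples) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    last_t = n - 1
    if intercept + slope * last_t >= threshold:
        return 0.0
    if slope <= 0:
        return None                            # flat or falling: no breach predicted
    crossing_t = (threshold - intercept) / slope
    return (crossing_t - last_t) * interval_s


# CPU utilization sampled every 60 s, rising by 5 points per sample.
print(seconds_until_breach([50, 55, 60, 65, 70], threshold=90, interval_s=60))  # 240.0
```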

Note: When saving Alert Settings, the user can select either 'All' or an individual DC. If an individual DC is selected, the saved settings apply only to that DC, and the UI shows the configuration of the selected DC. If 'All' is selected, all the changes are saved on all the configured DCs; however, the UI shows the configuration of the Master DC.

REST API Triggered Alert Settings

To enable or disable REST API-triggered alerts, use the toggle button. The user can specify the time after which the system clears all REST API-triggered alerts. On disabling the toggle, the user can remove all active REST API-triggered alerts.

Debug Settings

Here, the user can configure the following:

  • Debug log: A debug level is a set of log levels for the debug log categories. Values range from 0 to 4.
  • Profiling log: Values range from 0 to 4.
  • Baseline view format: Either Basic or Extended. In the 'Basic' view, only average/sum values are displayed. In the 'Extended' view, average/sum values are displayed along with the count.

Email/SMS Settings

This section is used to configure the email/SMS settings. To open this, click the Email/SMS Settings button at the top-right corner of the window. Email/SMS settings can also be configured from the top menu.


The Email/SMS Settings window is displayed. Provide the required details for the mail configuration and the SMS carrier, test the configuration, and click the Apply button to apply the settings. The user can import/export alert settings using the Import/Export buttons.

Alert Maintenance

Alert maintenance window configuration is required to disable alert generation during a maintenance period or a scheduled downtime. Here, the user can add a maintenance schedule and view the applied maintenance schedules. In addition, the user can search for or delete a maintenance schedule as required.

Configure Maintenance Schedule

  1. Select the first indices level (i.e., tier). If 'All' is selected, the maintenance schedule applies to all tiers. If a pattern is selected, specify a pattern for the tier(s) to which the maintenance schedule is to be applied.
  2. Select the second indices level (i.e., server). If 'All' is selected, the maintenance schedule applies to all servers. If a pattern is selected, specify a pattern for the server(s) to which the maintenance schedule is to be applied.
  3. In the same manner, select the third indices level (i.e., instance).

The user can view the matching elements at each indices level by clicking the Test button. For example, if the user selects the pattern *Cav at the first indices level and clicks Test, the details matching that pattern are displayed. This functionality can be used at any indices level, either with a pattern or for a specific tier/server/instance.

Note: If a user selects a single tier, the server level displays the details for that tier; if a pattern is applied, the indices or server list is displayed based on the applied pattern. When the user selects a tier, only that tier is selected, even if other tiers start with the same name.

  4. Then, select the schedule type from the following options:

  • Every Day of Every Month: This maintenance schedule is meant for every day of every month. Select the starting and ending hour at which the maintenance schedule is applicable for each day of each month.

  • Day of Every Month: This maintenance schedule is meant for a particular day of every month. First, select the day from the list and then select the starting and ending hour at which the maintenance schedule is applicable for the selected day of each month.

  • Last Day of Every Month: This maintenance schedule is meant for last day of every month. Select the starting and ending hour at which the maintenance schedule is applicable for last day of each month.

  • Weekday of Every Month: This maintenance schedule is meant for a particular weekday of every month. First, select the week, then day, and then starting and ending hour at which the maintenance schedule is to be applied.

  • Day of Every Year: This maintenance schedule is meant for a particular day of a particular month every year. First, select the month, then the day, and then the starting and ending hour at which the maintenance schedule is to be applied.

  • Weekday of Every Year: This maintenance schedule is meant for a particular weekday of a year. First, select the week, then weekday, then month, and then the starting and ending hour at which the maintenance schedule is to be applied.

  • Custom: This maintenance schedule is meant for a custom duration. Select the starting date and time and the ending date and time at which the maintenance schedule is to be applied.

  5. Provide a description for the maintenance schedule and click Add. The system asks whether to apply the maintenance schedule as soon as the rule is applied. Click Yes.
  6. The alert maintenance schedule is added and displayed in the Applied Maintenance Schedule section.
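
How a maintenance window might suppress an alert can be sketched by combining the indices selection with the schedule type. This is a hypothetical model covering only four of the schedule types; the class and field names are assumptions, `fnmatchcase` stands in for the product's pattern syntax, and the ending hour is treated as exclusive. Note how an exact tier name matches only that tier, consistent with the note above.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class MaintenanceWindow:
    """Hypothetical maintenance schedule; only some schedule types are modelled."""
    tier: str = "All"              # "All", an exact name, or a wildcard pattern
    server: str = "All"
    instance: str = "All"
    schedule: str = "every_day"    # every_day | day_of_month | last_day_of_month | custom
    start_hour: int = 0
    end_hour: int = 24             # assumed exclusive: hours start_hour..end_hour-1
    day: Optional[int] = None      # day of month, for day_of_month
    start: Optional[datetime] = None   # for custom
    end: Optional[datetime] = None     # for custom


def level_matches(selection: str, name: str) -> bool:
    """'All' matches everything; an exact name matches only itself, so
    selecting tier 'Cav' does not pick up 'Cav2'; '*' and '?' are wildcards."""
    return selection == "All" or fnmatchcase(name, selection)


def in_maintenance(w: MaintenanceWindow, tier: str, server: str,
                   instance: str, when: datetime) -> bool:
    if not (level_matches(w.tier, tier) and level_matches(w.server, server)
            and level_matches(w.instance, instance)):
        return False
    if w.schedule == "custom":
        return w.start <= when < w.end
    in_hours = w.start_hour <= when.hour < w.end_hour
    if w.schedule == "every_day":
        return in_hours
    if w.schedule == "day_of_month":
        return when.day == w.day and in_hours
    if w.schedule == "last_day_of_month":
        last_day = calendar.monthrange(when.year, when.month)[1]
        return when.day == last_day and in_hours
    raise ValueError(f"schedule type not modelled here: {w.schedule}")


window = MaintenanceWindow(tier="*Cav", schedule="last_day_of_month",
                           start_hour=1, end_hour=3)
print(in_maintenance(window, "AppCav", "srv1", "inst1", datetime(2024, 2, 29, 2)))   # True
print(in_maintenance(window, "AppCav2", "srv1", "inst1", datetime(2024, 2, 29, 2)))  # False
print(in_maintenance(window, "AppCav", "srv1", "inst1", datetime(2024, 2, 28, 2)))   # False
```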

Other Operations

  • To view maintenance window history, enable the Show Maintenance Window History toggle button.
  • To use the advanced filters, click the corresponding icon.
  • To delete the selected maintenance schedule and make the schedule ineffective, first select a record and click the corresponding icon.
  • To delete the selected maintenance schedule and make the schedule effective, first select a record and click the corresponding icon.
  • To download the report in Word, Excel, and PDF format, select the corresponding icons.

Alert Policy

Alert policies enable a user to take different actions when an alert condition occurs. The user can configure one or more policies with different actions. Policies are attached to selected or all configured alert rules. Go to Alert Policy, which is displayed in the Alert window. It contains the following sections:

  • Policy Name: Name of the alert policy
  • When to Trigger: The policy is triggered when a generated alert satisfies the given criteria. For example, if the criteria specify a severity change from Critical to Major or from Critical to Minor, the policy is triggered only when the alert severity changes in that way.
  • Enabled: Whether the policy is enabled or disabled.
  • Action(s) to Trigger: Actions to be executed based on trigger, such as taking heap dump, taking thread dump, alert notification via email etc.
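
The trigger criteria can be sketched as a set of severity transitions per policy. This is a hypothetical model, not the product's data format: `AlertPolicy`, the action names, and the use of `None` to mean "newly raised alert" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

# (previous severity, new severity); None as previous means a newly raised alert.
Transition = Tuple[Optional[str], str]


@dataclass
class AlertPolicy:
    """Hypothetical policy record mirroring the columns above."""
    name: str
    enabled: bool
    triggers: FrozenSet[Transition]
    actions: List[str] = field(default_factory=list)


def actions_to_run(policies: List[AlertPolicy], previous: Optional[str],
                   current: str) -> List[str]:
    """Collect the actions of every enabled policy whose criteria match."""
    run: List[str] = []
    for policy in policies:
        if policy.enabled and (previous, current) in policy.triggers:
            run.extend(policy.actions)
    return run


downgrade = AlertPolicy(
    name="Critical downgrade",
    enabled=True,
    triggers=frozenset({("Critical", "Major"), ("Critical", "Minor")}),
    actions=["send_email", "take_thread_dump"],
)
print(actions_to_run([downgrade], "Critical", "Major"))  # ['send_email', 'take_thread_dump']
print(actions_to_run([downgrade], None, "Critical"))     # []
```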

Working with Policy

A user can perform the following actions with a policy:

  • Add a policy: To add a policy, click the Add button.
  • Edit a policy: To edit a policy, select the policy and click the Edit button.
  • Delete a policy: To delete a policy, select the policy and click the Delete button.
  • Copy a policy: To copy a policy, select the policy and click the Copy button.
  • Enable a policy: To enable a disabled policy, select the policy and click the Enable button.
  • Disable a policy: To disable an enabled policy, select the policy and click the Disable button.

Add a Policy

  1. Click the Add button on the Alert Policy window. The Add Policy window is displayed.

  2. Type the policy name. To enable the policy, select the Enable Policy checkbox.
  3. If the Applicable Only for Predictive Alert option is enabled, the policy applies only to prediction alerts.
  4. Alert mail and SNMP traps can be sent when an alert is generated using the REST API. To do this, turn on the Applicable Only for Rest API toggle key. If it is turned off, the policy applies to both regular alerts and alerts generated through the REST API.
  5. In the Policy Events section, turn on the Enable/Disable all criteria toggle key.
  6. Select the severity of the alert rule (Critical, Major, or Minor) for both the start and the end of the rule violation.
  7. Specify the alert rules: Behavior, Capacity, or All. To choose particular alert rules, select Specified Alert Rule(s).
  8. Click Action.

Add Actions to Execute

This has already been covered in the Alert Actions section.

Alert Rules

An alert rule defines the condition under which an alert is generated. To view the Alert Rules window, go to the Alert menu and click the Rules menu item. The Alert Rules window is displayed with the following columns:

  • Status: Represents whether the Rule is enabled/disabled.
  • Rule Type: Represents whether the rule is configured for tier level or individual indices level.
  • Rule Name: Name of the rule.
  • Condition Expression: Displays the summarized view of the conditions. Upon mouse hover, the detailed condition is displayed.
  • Alert Message: Message to be displayed when an alert is encountered.
  • Alert Description: Description of the alert to understand the cause and other insights.
  • Recommendation: Preventive measures for improvement and for overcoming the cause of the alert.

The following actions can be applied to rules:

  • Edit: To edit a rule
  • Add: To add a new rule
  • Delete: To delete a rule
  • Update: To update a rule
  • Copy: To copy a rule
  • Close: To close the Alert rule window

Creating an Alert Rule

A user can create a rule at the tier level or at the individual indices level. The configuration differs between the two levels, and both are described below. To create an alert rule, follow these steps:

  1. Click the Add button on the Alert Rule window. The Alert Rule Configuration window is displayed.

  2. Specify the rule name. To enable the rule, select the Enable Rule check box.

  3. Specify the moving window timelines. For advanced moving window settings, click the settings icon. The Moving Window Advanced Settings window is displayed.

  4. In this window, the following options are available for calculating the graph data value:

  • Moving Average: The graph data value for the alert is calculated on each new sample generated.
  • Fixed Window Average: The graph data value for the alert is calculated for the fixed time specified.
  • Moving Advanced: The graph data value for the alert is calculated for the last / any stated percentage of samples individually with the specified rule condition.

The user can also enable checking of logs for a specified duration if the alert remains in the same state, and can initiate an action when the 'Alert rule continues in the same state' flag is enabled in the policy.

  5. To enable alerts for business health rules, select the Business Health Rule check box.
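
The three window options can be contrasted in a short sketch. The window is expressed as a number of samples, and the Moving Advanced rule condition is assumed to be "sample > threshold"; both are simplifications for illustration, and the function names are hypothetical.

```python
from typing import List


def moving_average(samples: List[float], window: int) -> List[float]:
    """Moving Average: recompute over the last `window` samples on each new sample."""
    return [sum(samples[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(samples))]


def fixed_window_average(samples: List[float], window: int) -> List[float]:
    """Fixed Window Average: one value per non-overlapping block of `window`
    samples; an incomplete trailing block is ignored."""
    return [sum(samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]


def moving_advanced(samples: List[float], window: int,
                    threshold: float, pct: float) -> bool:
    """Moving Advanced: True if at least `pct`% of the last `window` samples
    individually exceed `threshold` (a '>' rule condition is assumed)."""
    recent = samples[-window:]
    if not recent:
        return False
    breaches = sum(1 for s in recent if s > threshold)
    return breaches * 100.0 >= pct * len(recent)


data = [10, 20, 30, 40, 50, 60]
print(moving_average(data, 3))           # [20.0, 30.0, 40.0, 50.0]
print(fixed_window_average(data, 3))     # [20.0, 50.0]
print(moving_advanced(data, 3, 45, 60))  # True: 2 of the last 3 samples exceed 45
```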

Applying Rule at Tier Level

Here, the user can apply a rule at the tier level; the alert is generated on the overall results of the tier based on the applied conditions.

  1. First, select the Tier radio button and then select the tier from the drop-down list. User can also apply a pattern for selecting the tiers from the list.
  2. Specify whether all conditions or any one condition must be met for alert generation by selecting 'Every' or 'Any' respectively.
  3. Provide the condition name.
  4. Select for which value (such as average, sample count, sum, minimum, and maximum) the alert condition is to be configured.
  5. Then, select the metric group and metric on which the condition is to be applied.
  6. Specify the condition and its threshold value. The condition can be one of the following:

Comparing with Baseline Data: The absolute value or percentage specified for the alert condition is compared with the selected trend of the baseline data.

  • Increases from Baseline: If the current value has increased by the specified value/percentage compared with the baseline data for that trend, an alert is generated. The user can also specify the minimum increase from the baseline.
  • Decreases from Baseline: If the current value has decreased by the specified value/percentage compared with the baseline data for that trend, an alert is generated. The user can also specify the minimum decrease from the baseline.
  • Changes from Baseline: If the current value has changed (either increased or decreased) by the specified value/percentage compared with the baseline data for that trend, an alert is generated. The user can also specify the minimum change from the baseline.

Comparing with Current Data: The absolute value or percentage specified for the alert condition is compared with the current data.

  • Increases: If the current value has increased by the specified value/percentage, an alert is generated. The user can also specify the minimum increase value/percentage.
  • Decreases: If the current value has decreased by the specified value/percentage, an alert is generated. The user can also specify the minimum decrease value/percentage.
  • Changes: If the current value has changed (either increased or decreased) by the specified value/percentage, an alert is generated. The user can also specify the minimum change value/percentage.
  • Greater than Equals to: If the current value is greater than or equal to the specified value, an alert is generated.
  • Less than Equals to: If the current value is less than or equal to the specified value, an alert is generated.

The user can also tell the alert engine to treat the condition as true when data is not present for common indices by selecting the Consider as true when data is not present for a common indices check box.
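
The baseline comparisons and the fixed-value comparisons can be sketched as two small functions. This is an illustrative reading of the conditions above, not the alert engine's code: "by the specified value" is interpreted as "by at least", the optional minimum-increase/decrease/change settings are not modelled, and the Increases/Decreases/Changes conditions on current data are omitted because the guide does not say which reference value they use. `true_when_missing` corresponds to the Consider as true when data is not present for a common indices check box.

```python
from typing import Optional


def baseline_condition(op: str, current: Optional[float], baseline: Optional[float],
                       amount: float, percent: bool = False,
                       true_when_missing: bool = False) -> bool:
    """Increases/Decreases/Changes from Baseline. `amount` is an absolute
    value, or a percentage of the baseline when percent=True."""
    if current is None or baseline is None:
        return true_when_missing    # 'Consider as true when data is not present...'
    delta = current - baseline
    if percent:
        if baseline == 0:
            return False            # percentage change is undefined
        delta = delta / baseline * 100.0
    if op == "increases":
        return delta >= amount
    if op == "decreases":
        return -delta >= amount
    if op == "changes":
        return abs(delta) >= amount
    raise ValueError(f"unknown operator: {op}")


def threshold_condition(op: str, current: Optional[float], value: float,
                        true_when_missing: bool = False) -> bool:
    """Greater than Equals to / Less than Equals to against a fixed value."""
    if current is None:
        return true_when_missing
    if op == ">=":
        return current >= value
    if op == "<=":
        return current <= value
    raise ValueError(f"unknown operator: {op}")


print(baseline_condition("increases", 150.0, 100.0, 30, percent=True))  # True (+50%)
print(baseline_condition("decreases", 80.0, 100.0, 30, percent=True))   # False (-20%)
print(threshold_condition(">=", 95.0, 90.0))                            # True
print(baseline_condition("changes", None, 100.0, 30, true_when_missing=True))  # True
```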

Adding Multiple Conditions

The user can add multiple conditions and specify whether 'all' or 'any' of them must be met for alert generation.

  • To add a new condition, click the Add icon at the top-right corner. This displays placeholders for the new condition. The user can either enter the configuration details manually or copy the configuration from an existing condition and edit it later.
  • To copy the configuration of a condition, click the Copy icon on the condition whose configuration is to be copied, and then click the Paste icon on the condition where the configuration is to be pasted.
  • To delete a condition, click the Delete icon.

Alert Severity

In this section, user can configure the alert severity based on the indices affected if the specified conditions are satisfied. The severity can be specified either for:

  • % of Indices: If the condition is met, user can define the severity (such as critical, major, and minor) based on the percentage of affected indices. For example, if 12 out of 20 indices are affected, 60% of the indices are affected. If the severity is configured as critical for 80%, major for 60%, and minor for 40%, a major alert is generated because 60% of the indices are affected.
  • Number of Indices: If the condition is met, user can define the severity (such as critical, major, and minor) based on the number of affected indices.
  • Aggregate value of Indices: If the condition is met, user can define the severity (such as critical, major, and minor) based on the aggregate value of indices.
  • Individual Indices: If the condition is met, user can define the severity (such as critical, major, and minor) based on the individual indices.
  1. Specify the Alert message that is to be displayed when the alert is generated.
  2. Specify the Alert description that provides an insight about the generated alert.
  3. Specify the recommendation to be followed to resolve the alert in the production environment.
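The % of Indices example can be checked with a short sketch. It picks the most severe level whose percentage threshold is reached, which reproduces the documented result (12 of 20 indices, 60%, gives a major alert). The function name and the dictionary layout are illustrative assumptions.

```python
def severity_for_percentage(affected, total, thresholds):
    """Pick a severity from the percentage of affected indices.

    thresholds maps a severity name to the minimum percentage that triggers it.
    The most severe level whose threshold is reached wins; None means no alert.
    """
    if total <= 0:
        return None
    percent = affected * 100.0 / total
    for severity in ("critical", "major", "minor"):  # most severe first
        limit = thresholds.get(severity)
        if limit is not None and percent >= limit:
            return severity
    return None


levels = {"critical": 80, "major": 60, "minor": 40}
print(severity_for_percentage(12, 20, levels))  # major (12 of 20 = 60%)
```
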
Applying Rule at Indices Level

Here, user needs to specify the condition at indices level corresponding to a severity. User can add multiple conditions for each severity.

  1. Select the Individual Indices level radio button.
  2. Specify the condition for Critical severity using the same process as in the Tier level alert configuration. In the same way, specify the condition(s) for the other severities, such as Major and Minor. User can apply multiple conditions for any severity and specify whether all or any of the conditions must be met for that severity.
  3. To add a condition, provide the condition name, the value on which the condition is applied (average, sample count, sum, minimum, or maximum), the metrics group, the metrics, and the indices.
  4. For indices selection, multiple options are displayed on clicking the Select Indices link.

The following options are available for indices selection:

  • All: The condition is applied to all indices.

  • Specified: The condition is applied only to the selected indices. Select the indices using the provided button.

  • Advanced: This is an advanced level of indices selection. User can also specify a pattern for the tier, server, and instances.

  5. Specify the Alert message, alert description, and recommendation.
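Advanced selection filters indices by tier, server, and instance patterns. The sketch below uses Python's shell-style wildcards (`fnmatch`) as a stand-in. The product's actual pattern syntax is not described here and may differ. The (tier, server, instance) tuple representation and all names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_indices(indices, tier="*", server="*", instance="*"):
    """Return the indices whose tier, server and instance all match the patterns.

    Each index is a hypothetical (tier, server, instance) tuple, and the patterns
    use Python's shell-style wildcards (*, ?, [...]); the product's own pattern
    syntax may differ.
    """
    return [
        (t, s, i)
        for t, s, i in indices
        if fnmatchcase(t, tier) and fnmatchcase(s, server) and fnmatchcase(i, instance)
    ]


indices = [
    ("WebTier", "web01", "tomcat1"),
    ("WebTier", "web02", "tomcat1"),
    ("AppTier", "app01", "jvm1"),
]
print(select_indices(indices, tier="Web*", server="web01"))
```
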

Using the Value Generated from Graph A as a Threshold for Graph B

An alert can be generated on one metric by comparing it with a threshold derived from another metric, based on the configured conditions. The 'Percentage of' condition is used to select the threshold graph. It is applicable to the Increases, Decreases, Changes, Increases from Baseline, Decreases from Baseline, and Changes from Baseline conditions.
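One possible reading of 'Percentage of' is sketched below. It is an assumption, not confirmed behavior. Under this reading, the threshold for graph B is a percentage of graph A's value, and graph B breaches when its change reaches that amount. For the "from Baseline" variants, `previous_b` would be replaced by the baseline value, which is also an assumption. All names are hypothetical.

```python
def percentage_of_breach(current_b, previous_b, value_a, percent, mode="increases"):
    """Hypothetical reading of the 'Percentage of' condition.

    Assumption: the threshold for graph B is `percent`% of graph A's value,
    and graph B breaches when its change from `previous_b` reaches that amount.
    """
    threshold = value_a * percent / 100.0  # derived from graph A
    change = current_b - previous_b
    if mode == "increases":
        return change >= threshold
    if mode == "decreases":
        return -change >= threshold
    if mode == "changes":
        return abs(change) >= threshold
    raise ValueError(f"unsupported mode: {mode}")


# Graph A = 200, so 10% of A gives a threshold of 20; B rising by 25 breaches it.
print(percentage_of_breach(current_b=125, previous_b=100, value_a=200, percent=10))  # True
```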

Alert Value in Active Alert

Baseline Data Viewer

The Baseline Data Viewer shows the baseline data for each condition. It is available only when the condition is of behavior type. To open it, click the icon at the top-right corner of the window. This displays the Baseline Data Viewer window, from which user can download the baseline data in Word, Excel, and PDF formats.

Import / Export Alert Rule

User can import or export an alert rule from or to a server or the local machine, based on the requirements. To import a rule, click the Import button. This displays the Import Window pop-up, which offers two options to import a rule: 'From Server' and 'From Local'.

Import Alert Rule from Server

To import an alert rule from server, follow the below mentioned steps:

  1. Select the From Server option and click Choose.
  2. The external data file manager window is displayed. In this window, select the required path / folder from which the alert rule is to be imported.

Import Alert Rule from Local

To import an alert rule from local, follow the below mentioned steps:

  1. Select the From Local option and click Choose.
  2. Select the alert rule file (containing the alert rule configurations) to be imported.

Export Alert Rule to Server

To export alert rule to server, follow the below mentioned steps:

  1. Select the alert rule to be exported and click the Server option.
  2. Provide the Server IP and Path configurations.

Download Report

User can download the alert rule report. The downloaded report shows both the global and custom filters applied to all alert data tables.

Applied Filter

Downloaded Report

Baseline

A baseline is a dynamic threshold derived from the historical trend. A baseline can move or change over time based on the current trend. User can configure the trend timeline to be considered when creating a baseline. Behavior alerts compare the current data points (metric data) with the baseline data, and a behavior alert is triggered when the baseline is breached.
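Behavior alerts fire on percentage deviation from the baseline. The sketch below shows that arithmetic. The allowed-deviation parameter, the direction options, and the inclusive comparison are illustrative assumptions, not the product's configuration names.

```python
def deviation_percent(current, baseline):
    """Percentage deviation of the current value from the baseline value."""
    if baseline == 0:
        raise ValueError("baseline is zero; percentage deviation is undefined")
    return (current - baseline) * 100.0 / baseline


def behavior_breach(current, baseline, allowed_pct, direction="increases"):
    """True when the deviation reaches allowed_pct in the configured direction."""
    dev = deviation_percent(current, baseline)
    if direction == "increases":
        return dev >= allowed_pct
    if direction == "decreases":
        return -dev >= allowed_pct
    if direction == "changes":
        return abs(dev) >= allowed_pct
    raise ValueError(f"unsupported direction: {direction}")


# Current value 130 against a baseline of 100 is a 30% deviation.
print(behavior_breach(130, 100, allowed_pct=25))  # True
```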

Baseline Trends

Normally, web applications do not have a constant flow of data. Instead, the data flow keeps changing with time: at some times a high number of transactions take place, whereas in other intervals only minimal transactions happen. Therefore, a baseline trend needs to take time into consideration, so that current data is compared with baseline data for a similar time. One baseline trend can be based on the time of the day, the day of the week, or any special (event) day. Another baseline trend can be based on the load on the system, because most system and application metrics behave differently at different load values. Both types of trends are supported.

Baseline Creation: For behavior alerts, alerts are generated based on trends, and trends are defined in the baseline. Therefore, the baseline must be configured first.

Types of trends supported for baselines in behavior alerts:

No Trend: This is the most basic type of time-based baseline. The average of the configured period is taken as the baseline, so no trend is actually considered. For example, use it when you want a baseline for the last 30 days irrespective of the time of day or the day of the week.

For example, to find the No Trend (single threshold) baseline for the last 14 days per tier and per metric, calculate the average per tier and per metric over the last 14 days. For PVS/Number of nodes, calculate the average of the averages of the last 14 days. If the sample interval is 2 minutes, 1 sample is received every 2 minutes, that is, 30 samples per hour. In 1 day, there are 24*30=720 samples, so 14 days give 14 buckets of 720 samples each (one bucket per day). The No Trend baseline is the overall average for the last 14 days.

In the above screen, select No Trend from the Trends drop-down list and define the baseline name in the Name text box. There are two types of time period. On selecting Moving Time Window, user needs to specify the number of last days, counted from the current date. On selecting Specific Time Range, user needs to specify the From and To date and time. Once user applies the baseline setting, the corresponding entry is displayed in the Available Baseline table.

Daily Trend: This is a trend-based baseline in which the baseline data is computed as an hourly average over the specified days. For example, the current hour data (say it is currently 4:30 AM) is compared with the average value of all 4:00 AM to 5:00 AM hours over the last seven days.

For example, to find the Daily Trend (24 thresholds per day) for the last 30 days per tier and per metric, calculate the average per tier and per metric for each individual hour. For Web store PVS/Number of nodes, calculate the average of the averages of the last 30 days for each individual hour. With 1 sample every 2 minutes (30 samples per hour), 30 days give 30*30=900 samples (approximately) for each individual hour. A day therefore has 24 buckets of 900 samples, one per hour, kept in two-dimensional (Days*Hours) buckets. The Daily Trend baseline is the overall average of each hour over the last 30 days.

In the above screen, select Daily Trends from the Trends drop-down list and define the baseline name in the Name text box. The time period (Moving Time Window or Specific Time Range) is configured in the same way as for No Trend. Once user applies the baseline setting, a corresponding entry is displayed in the Available Baseline table.

Weekly Trend with Annual override: Using this option, the baseline is created based on the average hourly value, and the day of the week is also taken into account. For example, the current hour data (say it is currently 4:30 AM on a Sunday) is compared with the average value of all 4:00 AM to 5:00 AM hours of all Sundays in the last 30 days. This trend has an additional option to override a particular weekday with any special day.

For example, to find the Weekly Trend (168 thresholds per week) for the last 90 days per tier and per metric, calculate the average per tier and per metric for each individual hour of each day of the week. For Web store PVS/Number of nodes, calculate the average of the averages of the last 90 days for each hour of each day of the week. With 1 sample every 2 minutes (30 samples per hour), 90 days give 90*30=2700 samples (approximately) for each individual hour, kept in two-dimensional (Days*Hours) buckets. These are then summed/averaged into 7 Days*24 hours two-dimensional buckets, one row for each day of the week, so each of the 168 buckets holds about 2700/7 ≈ 386 samples. The Weekly Trend baseline is the overall average of each bucket over the last 90 days. User can choose a special day in the following ways:

  • Day of a month (1-28)
  • Last day of every month
  • Weekday of a month (1 or 2 or 3 or 4 or last) (Monday or Tuesday or Wednesday or Thursday or Friday or Saturday or Sunday) of month
  • Day of a year (1 or 2 or 3 or…30 or 31) of month (January, February, March…December)
  • Weekday of a year (1 or 2 or 3 or 4 or last) (Monday or Tuesday or Wednesday or Thursday or Friday or Saturday or Sunday) of (January, February, March…December)
  • Event Day
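The bucketing described for the three time-based trends can be sketched as below. The sketch follows the "average of averages" wording. First, samples are averaged per day (No Trend) or per calendar hour. Then those averages are averaged per hour of day (Daily Trend, 24 values) or per weekday and hour (Weekly Trend, 168 values). Special-day overrides, per-tier/per-metric grouping, and the synthetic data helper are left out or invented for illustration. All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def _hourly_means(samples):
    """Average the samples inside each calendar hour -> {(date, hour): mean}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.date(), ts.hour)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def no_trend(samples):
    """Single threshold: the average of the per-day averages."""
    per_day = defaultdict(list)
    for ts, value in samples:
        per_day[ts.date()].append(value)
    return mean(mean(values) for values in per_day.values())


def daily_trend(samples):
    """24 thresholds: for each hour of the day, the average of its hourly means."""
    per_hour = defaultdict(list)
    for (_day, hour), hourly in _hourly_means(samples).items():
        per_hour[hour].append(hourly)
    return {hour: mean(values) for hour, values in per_hour.items()}


def weekly_trend(samples):
    """168 thresholds keyed by (weekday, hour), where Monday is weekday 0."""
    per_slot = defaultdict(list)
    for (day, hour), hourly in _hourly_means(samples).items():
        per_slot[(day.weekday(), hour)].append(hourly)
    return {slot: mean(values) for slot, values in per_slot.items()}


def make_demo_samples(days, start=datetime(2024, 1, 1)):
    """Synthetic data: one sample every 2 minutes, value = hour of the day."""
    samples = []
    for i in range(days * 720):  # 720 two-minute samples per day
        ts = start + timedelta(minutes=2 * i)
        samples.append((ts, float(ts.hour)))
    return samples


if __name__ == "__main__":
    data = make_demo_samples(14)  # 14 days starting on a Monday
    print(no_trend(data), daily_trend(data)[4], len(weekly_trend(data)))
```

With this synthetic data every day looks the same, so the No Trend value is the mean hour (11.5) and each hourly threshold equals its hour. Real metric data would of course produce distinct values per bucket.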

Gold Baseline: Uses a single day as the baseline, with 24 thresholds (one for each hour). Each day is tagged with the gold baseline to use for that specific day; if none is specified, the default one is used. The default may also be a Weekly Trend. Baseline buckets are created, and the baseline is not auto-updated. The following screen is displayed for Gold Baseline Day.

Load Based Index Baseline: This is a highly advanced type of baseline that is unique to the Cavisson NetDiagnostics alert framework. It works based on the load on the system instead of a time-based trend. In this baseline, the system behavior is learned at all loads, and this learning is used to compare the current data at the current load with the baseline value at that load. Closely monitoring an application shows that its response time is proportional to the load (page views per second) on the system. For example, if the response time is 100 ms at a load of 10 page views per second, it may rise to 200 ms when the load goes to 20, and so on. Therefore, if the response time reaches 500 ms at a load of 20, an alert should ideally be generated. The load index based baseline learns the system behavior for all loads:

  • All system parameters (system, network, and application parameters) are learned for different page views per second (PVS) loads.
  • Values are averaged over the last 90 days.

In the above screen, select Load Index Based Baseline from the Trends drop-down list. There are two types of time period: Moving Time Window and Specified Time Range. On selecting Moving Time Window, user needs to specify the number of last days, counted from the current date. On selecting Specified Time Range, user needs to specify the From and To date and time. Click the Add button to specify the load settings. The Add Load Setting window is displayed. Fill in the following details:

  • Tier Name
  • Minimum value, Maximum value, and bucket size.
  • Specify the derived graph expression
  • Click Add

Once user applies the baseline setting, a corresponding entry is displayed in the Available Baseline table.
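The load index idea can be sketched as follows. First, learn the average metric value per load bucket, using the minimum value, maximum value, and bucket size from the load settings. Then compare the current value with what was learned at the current load. Several details here are assumptions: the bucket mapping (fixed-width buckets starting at the minimum, with out-of-range loads clamped), the percentage rule, and all names. The tier and derived graph expression settings are omitted.

```python
from collections import defaultdict
from statistics import mean


def load_bucket(load, min_load, max_load, bucket_size):
    """Map a load value (for example PVS) to a bucket index.

    Loads outside [min_load, max_load] are clamped to the edge buckets.
    """
    clamped = min(max(load, min_load), max_load)
    return int((clamped - min_load) // bucket_size)


def build_load_baseline(points, min_load, max_load, bucket_size):
    """Learn the average metric value per load bucket from (load, value) pairs."""
    buckets = defaultdict(list)
    for load, value in points:
        buckets[load_bucket(load, min_load, max_load, bucket_size)].append(value)
    return {bucket: mean(values) for bucket, values in buckets.items()}


def load_breach(baseline, load, value, min_load, max_load, bucket_size, allowed_pct):
    """True when `value` exceeds the learned value at the current load by allowed_pct."""
    expected = baseline.get(load_bucket(load, min_load, max_load, bucket_size))
    if not expected:
        return False  # nothing learned (or a zero baseline) at this load
    return (value - expected) * 100.0 / expected >= allowed_pct


# Figures from the response-time example: ~100 ms at 10 PVS, ~200 ms at 20 PVS.
history = [(10, 100), (11, 100), (20, 200), (21, 200)]
baseline = build_load_baseline(history, min_load=0, max_load=100, bucket_size=5)
print(load_breach(baseline, load=20, value=500, min_load=0, max_load=100,
                  bucket_size=5, allowed_pct=50))  # True
```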

Alert History

Alert history is useful for obtaining insights into previously generated alerts. It also helps to understand how the severity of specific alerts has changed over a period of time. To open the Alert History page, click the Alert History menu item on the Alerts menu. This displays the Alert History window, which consists of two panes: the left pane displays the filters, and the right pane displays the alert details.

Alert Details

The right pane displays the following alert details: alert type, alert severity, rule type, status, rule name, indices, alert value, time when the alert was generated, condition expression (hovering the mouse over it displays the complete condition), and the message displayed on alert generation. To view detailed information, double-click the alert. A section with detailed alert information is displayed at the bottom of the window. To hide this section, double-click the alert again.

Show Graph

To view the graph of a particular alert, select the alert and click the Show Graph button. User can select multiple alerts and get their corresponding graphs displayed on the widget panel.

Row Grouping

This feature enables user to group alerts based on a parameter. To do this, click the Apply Row Grouping icon on the top-right corner of the window, and then select the grouping parameter from the drop-down list (for example, Rule Name). The alerts are grouped and displayed in the Alert History window. To remove the row grouping, click the same icon again.

Advanced Alert Filters

  • Time Filter: Filters alerts by time, such as last 10 minutes, last 30 minutes, and so on.
  • Alert Severity: Filters alerts by severity, such as new alerts, continued alerts, upgraded alerts, and downgraded alerts.
  • Alert Type: Filters alerts by alert type, such as capacity, behavior, or all.
  • Alert Rules: Filters alerts by alert rule. Alert rules are displayed in a list, and user can search for a rule using the search icon.
  • String Filter: Filters alerts by a matching rule name, baseline name, or message.
  • Topology Filter: Filters alerts by topology (tier, server, and instance).
  • Other: Filters alerts by other parameters, such as rule changes, baseline changes, alert setting changes, maintenance window changes, and tomcat changes.

Actions on Alert Filters

  • Apply: This applies the selected filter and displays the result accordingly.
  • Reset: This resets the filters and selects only the default filter values.
  • Clear All: This clears all the specified values for the filters.
  • Select All: This selects all the filters.

Global Filter

User can apply a global filter (in the form of a string) to the displayed alerts. The filter is not specific to any column; it is applied to all columns of the displayed result.
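A global filter keeps every row in which any column contains the search string. The sketch below assumes case-insensitive substring matching, which the text does not specify. It also represents alerts as dictionaries with hypothetical column names.

```python
def global_filter(rows, text):
    """Keep rows in which any column contains `text` (case-insensitive).

    rows: list of dicts, one per alert; the column names are hypothetical.
    """
    needle = text.lower()
    return [row for row in rows if any(needle in str(v).lower() for v in row.values())]


alerts = [
    {"rule": "High CPU", "severity": "Critical", "indices": "WebTier>web01"},
    {"rule": "Slow Response", "severity": "Major", "indices": "AppTier>app01"},
]
print(global_filter(alerts, "major"))  # only the "Slow Response" alert
```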

Hide Filters

To hide all filters from the left pane, click the Hide Filter icon. To show them, click the same icon again.

Records per page

By default, 20 records are displayed per page. User can change this to 50, 100, or 200.

Column Level Filter

User can also apply column level filters to the generated alerts. To do this, click the Show Filter icon on the top-right corner of the window. The column level filters are enabled and can be used to filter the alerts.

Delete Records

To delete one or more records, select the record(s) and click the Delete Record(s) icon on the top-right corner of the window. The selected records are deleted from the alert history.

Download Alert History

User can download the alert history in Word, Excel, and PDF formats by clicking the icons at the bottom of the window.

Indices Column in Alert History

Clicking a value in the Indices column displays the indices in a pop-up along with an 'Alert Value' column.

 

Active Alert

The Active Alert window is similar to the Alert History window. It displays all the active alerts. The details displayed are: alert ID, alert severity, rule type, rule name, alert message, alert time, time elapsed since the alert was generated, indices, alert value, and condition expression. Whenever a new alert is generated, its Alert ID is displayed in both the Alert History table and the Active Alert UI.

Here too, user can perform the following actions:

  • From the Alert Type drop-down list, user can select the alert type (such as capacity) for which details are to be displayed.
  • User can also view the alerts based on the severity, such as critical, major, or minor.
  • User can search an alert from the Global Filter box.
  • User can view the graphs of an individual alert or of all alerts by selecting them and clicking the Show Graph button.

The alert time frame is highlighted in both 'Show Graph' and 'Graph in Alert Email'.

Graph

Email

  • User can also navigate to the alert history by clicking the Alert History button.
  • To clear alert(s) from the records, select the row(s) and click the Force Clear button.

The ‘Alert Value’ field displays the ‘Alert Value’, the ‘Baseline Value’ or ‘Previous Window Value’, and the ‘Load Metric Value’ (if present) for a generated alert.

Note: The alert feature is also implemented for different test runs in Multi DC. User can view the active alert bell counter and the corresponding records in Active Alerts.

Details of satisfied indices are displayed in the Active Alert window, the Alert History table, and alert emails. In the previous design, when an alert was configured with multiple conditions, it could be generated at a level of the hierarchy that was common to both conditions, so user could not see the exact indices for which the alert was generated. Now, complete details are provided for all satisfied indices, including the partially satisfied ones.

For Example:

Active Alert UI

Alert History UI

Active Alert Graphs

This section displays graphs of active alerts. To view Active alert graphs, follow the below mentioned steps:

  1. Go to Alerts menu on the left pane and click the Active Alert Graphs menu item. The Active alert graphs are displayed.
  2. User can perform all the operations available for alert graphs, such as widget settings, show graph data, add to custom metrics, show graph in tree, view reports, run command, and download.

Alert Stats Report

User can view the stats report of the graph data generated in the active alert graphs. To view the stats report, follow the below mentioned steps:

Go to Alerts menu on the left pane and click the Alert Stats Report menu item. The Alert Stats Report is displayed.

This report displays details of the alert graphs, such as graph name, minimum value, maximum value, average value, standard deviation, last value, and number of samples.
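The report columns correspond to standard summary statistics over a graph's samples. The sketch below computes one report row. It uses the population standard deviation (`statistics.pstdev`), because the report does not say whether the population or the sample form is used. The function name and row keys are illustrative.

```python
import statistics


def graph_stats(name, values):
    """Summary row for one alert graph, mirroring the report's columns.

    Uses the population standard deviation; whether the product uses the
    population or the sample form is not stated, so this is an assumption.
    """
    if not values:
        raise ValueError("a graph needs at least one sample")
    return {
        "graph": name,
        "min": min(values),
        "max": max(values),
        "avg": statistics.fmean(values),
        "std_dev": statistics.pstdev(values),
        "last": values[-1],
        "samples": len(values),
    }


print(graph_stats("CPU Utilization", [2, 4, 4, 4, 5, 5, 7, 9]))
```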

On clicking a graph link, the graph is displayed.

User can download a graph in PDF, Word, and Excel formats.

Alert Action History

Alert Action History contains details of the actions taken for alerts, such as SNMP Trap Sent, Email Sent, Cisco Spark Chat sent, and so on.

To access the Alert Action History section, go to Alert > Alert Action History. This displays the Alert Action History window.

It contains the following details:

  • Rule Name
  • Policy Name
  • Action Type
  • Action Time
  • Indices Name
  • Message
  • Description

In SNMPTrap, the IDCName field holds one of the following values:

  1. IDCName_SourceIP (Appliance IP), if IDCName is already configured.
  2. Source_IP, if IDCName is not configured.
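The two rules for the IDCName field reduce to a single conditional. In the sketch below, the function name, the IP address, and the DC name are hypothetical example values.

```python
from typing import Optional


def idc_name_field(source_ip: str, idc_name: Optional[str] = None) -> str:
    """Value of the IDCName field in an SNMP trap, per the rules above.

    Returns "<IDCName>_<SourceIP>" when an IDCName is configured, otherwise
    just the source (appliance) IP.
    """
    if idc_name:
        return f"{idc_name}_{source_ip}"
    return source_ip


print(idc_name_field("10.0.0.5", "DC1"))  # DC1_10.0.0.5
print(idc_name_field("10.0.0.5"))         # 10.0.0.5
```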

System Status

In this tab, the status of all tiers and their related servers is displayed, along with the number of instances connected to each server. The status of tiers, servers, and instances is categorized into critical, major, and normal states, represented by red, orange, and green respectively.