NetHavoc Overview

A service's overall performance depends, among other things, on its ability to tolerate failures. This ability can be tested by deliberately injecting random faults and failures into the application infrastructure.
NetHavoc is a NetStorm feature that allows users to test the resilience of their applications. It injects various faults into the application infrastructure during a load test, and the aftereffects of each fault can be monitored through NetStorm’s monitoring capabilities.

Types of Faults

Various faults can develop in an application while it runs. These include:

  • Instance(s) or server(s) down
  • Process termination
  • Server ill health (high CPU, memory, disk space, disk failures etc.)
  • Network outage or slowness

Key Features

  • Faults can be injected randomly in the staging and/or production environment (during a load test or even in production), and the aftereffects can be monitored using the NDE infrastructure (Failure as a Service).
  • Faults can be injected by the Fault Injection software based on different parameters including:
    • Time (off peak hours)
    • Probability (of the fault occurring)
    • Spacing (between two faults)
    • Severity (instance(s), server(s), Tier(s), DC going down)
    • Partial fault (disable network interface) to full fault (server power down)
  • Faults can be injected in different services:
    • Application servers
    • Load balancers
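
The Probability and Spacing parameters listed above can be pictured with a small simulation. This is a minimal sketch and not NetHavoc's implementation: the function name `plan_faults`, the slot-based time model, and every parameter name are assumptions made for illustration.

```python
import random

def plan_faults(slots, probability, spacing, rng=None):
    """Return the slot indexes at which a fault would fire.

    slots       -- number of candidate time slots (hypothetical unit, e.g. minutes)
    probability -- chance (0.0 to 1.0) that a fault fires in an eligible slot
    spacing     -- minimum number of slots between two consecutive faults
    """
    rng = rng or random.Random()
    fired = []
    next_allowed = 0
    for slot in range(slots):
        if slot < next_allowed:
            continue  # still inside the spacing window of the previous fault
        if rng.random() < probability:
            fired.append(slot)
            next_allowed = slot + spacing
    return fired
```

With a probability of 1.0 every eligible slot fires, so the result shows the spacing alone; lower probabilities thin the plan out further.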

Prerequisites

To inject a fault, the following prerequisites must be met:

  • The server on which the fault is to be applied must run the Ubuntu operating system.
  • For SSH connection type, the remote user must be a host user.

Note: The host user name and password are used only once, to set up password-less SSH communication. They are not saved anywhere in the code. The SSH key is copied to the host machine.

  • For CMon connection type, CMon must be running on both the host and remote machines.
  • Packages required:
    • ‘at’ package: The ‘at’ command schedules a command, which you must have permission to run, to run once at a specified time. The scheduled command can be anything from a simple reminder message to a complex script. The NetHavoc machine must have the ‘at’ package installed to schedule faults.
    • ‘expect’ package: The NetHavoc machine must have the ‘expect’ package installed to connect to the remote machine when the SSH connection type is used.
    • ‘tc’ package: The ‘tc’ command configures Traffic Control in the Linux kernel. The remote machine must have the ‘tc’ package installed for Network related faults.
    • ‘fallocate’ package: The ‘fallocate’ command manipulates the disk space allocated to a file. The remote machine must have the ‘fallocate’ package installed for Fill Up Disk faults.
    • ‘az’ package: For Azure Server Termination, the Azure CLI (az) package must be installed on the NetHavoc machine.
  • For Network related faults:
    • For SSH connection type, the given SSH user must be able to run the ‘tc’ command without a password.
    • For CMon connection type, the ‘cavisson’ user must be able to run the ‘tc’ command without a password.
  • For Fill Up Disk fault:
    • For SSH connection type, the given SSH user must have write permission on the selected partition so that the disk can be filled.
    • For CMon connection type, the ‘cavisson’ user must have write permission on the selected partition so that the disk can be filled.
  • For Application Termination fault:
    • For SSH connection type, the application/process must be run by the SSH user or by a user in the SSH user's group.
    • For CMon connection type, the application/process must be run by the ‘cavisson’ user or by a user in the ‘cavisson’ group.

Once a fault is cleared, all remote programs, scripts, and related files that were created to inject the havoc are removed.
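
The package prerequisites above can be verified before a havoc is configured. The following is a minimal sketch that only checks whether the named commands are on the PATH; the grouping into two lists and the helper name `missing_commands` are assumptions for illustration and are not part of NetHavoc.

```python
import shutil

# Commands named in the prerequisites, grouped by the machine that needs them.
NETHAVOC_MACHINE = ["at", "expect", "az"]
REMOTE_MACHINE = ["tc", "fallocate"]

def missing_commands(commands, which=shutil.which):
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]

if __name__ == "__main__":
    # Run this on each machine; only the list for that machine is relevant.
    for role, commands in (("NetHavoc machine", NETHAVOC_MACHINE),
                           ("remote machine", REMOTE_MACHINE)):
        missing = missing_commands(commands)
        print(f"{role}: missing {missing if missing else 'nothing'}")
```

The check does not cover permissions, such as whether ‘tc’ can be run without a password; those still need to be confirmed separately.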

Injecting Faults

A user can inject one or more faults into any running instance/server at a specified time or at a random time within a given interval. The user provides the inputs required for the chosen fault, and the system calls the corresponding APIs to inject that fault into the running instance/server.
Follow these steps to inject faults and analyze the application behavior:

  1. On the product UI home page, go to Admin on the left panel and click NetHavoc.

  2. The NetHavoc window is displayed, with the Reports section shown by default.

The following sections describe each menu item on the NetHavoc window in detail.

Reports

On the NetHavoc window, click the Reports menu item on the left pane. This section displays a summary of the havocs: the havoc usages, the havoc types, and the status of all the havocs.

Havoc Usages

This information is displayed as a bar graph that shows the status of the havocs.

The following statuses are displayed:

  • Running
  • Completed
  • Ready To Apply
  • Stopped
  • Scheduled
  • Failed

Havoc Types

This information is in the form of a pie chart, where the user can get the details of the types of havocs.

The different havoc types are as follows:

  • Server Termination (Azure)
  • Fill Up Disk
  • Application Termination
  • Network Latency
  • Network Loss
  • Network Corruption
  • High CPU Load

Havoc Summary

This section contains several tabs (Overall, Ready To Apply, Scheduled, Running, Completed, Stopped, and Failed) that display the havocs by status.

In each tab, a user gets detailed information about the injected havocs, such as action, category, type, time mode, start time, duration, end date/time, connection, tier, server, user, havoc details, status, output, and graphs.

While creating a havoc, when a user selects Dynamic as the Server Selection Mode and applies it, the havoc is displayed as a parent node instead of a general node in the Havoc Summary. Click the icon on the parent node to view the child nodes.

The user can apply, delete, or update a havoc from the Havoc Summary itself if the havoc is in the Ready To Apply status.

The user can also stop a havoc from the Havoc Summary itself, but only when the havoc is in the Running status.

Download Reports

A user can download reports in Word, Excel, or PDF format from the lower-left corner of the window.

Create Havoc

On the NetHavoc window, click the Create Havoc menu item on the left pane. In this section, a user can configure and apply the havocs that need to be injected.

The havocs to be configured are divided into three categories. Each category has several types. The user needs to choose the category and its type and proceed to configure the havoc. These categories, their types and other functionalities are described below.

If any havoc is already in the Running or Scheduled status, another havoc cannot be configured with Injection Time set to Specified or Random. It can be configured only if the Injection Time is Current.

Category and Havoc Types

The havocs are divided into three categories: Resource, State, and Network.

Resource

The havocs that impact cores, workers, and memory are placed in this category. The havoc types for this category are as follows:

  1. High CPU Load: In this havoc type, NetHavoc consumes CPU resources.
    • Fill Core: The number of CPU cores that the user wants to consume.
    • CPU %: The percentage of CPU to burn on the selected server(s). The default value is 10% and the maximum is 80%.
  2. Fill Up Disk: The user needs to specify the partition name of the server. The default partition value is 10% and the default auto correct time is 2 seconds.
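
One common way to hold a core at a target CPU percentage is a duty cycle: busy-wait for part of each short period and sleep for the rest. The following is a minimal sketch of that idea and is not how NetHavoc is known to implement High CPU Load; the function names, the 0.1 second period, and the enforcement of the 80% limit in code are assumptions.

```python
import time

def duty_cycle(cpu_percent, period=0.1):
    """Split one period (in seconds) into busy and idle time for a target CPU %."""
    if not 0 < cpu_percent <= 80:
        # 80% is the documented maximum for the CPU % field.
        raise ValueError("cpu_percent must be greater than 0 and at most 80")
    busy = period * cpu_percent / 100.0
    return busy, period - busy

def burn_core(cpu_percent, duration, period=0.1):
    """Keep one core busy at roughly cpu_percent for `duration` seconds."""
    busy, idle = duty_cycle(cpu_percent, period)
    end = time.monotonic() + duration
    while time.monotonic() < end:
        spin_until = time.monotonic() + busy
        while time.monotonic() < spin_until:
            pass  # busy-wait to consume CPU
        time.sleep(idle)
```

Calling `burn_core(10, 60)` would load a single core at about 10% for one minute. Covering several cores, as the Fill Core field suggests, would need one such worker process per core.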

State

Havocs such as process killer, shutdown, and time travel are placed in this category. The havoc types for this category are as follows:

  1. Application Termination: NetHavoc can terminate any application on the server(s). The application is identified by its process ID or process name.
    • Process ID: The ID of the process to be killed by the fault.
    • Process Name: The name of the process to be killed by the fault.
  2. Server Termination: In this havoc type, the user can terminate a server on Azure. The user needs to provide the following details:
    • Cloud Type: The cloud type, which is Azure.
    • Computer Name: The cloud server name or IP.
    • Resource Group Name: The name of the resource group. Resource groups are logical containers for a collection of resources that can be treated as one logical instance. The user can use a resource group to control all of its members collectively.
    • User Name / Password: The server credentials.

Network

The havocs that introduce latency and packet loss are placed in this category. The havoc types for this category are as follows:

  1. Network Corruption: The user can corrupt incoming packets from the server(s) and outgoing packets to the server(s). The default value is 10% and the maximum is 50% corruption.
    • Network Interface: The point of interconnection between a computer and a private or public network. The drop-down list suggests all the interfaces that have been added on the server.
    • Corruption Percentage: The percentage of network packets to be corrupted or manipulated.
  2. Network Loss: As with Network Corruption and Network Latency, the user can apply loss to the incoming and outgoing packets of the server(s). The default value is 10% and the maximum is 100% loss.
    • Network Interface: Provide the network interface.
    • Corruption: The percentage of network packets to be corrupted or manipulated.
    • Loss %: The percentage of packets lost in the network.
  3. Network Latency: Latency is the amount of time a message takes to traverse a system. In a computer network, it expresses how long a packet of data takes to get from one designated point to another. As with corruption, the user can add latency to the packets received from and sent to the server(s). The user can provide either a fixed value or a range.
    • Network Interface: Provide the network interface.
    • Specified: Provide an integer value. The packets are delayed by the specified amount of time.
    • Random: The packets are delayed by a random amount of time between the two given values.
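
Since the prerequisites name the ‘tc’ command for Network related faults, the three network havocs map naturally onto the netem queueing discipline. The sketch below only builds the command lines and does not run them. It is an assumption that NetHavoc uses netem in this way; the function names are hypothetical, and treating latency values as milliseconds is also an assumption.

```python
def netem_command(interface, *, delay_ms=None, delay_range_ms=None,
                  loss_pct=None, corrupt_pct=None):
    """Build a `tc qdisc add ... netem` argument list for one impairment."""
    args = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms is not None:
        # Fixed latency ("Specified").
        args += ["delay", f"{delay_ms}ms"]
    elif delay_range_ms is not None:
        # Latency range ("Random"): netem takes a base delay plus a +/- jitter,
        # so the range is expressed as its midpoint and half its width.
        low, high = delay_range_ms
        args += ["delay", f"{(low + high) / 2:g}ms", f"{(high - low) / 2:g}ms"]
    if loss_pct is not None:
        args += ["loss", f"{loss_pct}%"]
    if corrupt_pct is not None:
        args += ["corrupt", f"{corrupt_pct}%"]
    return args

def netem_clear(interface):
    """Build the command that removes the root qdisc and ends the impairment."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]
```

For example, `netem_command("eth0", loss_pct=10)` yields the arguments for `tc qdisc add dev eth0 root netem loss 10%`. A netem qdisc attached at the root affects outgoing traffic on that interface, so impairing incoming packets would need additional setup that is not shown here.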

Injection Time

This signifies the time at which the faults are injected based on the configurations. The user has the following options for the fault injection time:

  • Current: The system captures the current time and injects the havoc for the specified duration, starting from the current date and time.

  • Specified: The user specifies the exact date and time at which the configured fault is to be injected, along with its duration.

  • Random: The user specifies a start date and time, an end date and time, and a duration for the fault. The system picks a random time between the specified start and end times and injects the fault for the provided duration.
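
The Random option can be sketched as drawing a start time uniformly from the configured window. This is an illustration under stated assumptions, not NetHavoc's code: the function name `pick_injection_time` is hypothetical, and the choice to keep the whole havoc inside the window (so it finishes by the end time) is an assumption the text does not confirm.

```python
import random
from datetime import datetime, timedelta

def pick_injection_time(start, end, duration, rng=None):
    """Pick a random start time so that the havoc finishes by `end`."""
    rng = rng or random.Random()
    latest = end - duration  # last start time that still fits in the window
    if latest < start:
        raise ValueError("window is shorter than the havoc duration")
    offset = rng.uniform(0, (latest - start).total_seconds())
    return start + timedelta(seconds=offset)
```

For a window from 10:00 to 12:00 and a 30-minute duration, the returned start time falls between 10:00 and 11:30.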

Overall Duration

This section displays the total duration of the injected havoc, divided into three stages: Ramp Up, Duration, and Ramp Down. The same information is also displayed as a graph.

This section is disabled when the selected Havoc Type is Application Termination or Server Termination.
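
The three stages can be pictured as a trapezoid: intensity rises during Ramp Up, holds during Duration, and falls during Ramp Down. The sketch below assumes linear ramps and a peak expressed as a percentage; neither the shape nor the function name `intensity_at` comes from NetHavoc.

```python
def intensity_at(t, ramp_up, duration, ramp_down, peak=100.0):
    """Return the havoc intensity at t seconds after the start (linear ramps assumed)."""
    if t < 0:
        return 0.0
    if t < ramp_up:
        return peak * t / ramp_up          # rising edge
    if t < ramp_up + duration:
        return peak                        # steady stage
    if t < ramp_up + duration + ramp_down:
        remaining = ramp_up + duration + ramp_down - t
        return peak * remaining / ramp_down  # falling edge
    return 0.0
```

Sampling this function at regular intervals gives the points of a graph like the one the section describes.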

Targets

The user needs to specify the tier, server selection mode, server, and connection from the corresponding drop-down lists. Other options change depending on the selected category and havoc type, as explained earlier.

Once all the configurations are done, click the Configure Havoc button to configure the havoc. The user can view the summary of the configured havoc in the Havoc Summary window.

Click Ok to close this window.

If, after completing all the configurations, the user clicks the Configure & Apply Havoc button instead, the Havoc Summary window is also displayed. In this case, clicking Ok applies the havoc and redirects the user to the Reports window, where the details of the havoc can be viewed in the Havoc Summary section.

The user can also click Reset to change the configurations of the havoc before applying.

Apply Havoc

On the NetHavoc window, click the Apply Havoc menu item on the left pane. In this section, a user can apply the configured havocs so that they are injected at the specified start time and for the specified duration.

The user gets detailed information about the configured havocs, such as category, havoc type, time mode, start time, duration, connection, tier, server, user, havoc details, status, and output. All these havocs are in the Ready To Apply status.

Select the check box next to the havoc that needs to be applied, and then click the Apply Havoc button at the bottom of the window. This displays the Havoc Summary window confirming the activation of the havoc.

Update Havoc

On the NetHavoc window, click the Update Havoc menu item on the left pane. In this section, a user can edit/update the havocs applied in the system.

The user gets detailed information about the applied havocs, such as category, havoc type, time mode, start time, duration, connection, tier, server, user, havoc details, status, and output. A havoc can be updated only if its status is Ready To Apply or Scheduled. Once it reaches the Running status, the havoc cannot be updated.

Select the check box next to the havoc that needs to be edited/updated, and then click the Proceed To Update Havoc button at the bottom of the window. This displays a window where the user can change the configurations of that havoc.

After updating the configurations, click the Update Havoc button at the bottom of the window to apply the changes.

Stop Havoc

On the NetHavoc window, click the Stop Havoc menu item on the left pane. In this section, a user can forcefully stop an injected havoc when it is in the Scheduled status.

The user gets detailed information about the configured havocs, such as category, havoc type, time mode, start time, duration, connection, tier, server, user, havoc details, status, and output.

Select the check box next to the havoc that needs to be stopped, and then click the Stop Havoc button at the bottom of the window. This stops the injected havoc.

Delete Havoc

On the NetHavoc window, click the Delete Havoc menu item on the left pane. In this section, a user can delete an injected havoc from the system. A havoc can be deleted only if its status is Ready To Apply. If the status is Scheduled or Running, the havoc cannot be deleted.

The user gets detailed information about the havocs, such as category, havoc type, time mode, start time, duration, connection, tier, server, user, havoc details, status, and output.

Select the check box next to the havoc that needs to be deleted, and then click the Delete Havoc button at the bottom of the window. This displays the Havoc Summary window confirming the deletion of the havoc.

ReApply Havoc

On the NetHavoc window, click the ReApply Havoc menu item on the left pane. In this section, a user can reapply the havocs that are in the Completed, Stopped, or Failed status. Havocs in any other status cannot be reapplied.

The user gets detailed information about the havocs, such as category, havoc type, time mode, start time, duration, connection, tier, server, user, havoc details, status, and output. All these havocs are in the Completed, Stopped, or Failed status.

Select the check box next to the havoc that needs to be reapplied, and then click the Proceed To Reapply Havoc button at the bottom of the window. This displays a window where the user can change only the Injection Time of that havoc.

After updating the Injection Time, click the Reapply Havoc button at the bottom of the window to apply the changes.
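
The status rules stated in the Apply, Update, Stop, Delete, and ReApply sections can be collected into one lookup table. This is a summary sketch built only from the rules in this document; the names `ALLOWED_STATUSES` and `can_perform` are hypothetical and do not refer to any NetHavoc API.

```python
# Statuses in which each operation is permitted, as described in the text.
ALLOWED_STATUSES = {
    "apply":   {"Ready To Apply"},
    "update":  {"Ready To Apply", "Scheduled"},
    "delete":  {"Ready To Apply"},
    "reapply": {"Completed", "Stopped", "Failed"},
    # The text mentions stopping a Running havoc (Havoc Summary) and a
    # Scheduled havoc (Stop Havoc menu); listing both is an assumption.
    "stop":    {"Running", "Scheduled"},
}

def can_perform(operation, status):
    """Return True if `operation` is allowed for a havoc in `status`."""
    return status in ALLOWED_STATUSES[operation]
```

For instance, `can_perform("delete", "Scheduled")` is False, which matches the rule that a Scheduled havoc cannot be deleted.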

Other Operations

A user can also perform the following operations on the NetHavoc window:

Auto Refresh

A user can use this option to refresh the NetHavoc UI automatically. When it is enabled, the changes are reflected for all users using the same machine.

To do this, select the Auto Refresh check box at the top of the window, and then select the desired time interval for auto refresh from the drop-down list. The available options are:

  • 30 seconds
  • 1 minute
  • 2 minutes
  • 5 minutes
  • 10 minutes

Time Period

A user can select the time period for which the data should be displayed on the NetHavoc window. To do this, select the time period from the drop-down list at the upper-right corner of the window. The available options are:

  • Last 10 Minutes
  • Last 30 Minutes
  • Last 1 Hour
  • Last 2 Hours
  • Last 4 Hours
  • Last 6 Hours
  • Last 8 Hours
  • Last 12 Hours
  • Last 24 Hours
  • Custom