NetDiagnostics (ND) Agent Overhead – Test Report

This section analyzes the impact on the server of installing the ND agent (latest release, 4.1.8) in terms of memory and CPU utilization. The analysis was performed on comprehensive systems at both the client end and the Cavisson end.

Overview

Application performance monitoring (APM) is done by monitoring application and system level metrics. These metrics include but are not limited to:

  • System CPU Utilization
  • Memory Utilization
  • Network Throughput Used
  • Garbage Collection Stats
  • Method Timings
  • Method Hotspots

To monitor and capture these metrics, all APM solutions use agents. With an agent-based approach, it is important to ensure that the overhead and footprint of the APM tool are extremely low.

In several legacy products, the overhead rises steeply as the number of monitors and APM features in use increases. This is not the case with Cavisson NetDiagnostics Enterprise.

While most legacy products claim an overhead of 2-5% (and only for the lite or limited version of the APM), only Cavisson NetDiagnostics Enterprise offers minimal overhead, between 0 and 2%, even in its enterprise-ready state.

Cavisson Approach

Cavisson NetDiagnostics Enterprise (NDE) has been engineered to keep CPU and memory overhead minimal.

NDE uses two agents:

  • CavMon Agent for system-level and application-level monitoring such as:
    • CPU and Memory Utilization
    • Disk Space Usage, Disk I/O
    • Garbage Collection Stats
  • NetDiagnostics (ND) Agent for code-level monitoring such as:
    • Method timing
    • Method Hotspots
    • DB Query
    • Business Transaction Monitoring, Flow paths

Cavisson NDE has been used for years at leading global retailers, banks, and network providers without causing any overhead issues in terms of CPU utilization.

Test Details

For the detailed analysis, three types of tests were performed:

Sr. No | Test Number | Scenario
Test 1 | TR 9553 | Monitoring the server with the ND Agent in inactive mode.
Test 2 | TR 9556 | Monitoring the impact of the ND Agent on the server without ByteCode Instrumentation (no profiling done).
Test 3 | TR 9557 | Monitoring the impact of the ND Agent on the server with 10% ByteCode Instrumentation (profiles created).

Note:

  • The tests were conducted in the LT2 environment, where ND was installed across all JVMs on the application servers.

Server Configuration

Model name: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
L3 cache: 35840K
CPU MHz: 2593.993
CPU(s): 8
Total Memory: 77485012 kB
No. of instances per server: 6
GC settings: UseParallelGC

Test Scenario

Load: 50 Users
Ramp up: 5 Minutes
Duration: 30 Minutes
Ramp down: 5 Minutes

In all the tests, 50 users were ramped up over 5 minutes on the dotcom channel, held at that load for a duration of 30 minutes, and then ramped down over 5 minutes.
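
The load profile described above can be sketched as a simple function of elapsed time. This is only an illustration: the report does not say whether the ramps were linear, so linear ramps are an assumption here, and the function name `active_users` is hypothetical.

```python
def active_users(t_min, peak=50, ramp_up=5.0, steady=30.0, ramp_down=5.0):
    """Return the number of active virtual users at minute t_min.

    Assumes linear ramp-up and ramp-down (not stated in the report).
    """
    total = ramp_up + steady + ramp_down
    if t_min <= 0 or t_min >= total:
        return 0.0                                  # before start or after end
    if t_min < ramp_up:
        return peak * t_min / ramp_up               # ramp-up phase
    if t_min <= ramp_up + steady:
        return float(peak)                          # duration phase at full load
    return peak * (total - t_min) / ramp_down       # ramp-down phase


# Print the load at a few points across the 40-minute run.
for t in (0, 2.5, 5, 20, 35, 37.5, 40):
    print(t, active_users(t))
```

Under these assumptions the full load of 50 users is held from minute 5 to minute 35 of each run.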

Comparison Approach

Test 1, Test 2, and Test 3 were compared on resource consumption within the duration phase (i.e., the 30 minutes at full load).
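
One way to restrict a comparison to the duration phase is to average only the samples whose timestamps fall inside that window. The sketch below is illustrative: the sample values are hypothetical (they are not taken from the test runs), and the helper name `duration_phase_mean` is an assumption.

```python
from statistics import mean

RAMP_UP_MIN = 5      # from the test scenario
DURATION_MIN = 30    # from the test scenario


def duration_phase_mean(samples, start=RAMP_UP_MIN, end=RAMP_UP_MIN + DURATION_MIN):
    """Average the values whose timestamp (in minutes) lies in [start, end]."""
    window = [value for minute, value in samples if start <= minute <= end]
    if not window:
        raise ValueError("no samples inside the duration phase")
    return mean(window)


# Hypothetical CPU samples: (minutes since test start, CPU %).
baseline = [(0, 5), (3, 15), (5, 22), (15, 23), (25, 24), (35, 23), (38, 10)]
with_agent = [(0, 5), (3, 16), (5, 23), (15, 24), (25, 25), (35, 24), (38, 11)]

base_avg = duration_phase_mean(baseline)
agent_avg = duration_phase_mean(with_agent)

# Overhead expressed as a difference in percentage points.
overhead_points = agent_avg - base_avg
print(base_avg, agent_avg, overhead_points)
```

Excluding the ramp-up and ramp-down samples keeps the partially loaded periods from skewing the averages.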

Observation

The observations for test metrics, CPU utilization, heap memory, and total number of GCs per minute are stated below.

Test Metrics

The following stats show the test metrics comparison:

Test Results
Test Scenario | TPS | ART (msecs) | Execute Thread Total Count
Test 1 (Without ND Agent) | 74 | 695 | 28
Test 2 (With ND Agent) | 79 | 560 | 30
Test 3 (ND Agent + Profiling) | 80 | 547 | 30

CPU Utilization

The following stats show a comparison of CPU utilization on one of the application servers:

Test Results
Test Scenario | Average CPU Consumption (%) | Overhead (percentage points)
Test 1 (Without ND Agent) | 23 | -
Test 2 (With ND Agent) | 23 | 0
Test 3 (ND Agent + Profiling) | 24 | +1

Note: Similar behavior was observed on other App servers.

Heap Memory

The following stats show a comparison of heap memory consumed by one of the JVMs on the Dotcom application server.

Test Results
Test Scenario | Heap Memory Consumption by the Dotcom JVMs (MB) | JVM Heap Used Memory Percent (Pct) | Overhead (percentage points)
Test 1 (Without ND Agent) | 2657 | 37 | -
Test 2 (With ND Agent) | 2995 | 40 | 3
Test 3 (With ND Agent + Profiling) | 2960 | 41 | 4

Note:

  • The maximum increase was ~300 MB to ~350 MB for the ND-enabled tests as compared to Test 1 (without ND).
  • The same behavior was observed on the other JVMs.
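
The heap figures can be cross-checked with a few lines of arithmetic. The sketch below uses the values from the heap table and reports each ND-enabled test's difference from Test 1 both in MB and in percentage points of heap used. The function name `heap_deltas` is hypothetical.

```python
# (heap used in MB, heap used as % of JVM heap), taken from the heap table.
heap = {
    "Test 1": (2657, 37),
    "Test 2": (2995, 40),
    "Test 3": (2960, 41),
}


def heap_deltas(results, baseline="Test 1"):
    """Return each test's heap difference from the baseline test."""
    base_mb, base_pct = results[baseline]
    deltas = {}
    for name, (mb, pct) in results.items():
        if name == baseline:
            continue
        deltas[name] = {
            "delta_mb": mb - base_mb,          # absolute increase in MB
            "delta_points": pct - base_pct,    # increase in percentage points
        }
    return deltas


for name, delta in heap_deltas(heap).items():
    print(name, delta)
```

The differences come out at 338 MB and 303 MB, which matches the ~300 MB to ~350 MB range in the note.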

Total Number of GC/min

The following stats show a comparison of the total number of GCs per minute for one of the JVMs on the Dotcom application server.

Test Results
Test Scenario | Total Number of GC/min on Dotcom JVMs | Percentage Change (%)
Test 1 (Without ND Agent) | 0.450 | -
Test 2 (With ND Agent) | 0.350 | -22.0%
Test 3 (With ND Agent + Profiling) | 0.400 | -11.0%
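
The percentage change in GC frequency is a relative change against the Test 1 baseline. A minimal sketch of that calculation, using the rounded rates from the table (the function name `percent_change` is an assumption):

```python
def percent_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# GC/min values as reported (rounded to three decimals).
gc_per_min = {"Test 1": 0.450, "Test 2": 0.350, "Test 3": 0.400}

base = gc_per_min["Test 1"]
for name in ("Test 2", "Test 3"):
    print(f"{name}: {percent_change(base, gc_per_min[name]):+.1f}%")
```

Computed from the rounded rates, this gives about -22.2% and -11.1%, in line with the reported figures up to rounding of the inputs.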

Summary

As seen in the metrics above, the overhead of running the ND Agent for code-level monitoring is minimal:

  • No major impact on CPU utilization.
  • Heap memory utilization rose by about 3 and 4 percentage points (roughly 300 to 350 MB) when tested with the ND Agent and with the ND Agent + Profiling, respectively.
  • A decrease in the total number of GCs per minute was observed in the ND-enabled tests; its impact is insignificant.