Dashboard Troubleshooting
Login UI is not opening |
Possible Reasons #1 | Tomcat is not running |
Steps to Diagnose | Start Tomcat with the init script for your controller. The command can be run from any directory:
(i) /etc/init.d/tomcat start (for the work controller) (ii) /etc/init.d/tomcat_ControllerName start (for any other controller) |
Commands to validate | Check whether the Tomcat process is running: ps -ef|grep tomcat
Start Tomcat: (i) /etc/init.d/tomcat start (for the work controller) (ii) /etc/init.d/tomcat_ControllerName start (for any other controller) |
Possible Reasons #2 | Tomcat is running, but the following exception appears in catalina.out: Caused by: java.net.BindException: Address already in use (Bind failed) at java.net.PlainSocketImpl.socketBind(Native Method) |
Steps to Diagnose |
Another process is already listening on the port Tomcat is trying to bind (8080 in this example). Either stop the service holding that port or configure Tomcat to run on a different port.
Discovering the conflict: find the process ID of the service listening on 8080. On Windows, run netstat -ano|find "8080". The last column is the PID (sample output: TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING 21376, so the PID is 21376). On Linux, run netstat -natp|grep 8080. The last column shows the PID/program name. To move Tomcat to another port instead, change the port attribute of the HTTP Connector element in server.xml.
> Killing the running service: to stop the process holding the port, use taskkill /F /PID 21376 on Windows, or kill -9 21376 on Unix/Linux/macOS. |
Commands to validate | Path of catalina.out: $NS_WDIR/apps/apache_tomcat-7.0.91/logs/catalina.out
Path of server.xml: $NS_WDIR/apps/apache_tomcat-7.0.91/conf Path of web.xml: $NS_WDIR/apps/apache_tomcat-7.0.91/conf |
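The port check can also be scripted. The following is a minimal Python sketch, not part of the product tooling. It assumes a Linux host and tries to bind the port itself, treating EADDRINUSE as "already taken". The function name port_in_use is hypothetical.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if some other process already holds host:port.

    Binds a throwaway TCP socket the way a server connector would. An
    EADDRINUSE error means the port is taken. Other errors (for example,
    permission denied on ports below 1024) are re-raised, not hidden.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False


if __name__ == "__main__":
    # 8080 matches the example above; use the Connector port from server.xml.
    print("port 8080 in use:", port_in_use(8080))
```

A True result only says the port is busy. The netstat commands above are still needed to find which PID owns it.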
Possible Reasons #3 | Tomcat is running, but the following exception appears in catalina.out: SEVERE: Socket accept failed java.net.SocketException: Too many open files (Accept failed) at java.net.PlainSocketImpl.socketAccept(Native Method) |
Steps to Diagnose | Tomcat has run out of file descriptors. The usual fix is to raise the open-file limit. The default is often 1024, and you can check the current limit with ulimit -n. To make the change persist after a reboot, first increase the system-wide maximum in /proc/sys/fs/file-max. Writes to /proc are lost on reboot, so also set fs.file-max in /etc/sysctl.conf. Then add the following line to /etc/security/limits.conf, where "-" sets both the soft and hard limits: * - nofile 2048 |
Commands to validate | Path of catalina.out:
$NS_WDIR/apps/apache_tomcat-7.0.91/logs/catalina.out |
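To confirm descriptor exhaustion, you can use a sketch like the following. The Python helpers (hypothetical names, Linux only) read a process's soft "Max open files" limit from /proc/<pid>/limits and count its open descriptors in /proc/<pid>/fd. Reading another user's /proc/<pid>/fd generally requires running as that user or as root.

```python
import os


def max_open_files(pid) -> int:
    """Soft 'Max open files' limit of a process, from /proc/<pid>/limits (Linux)."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units
                return int(line.split()[3])
    raise ValueError(f"no 'Max open files' line for pid {pid}")


def open_file_count(pid) -> int:
    """Number of file descriptors the process holds right now."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    import sys

    pid = sys.argv[1]  # Tomcat PID, e.g. from: ps -ef|grep tomcat
    used, limit = open_file_count(pid), max_open_files(pid)
    print(f"{used} of {limit} file descriptors in use ({used / limit:.0%})")
```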
Possible Reasons #4 | Tomcat is running, but the following exception appears in catalina.out:
java.lang.OutOfMemoryError: Java heap space |
Steps to Diagnose |
> Check the memory stats
> Check the Java heap stats 1) The simplest fix for OutOfMemoryError is to increase the maximum heap size with the JVM option -Xmx, for example -Xmx512M. This resolves the error when the heap is simply too small for the workload. When setting heap sizes, keep -Xmx equal to -Xms or at most 1.5 times -Xms (an -Xms:-Xmx ratio of 1:1 or 1:1.5). 2) The harder case is when the machine has little memory and the error persists even after the maximum heap size has been increased. In that case, profile the application and look for a memory leak using a heap dump. |
Commands to validate | Check free memory: free -lg Check heap stats: jmap -heap (pid of tomcat) Path of catalina.out: $NS_WDIR/apps/apache_tomcat-7.0.91/logs/catalina.out Path of site.env: $NS_WDIR/webapps/sys Take a heap dump: jmap -dump:file=heap.hprof,format=b 18293 (where 18293 is the Tomcat PID) |
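You can check the heap-size guidance mechanically. The following hypothetical Python helper extracts -Xms and -Xmx from a JVM options string. It makes no assumption about where that string is stored (for example, in site.env). It then tests whether -Xmx is between 1 and 1.5 times -Xms.

```python
import re

_UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_sizes(jvm_opts: str):
    """Return (Xms, Xmx) in bytes from a JVM options string; None if absent.

    If a flag appears more than once, the last occurrence wins, as on the
    JVM command line.
    """
    sizes = {}
    for flag, number, unit in re.findall(r"-(Xm[sx])(\d+)([kKmMgG]?)\b", jvm_opts):
        sizes[flag] = int(number) * _UNIT[unit.lower()]
    return sizes.get("Xms"), sizes.get("Xmx")


def heap_ratio_ok(jvm_opts: str, max_ratio: float = 1.5) -> bool:
    """True if Xmx is between 1x and max_ratio times Xms (1:1 up to 1:1.5)."""
    xms, xmx = heap_sizes(jvm_opts)
    if xms is None or xmx is None:
        return False
    return 1.0 <= xmx / xms <= max_ratio
```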
Possible Reasons #5 | Tomcat is running, but the following exception appears in catalina.out:
Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: CAV-KOL-GTW-ND-004: CAV-KOL-GTW-ND-004: Name or service not known |
Steps to Diagnose | The machine's host name cannot be resolved. This usually means /etc/hostname and /etc/hosts do not match.
*Make sure the name in /etc/hostname (CAV-KOL-GTW-ND-004 in the error above) has an entry in /etc/hosts. |
Commands to validate | Path of catalina.out:
$NS_WDIR/apps/apache_tomcat-7.0.91/logs/catalina.out |
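The following minimal Python sketch confirms that the machine's host name is listed in /etc/hosts. The helper is hypothetical, and it assumes the standard /etc/hosts layout: an IP address followed by one or more names. socket.gethostname() returns the kernel host name, which normally matches /etc/hostname.

```python
import socket


def hostname_in_hosts(hostname: str, hosts_text: str) -> bool:
    """True if hostname is listed as a name (not the IP column) in hosts_text."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split columns
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return True
    return False


if __name__ == "__main__":
    name = socket.gethostname()
    with open("/etc/hosts") as fh:
        print(f"{name} listed in /etc/hosts: {hostname_in_hosts(name, fh.read())}")
```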
Possible Reasons #6 | Tomcat is running, but the disk is filling up |
Steps to Diagnose | Check disk usage with df -h and free up space as needed. |
Possible Reasons #7 | Tomcat is running, but the permissions and ownership of the log files are incorrect |
Steps to Diagnose | All Tomcat log files should be owned by the cavisson user. |
Commands to validate | Path of catalina.out:
$NS_WDIR/apps/apache_tomcat-7.0.91/logs/catalina.out |
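To find log files with the wrong owner, a small Python sketch like the following can walk the logs directory and list every file not owned by a given user. It assumes the expected owner is the cavisson user, as in the ls listings elsewhere in this document. files_not_owned_by is a hypothetical name.

```python
import os
import pwd


def files_not_owned_by(directory: str, user: str = "cavisson") -> list:
    """Paths under directory whose owner is not `user` (symlinks not followed)."""
    uid = pwd.getpwnam(user).pw_uid
    wrong = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.lstat(path).st_uid != uid:
                wrong.append(path)
    return sorted(wrong)


if __name__ == "__main__":
    import sys

    # Pass the Tomcat logs directory, e.g. $NS_WDIR/apps/apache_tomcat-7.0.91/logs
    for path in files_not_owned_by(sys.argv[1]):
        print("wrong owner:", path)
```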
Possible Reasons #8 | The Tomcat process is killed repeatedly |
Steps to Diagnose | First check the Tomcat logs to identify the problem, then look at the likely causes:
1. Memory issue: check the free memory and configure site.env accordingly. 2. A process other than Tomcat is the cause: check the kernel logs with dmesg -T. 3. JDK core dump: check the core backtrace. |
Commands to validate | Path of catalina.out: $NS_WDIR/apps/apache_tomcat-7.0.91/logs/catalina.out Path of site.env: $NS_WDIR/webapps/sys |
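When the kernel OOM killer is the culprit, dmesg -T shows "Out of memory: Kill process ..." on older kernels or "Out of memory: Killed process ..." on newer ones. The following hypothetical Python helper extracts the PID and command name from those lines. If java appears, the process was killed by the kernel and did not exit on its own.

```python
import re
import subprocess

# Matches both "Kill process" (older kernels) and "Killed process" (newer).
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def oom_kills(dmesg_text: str) -> list:
    """(pid, command) pairs of processes killed by the kernel OOM killer."""
    return [(int(pid), comm) for pid, comm in OOM_RE.findall(dmesg_text)]


if __name__ == "__main__":
    # dmesg may require root when kernel.dmesg_restrict=1.
    out = subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout
    for pid, comm in oom_kills(out):
        print(f"OOM killer terminated {comm} (pid {pid})")
```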
Possible Reasons #9 | Tomcat takes too long to start because OpenJDK is used instead of Oracle JDK |
Steps to Diagnose | Install Oracle JDK version 8 b 241 and set its path in /etc/environment and netstorm.env |
Commands to validate |
Possible Reasons #10 | Tomcat is running, but tomcat-security.jar is missing
Could not load java.lang.Integer. The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact. java.lang.IllegalStateException at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1813) |
Steps to Diagnose | Add tomcat-security.jar to Tomcat's lib directory. If it is not available there, copy it from the path below:
/work/webapps/ROOT/WEB-INF/lib$ ls -ltra tomcat-security.jar -rw-rw-r-- 1 cavisson cavisson 9808 Feb 21 10:33 tomcat-security.jar Case: this happens when the build is downgraded from 4.2.0 to 4.1.15, and in a few cases after a fresh upgrade to 4.2.0/4.3.0. |
Commands to validate | Tomcat lib directory: work/apps/apache-tomcat-7.0.91/lib |
Possible Reasons #11 | Tomcat is running, but invalid entries remain in the last few lines of web.xml
java.lang.IllegalStateException at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1813) |
Steps to Diagnose | When you downgrade the build from 4.2.0 to a lower build such as 4.1.15 or 4.1.14, remove or comment out the following lines in web.xml and restart Tomcat.
<filter> <filter-name>RequestLoggingFilter</filter-name> <filter-class>org.cavisson.main.auditlogging.servlet.filters.RequestLoggingFilter</filter-class> </filter> <filter-mapping> <filter-name>RequestLoggingFilter</filter-name> <url-pattern>/*</url-pattern> </filter-mapping> |
Commands to validate | work/apps/apache-tomcat-7.0.59/conf$ vi web.xml |
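Before restarting, you can confirm that the RequestLoggingFilter entries have actually been removed or commented out. The following Python sketch (hypothetical helper) parses web.xml with the standard library, ignores XML namespaces, and reports any <filter> or <filter-mapping> that still names the filter. Commented-out entries are not reported because the default parser discards comments.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop any XML namespace: '{http://...}filter' -> 'filter'."""
    return tag.rsplit("}", 1)[-1]


def active_filter_entries(web_xml, name: str = "RequestLoggingFilter") -> list:
    """Top-level <filter>/<filter-mapping> elements that still reference `name`.

    web_xml may be str or bytes (e.g. open("web.xml", "rb").read()).
    """
    root = ET.fromstring(web_xml)
    found = []
    for elem in root:
        kind = _local(elem.tag)
        if kind not in ("filter", "filter-mapping"):
            continue
        for child in elem:
            if _local(child.tag) == "filter-name" and (child.text or "").strip() == name:
                found.append(kind)
    return found
```

An empty list means no active entry for the filter remains, so Tomcat can be restarted.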
Not able to log in to the Product UI |
Possible Reasons #2 | Authentication fails because the given user is not present in the database |
Steps to Diagnose | Check whether the user is present in the access_control database |
Commands to validate | Steps to check whether the user is present:
1) Connect to the database: psql access_control cavisson 2) List the tables in the database: \dt 3) Run a query to check whether the user is present in the database |
Dashboard UI is not opening |
Possible Reasons #1 | a. Insufficient free memory (0 GB free), as reported by:
cavisson@api:~/work$ free -lg
|
Steps to Diagnose | Check the free memory. If the cache and buffers are consuming it, clear them with the command below. |
Commands to validate | free -lg (check the free memory), sync; echo 1 > /proc/sys/vm/drop_caches (as root) >> clears the buffer and cache memory |
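The following sketch reads the memory figures without parsing the output of free. The Python helpers (hypothetical names) parse /proc/meminfo on Linux. MemFree is memory that is not in use at all. MemAvailable (Linux 3.14 and later) estimates how much can be allocated without swapping, including reclaimable page cache. If MemAvailable is still healthy while MemFree is near 0, the kernel is likely to reclaim the cache on its own.

```python
def meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_gb(text: str) -> dict:
    """MemFree and MemAvailable in GiB (MemAvailable needs Linux 3.14+)."""
    info = meminfo(text)
    return {k: info[k] / (1024 * 1024) for k in ("MemFree", "MemAvailable")}


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        print(memory_gb(fh.read()))
```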
Dashboard UI opens blank, i.e. with no graph tree |
Possible Reasons #1 | testrun.gdf is not present in the current partition |
Steps to Diagnose | Check whether testrun.gdf is present in the current partition.
Also validate testrun.gdf with the ReadRTGMessageFile tool. |
Commands to validate |
How to check the current partition:
/work/logs/TR33333$ cat .curPartition
FirstPartitionIdx=20191125160000
CurPartitionIdx=20191125160000
To check the gdf files, run ls -ltr testrun.gdf* in the current partition.
Path for running the gdf validation command: /home/cavisson/NS_WDIR/webapps/DashboardServer/WEB-INF/lib
Tool command for validating testrun.gdf files: java -cp netstorm_bean.jar:exp4j-0.4.8.jar:commons-io-2.2.jar:kryo-2.23.0.jar pac1.Bean.ReadRTGMessageFile |
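The partition lookup can be scripted. The following Python sketch (hypothetical helper names) reads .curPartition, assuming one key=value pair per line as in the example. It then lists the testrun.gdf* files in the directory named by CurPartitionIdx, following the $NS_WDIR/logs/TestRun/Partition layout used in this document.

```python
import glob
import os


def read_key_values(text: str) -> dict:
    """Parse key=value lines, e.g. the contents of .curPartition."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def current_gdf_files(tr_dir: str) -> list:
    """testrun.gdf* files in the test run's current partition directory."""
    with open(os.path.join(tr_dir, ".curPartition")) as fh:
        partition = read_key_values(fh.read())["CurPartitionIdx"]
    return sorted(glob.glob(os.path.join(tr_dir, partition, "testrun.gdf*")))


if __name__ == "__main__":
    import sys

    # Pass the test run directory, e.g. /work/logs/TR33333
    for path in current_gdf_files(sys.argv[1]):
        print(path)
```

An empty result corresponds to the "testrun.gdf is not present in the current partition" case above.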
Not able to expand Graph Tree |
Taking too much time to open Dashboard UI |
Possible Reasons #3 | Many threads go into the WAITING or BLOCKED state
Cases: 1. Large packet size 2. Frequent GDF changes |
Steps to Diagnose |
Take thread dumps of the Tomcat process and analyze them to find the culprit threads (blocked, locked, or waiting for the longest time). If these are Cavisson threads, analyze the thread dumps with the Cavisson team. To clear the thread queue, collect the logs with the nsi_collect_logs shell and then restart Tomcat.
Remove unwanted vectors from monitoring. This helps reduce the packet size. Check testrun.gdf to find monitors that are frequently added or deleted: in each Group row of testrun.gdf, the sixth field is the number of vectors in that group. The packet size is reported in bytes, for example: ----- packet size of 'testrun.gdf' is - 12280364 ----- |
Commands to validate | Take a thread dump: jstack -l (pid of tomcat) > filename
Search the thread dump for com.cavisson to find GUI-related threads. Check the packet size: nsu_get_packet_size -t 1234 -p 20170929121212 -v 1 (pass -v only if a version is present) |
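To summarize a thread dump quickly, you can use a Python sketch like the following (hypothetical helpers). It splits jstack output into per-thread entries, counts the java.lang.Thread.State values, and lists BLOCKED or WAITING threads whose stacks mention com.cavisson. It only does text matching on the usual HotSpot dump layout and does not replace reading the dump.

```python
import re
from collections import Counter

_ENTRY_SPLIT = re.compile(r'\n(?=")')  # each thread entry starts with "name"
_NAME = re.compile(r'^"([^"]*)"')
_STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def thread_entries(dump: str):
    """Yield (thread name, state or None, entry text) for each thread in a dump."""
    for entry in _ENTRY_SPLIT.split(dump):
        name = _NAME.match(entry)
        if name:
            state = _STATE.search(entry)
            yield name.group(1), state.group(1) if state else None, entry


def state_counts(dump: str) -> Counter:
    """How many threads are in each java.lang.Thread.State."""
    return Counter(state for _, state, _ in thread_entries(dump) if state)


def stuck_threads(dump: str, package: str = "com.cavisson",
                  states=("BLOCKED", "WAITING")) -> list:
    """Names of threads in the given states whose stack mentions `package`."""
    return [name for name, state, entry in thread_entries(dump)
            if state in states and package in entry]


if __name__ == "__main__":
    import sys

    # Pass the file written by: jstack -l (pid of tomcat) > filename
    with open(sys.argv[1]) as fh:
        text = fh.read()
    print(state_counts(text))
    print(stuck_threads(text))
```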
Possible Reasons #4 | High CPU and memory usage by Tomcat |
Steps to Diagnose |
Check the CPU utilization and used memory graphs and analyze them.
Validate the Tomcat metrics for that time period. Also check other processes such as the NDEMain thread, NDP, and Cmon in the process stats monitor in the web dashboard. If Tomcat's CPU utilization is high, capture a thread dump of that Java process and analyze it. If memory utilization is high, take a heap dump of the Tomcat process and analyze it. |
Commands to validate | $NS_WDIR/webapps/sys/site.env
Thread dump: jstack -l (pid of tomcat) > filename Heap summary: jmap -heap (pid of tomcat) Full heap dump: jmap -dump:file=heap.hprof,format=b (pid of tomcat) |
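For high CPU, a common approach is to find the busiest thread with top -H -p (pid of tomcat) and then locate that thread in the thread dump. top shows the thread ID in decimal, while JDK 8's jstack prints the same ID in hex as nid=0x.... The following hypothetical Python helper does the conversion and returns the matching thread header line.

```python
import re


def jstack_entry_for_tid(dump: str, tid: int):
    """Header line of the jstack thread whose native id matches a decimal TID.

    `top -H -p <pid>` lists thread ids in decimal; the JDK 8 jstack prints
    the same id as hex in 'nid=0x...'. Returns None if no thread matches.
    """
    pattern = re.compile(r'^".*\bnid=%s\b.*$' % re.escape(hex(tid)), re.MULTILINE)
    match = pattern.search(dump)
    return match.group(0) if match else None
```

For example, TID 21376 from top corresponds to nid=0x5380 in the dump.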
Possible Reasons #5 | The cache service is not working properly or is not enabled |
Steps to Diagnose | Check that the following keyword is present in config.ini: netstorm.execution.dashboardServerPacketCacheLimit=Last_24_hours |
Commands to validate | Path of config.ini: /home/cavisson/NS_WDIR/webapps/sys |
Possible Reasons #6 | A large number of graphs are plotted in the Dashboard |
Steps to Diagnose | Open only as many graphs in the Dashboard as you actually need.
Check the Network tab of the browser developer tools for requests that take a long time or remain pending. |
Commands to validate | While the Dashboard is opening, open the browser developer tools with Ctrl+Shift+I (Windows) |
Possible Reasons #7 | Debug flag enabled at a high level in config.ini and moduleDebugLevel.properties |
Steps to Diagnose | Raise the debug level only while debugging an issue. The rest of the time it should be '0'. |
Commands to validate | Location of config.ini and moduleDebugLevel.properties: /home/cavisson/$NS_WDIR/webapps/sys |
Possible Reasons #8 | Frequent alert operations such as Edit Alert Rule and Maintenance |
Steps to Diagnose | Don't misuse the alert maintenance window. Don't update the alert configuration repeatedly. Wait until the previous update has completed.
For a P1 (critical) issue, Tomcat can be restarted to release the threads. |
Commands to validate | Check whether any alert engine threads are blocked: jstack -l (pid of tomcat) > filename |
Possible Reasons #9 | A long time period applied in the GUI with view by Min (for example, a long time period saved in a favorite, with load favorite enabled for the saved time frame) |
Steps to Diagnose | Avoid opening graphs for long periods with view by Min. For time periods longer than 2 days, apply view by Day. Use view by Min only for periods shorter than 24 hours. |
Possible Reasons #10 | Frequent REST service calls related to KPIs and alerts |
Steps to Diagnose | During a load test or production test, reduce the number of REST service calls. This improves GUI performance. |
Possible Reasons #11 | A large number of users logged in to the appliance |
Steps to Diagnose | Limit the number of product (NS/ND) users during production or load tests. Fewer users means less unnecessary activity and better GUI performance. |
Data is not coming in Dashboard Graph |
Possible Reasons #2 | Test is not running |
Steps to Diagnose | Check whether the test is running |
Possible Reasons #3 | The vector whose graph you want is not available for the applied time period |
Steps to Diagnose | Check whether the vector is present in testrun.gdf for that partition |
Commands to validate |
Possible Reasons #5 | Aggregate and transposed data are not created properly |
Steps to Diagnose |
If aggregate and transposed data are not created at day change, check the aggregate and transposed keywords in config.ini and create the data manually. If the keywords are not enabled, enable them and restart Tomcat to apply the change.
Aggregate keywords in webapps/sys/config.ini: netstorm.execution.activateAggregateSamples=1 netstorm.execution.aggregateSampleInterval=1 netstorm.execution.aggrPacketWritePermission=1 netstorm.execution.enablePublicDataCenters=0
Also check whether the transposed keywords are enabled in config.ini. If not, enable them and restart Tomcat to apply the change: netstorm.execution.transposedata.read_enabled=1 netstorm.execution.transposedata.limit=Last_2_Day netstorm.execution.transposedata.write_permission=1
Validation of aggregate data:
Step 1: Find the size of the rtgMessage.dat files for that day:
cavisson@controller:~/NC_MON/logs/TR11111/aggr_1h/20181114000000$ ls -ltr
-rw-rw-r-- 1 cavisson cavisson 4553317 Nov 15 00:00 testrun.gdf
-rw-rw-r-- 1 cavisson cavisson 4184743 Nov 15 00:00 testrun.gdf.1
-rw-rw-r-- 1 cavisson cavisson 4248855 Nov 15 00:01 testrun.gdf.2
-rw-rw-r-- 1 cavisson cavisson 4246303 Nov 15 00:02 testrun.gdf.3
-rw-rw-r-- 1 cavisson cavisson 105873 Nov 15 00:02 testrun.gdf.4
-rw-rw-r-- 1 cavisson cavisson 11677312 Nov 15 00:03 rtgMessage.dat
-rw-rw-r-- 1 cavisson cavisson 64248048 Nov 15 00:03 rtgMessage.dat.1
-rw-rw-r-- 1 cavisson cavisson 97162632 Nov 15 00:03 rtgMessage.dat.2
-rw-rw-r-- 1 cavisson cavisson 10789892 Nov 15 00:03 rtgMessage.dat.3
-rw-rw-r-- 1 cavisson cavisson 2049432 Nov 15 00:03 rtgMessage.dat.4
Step 2: Find the packet size of each testrun.gdf, and divide the size of the matching rtgMessage.dat file by it:
testrun.gdf = 11677312 / 11677312 = 1
testrun.gdf.1 = 64248048 / 10708008 = 6
testrun.gdf.2 = 97162632 / 10795848 = 9
testrun.gdf.3 = 10789892 / 10789892 = 1
testrun.gdf.4 = 2049432 / 292776 = 7
Total = 24 (correct)
Steps to validate transposed data: read the data the same way as RTG data, but enter 1 at this prompt: Enter File Mode. 0/1 (0 - RTG files of partition, 1 - RTG files from a transposed directory of a partition).
Note: the last sample in the transposed data is a garbage value. |
Commands to validate | Path of config.ini: /home/cavisson/work/webapps/sys/config.ini
/home/cavisson/$NS_WDIR/webapps/DashboardServer/WEB-INF/lib$ java -cp netstorm_bean.jar:exp4j-0.4.8.jar:commons-io-2.2.jar:kryo-2.23.0.jar pac1.Bean.ReadRTGMessageFile |
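The Step 2 arithmetic can be checked with a short Python sketch. It assumes, as in the example, that rtgMessage.dat.N pairs with testrun.gdf.N and that each data file holds a whole number of packets. The packet sizes come from a tool such as nsu_get_packet_size. The expected total of 24 comes from the one-day aggr_1h example. aggregate_sample_count is a hypothetical name.

```python
def aggregate_sample_count(pairs, expected: int = 24):
    """Sum samples across (rtgMessage.dat size, testrun.gdf packet size) pairs.

    Each data file should hold a whole number of packets, so its size must
    divide evenly by the packet size of the matching testrun.gdf version.
    Returns (total samples, whether the total equals `expected`).
    """
    total = 0
    for data_bytes, packet_bytes in pairs:
        samples, leftover = divmod(data_bytes, packet_bytes)
        if leftover:
            raise ValueError(
                f"{data_bytes} bytes is not a whole number of {packet_bytes}-byte packets")
        total += samples
    return total, total == expected


# Values from the aggr_1h example above (rtgMessage.dat.N paired with testrun.gdf.N).
pairs = [
    (11677312, 11677312),
    (64248048, 10708008),
    (97162632, 10795848),
    (10789892, 10789892),
    (2049432, 292776),
]

if __name__ == "__main__":
    print(aggregate_sample_count(pairs))  # (24, True)
```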
Not able to load favorite |
Favorite is showing blank data |
Possible Reasons #1 | Permission issue |
Steps to Diagnose | 1. First check that the metrics used in the favorite match the metrics available in the test run. If they do not, change the favorite or create a new one based on the test run's metrics.
2. The metrics in the favorite's rules must be present in the test run. |
Commands to validate | Path of Favorite:
$NS_WDIR/webapps/sys/webdashboard/favorite |
Not able to create favorite |
Possible Reasons #1 | 1. The test run has no monitor graphs for the saved favorite
2. Mixed widget rules were used while saving the favorite |
Steps to Diagnose | Verify the favorite's rules against the metrics on the widget. |
Commands to validate | Path of Favorite:
$NS_WDIR/webapps/sys/webdashboard/favorite |
Not able to Edit/Update favorite |
Possible Reasons #1 |
1. Permission issue:
rtgError.log contains a stack trace such as: java.io.FileNotFoundException: /home/cavisson/work/webapps/sys/webdashboard/favorites/PE_PeakMonitoring/CSC_Overall_Green.json (Permission denied) at java.io.FileOutputStream.open0(Native Method) 2. Incorrect relative path: rtgError.log contains an exception such as: java.io.FileNotFoundException: /home/cavisson/work/webapps/sys/webdashboard/favorites/Sachin/TVS_Prdo_Blue_CNCAPI.json (No such file or directory) at java.io.FileOutputStream.open0(Native Method) |
Steps to Diagnose | 1. Check the ownership of the favorite and the user's permissions. To edit or update a favorite, the user needs read/write or admin capability.
2. The relative path in the favorite JSON must be correct. |
Commands to validate | Path of rtgError.log:
$NS_WDIR/webapps/netstorm/logs Path of favorites: $NS_WDIR/webapps/sys/webdashboard/favorite |
Not able to apply Time period |
Possible Reasons #1 | The test has not been running for the last few hours, so Last 10 min / Last 30 min / Last 1 hr cannot be applied because no data is available for those periods. |
Steps to Diagnose |
Taking too much time to apply time period less than 4 hours |
Taking too much time to apply a time period for a previous date with view by Sample Interval
Possible Reasons #1 | The time period is applied with discontinuous mode enabled |
Steps to Diagnose | Disable discontinuous mode when applying a small interval over a large amount of data. |
Possible Reasons #2 | Transposed data is not being created (4.1.15 and later versions) |
Steps to Diagnose |
If transposed data is not created automatically, create it manually. Check whether the transposed keywords are enabled in config.ini. If not, enable them and restart Tomcat to apply the change.
netstorm.execution.transposedata.read_enabled=1 netstorm.execution.transposedata.limit=Last_2_Day netstorm.execution.transposedata.write_permission=1 enableAggFlowmap=0
If the transposed data was created successfully, its .status file looks like this:
#Property File of Transposed Partition
#Wed Oct 23 07:41:46 CDT 2019
Status=0
Message=Successfully Created.
CreatedVia=Transposed |
Commands to validate | Path of transposed data and status: /home/cavisson/work/logs/TR222999# cat 20191022000000/transposed/.status |
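config.ini and the transposed .status file both use key=value text, so a single small reader handles both checks. The following Python sketch is hypothetical. It checks that the transposed read/write keywords are set to 1 and treats Status=0 as success, based on the sample .status shown above. Any other value is simply reported as not OK, because the meaning of other status codes is not documented here.

```python
def read_properties(text: str) -> dict:
    """Minimal key=value reader; skips blank lines and '#'/'!' comments.

    Not a full java.util.Properties parser (no ':' separators, escapes or
    line continuations).
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


TRANSPOSED_KEYWORDS = {
    "netstorm.execution.transposedata.read_enabled": "1",
    "netstorm.execution.transposedata.write_permission": "1",
}


def keyword_mismatches(config_text: str, required=TRANSPOSED_KEYWORDS) -> dict:
    """{keyword: current value or None} for keywords not set to the required value."""
    props = read_properties(config_text)
    return {key: props.get(key) for key, want in required.items() if props.get(key) != want}


def transposed_ok(status_text: str) -> bool:
    """True if a transposed/.status file reports Status=0."""
    return read_properties(status_text).get("Status") == "0"
```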
Taking too much time to apply time Period of previous date with View by Aggregation |
Possible Reasons #1 | This depends entirely on the aggregate data |
Steps to Diagnose | Check whether the aggregate data was created at day change. If not, create it manually with the shell command below. It is better to get help from client support for this. |
Commands to validate |
work/webapps/DashboardServer/WEB-INF/lib$ nohup java -Xms10g -Xmx50g -cp netstorm_bean.jar:commons-io-2.2.jar:kryo-2.23.0.jar:slf4j-api-1.5.8.jar:slf4j-log4j12-1.5.8.jar:log4j-1.2.16.jar:tdigest-0.0.1-screenshot.jar pac1.Bean.dashboard.data.tool.StartRTGDataAggregationTool 0 5050 07/05/2019 08/05/2019 60000 1 &
|
Taking too much time to open Transaction detail |
Possible Reasons #1 | |
Steps to Diagnose | 1. Check the size of pctMessage.dat and verify the percentile option in the GUI.
2. Check the total number of transactions in testrun.gdf or transaction |
Commands to validate |
Path of pctMessage.dat:
$NS_WDIR/logs/TestRun/Partition/pctMessage.dat Path of testrun.gdf: $NS_WDIR/logs/TestRun/Partition/testrun.gdf How to check the enabled percentile options in the GUI: configurations > configuration settings > Dashboard settings > TransactionDetails |
Transaction failure chart is not opening |
Possible Reasons #1 | 1. The percentile data is very large and the percentile option is enabled in Transaction details
2. A large number of transactions are used in the test run |
Steps to Diagnose | 1. The database is not running.
2. Data is not uploaded to the database. |
Commands to validate | 1) Check whether postgres is running:
ps -ef |grep postgres |
Taking too much time to apply time period in Transaction detail |
Possible Reasons #1 | Transaction error data is not uploaded to the database
The database is not running |
Percentile data is not coming in transaction detail |
Possible Reasons #1 | Same as Dashboard applying time frame as |
Steps to Diagnose | 1. Check whether percentile is enabled in the Transaction detail configuration.
2. Validate the percentile data against the transaction graph in the web dashboard |
Percentile data is coming incorrect in Transaction detail |
Possible Reasons #1 | |
Steps to Diagnose | 1. Validate the percentile data against the transaction graph in the web dashboard.
2. Check the custom pct file in the scenario. |
Not able to drill down from transaction detail |
Possible Reasons #1 | Data is not uploaded to the database |
Steps to Diagnose | 1. Verify whether the data is present in the CSV.
2. If it is, check the data in the database table. 3. If it is present in the database table, check the query output using the wrapper command or drill.down.query |
Possible Reasons #2 | Database not running |
Steps to Diagnose | Check whether the database is running |
Commands to validate | Check whether postgres is running:
1) ps -ef |grep postgres 2) /etc/init.d/postgresql status 3) pg_isready |
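You can also wrap the pg_isready check in a script. PostgreSQL documents pg_isready's exit statuses as 0 (accepting connections), 1 (rejecting connections), 2 (no response), and 3 (no attempt made). The following Python sketch (hypothetical helper names) runs pg_isready and translates its exit status. It assumes pg_isready is on PATH.

```python
import subprocess

# Exit statuses documented for PostgreSQL's pg_isready utility.
PG_ISREADY_STATUS = {
    0: "accepting connections",
    1: "rejecting connections (for example, during startup)",
    2: "no response",
    3: "no attempt made (for example, invalid parameters)",
}


def describe_pg_isready(code: int) -> str:
    """Human-readable meaning of a pg_isready exit status."""
    return PG_ISREADY_STATUS.get(code, f"unexpected exit status {code}")


def postgres_status(host=None, port=None) -> str:
    """Run pg_isready (must be on PATH) and describe its exit status."""
    cmd = ["pg_isready"]
    if host:
        cmd += ["-h", host]
    if port:
        cmd += ["-p", str(port)]
    return describe_pg_isready(subprocess.run(cmd, capture_output=True).returncode)


if __name__ == "__main__":
    print("PostgreSQL:", postgres_status())
```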