Troubleshooting NetForest

NFUI is not opening or is stuck on the login page

Possible Reason #1: Netforest.yml is not correct
Steps to Diagnose & Command Used: Check the keywords used in Netforest.yml:
server.port:, server.host:, nfdb.url:, nfdb.envs:, NetForest.schedule_alert:
Example:
server.port: 8000
server.host: "10.10.30.123"
nfdb.url: "http://10.10.30.123:9201"
nfdb.envs: {"prod": "http://10.10.30.123:9201"}
NetForest.schedule_alert: "10.10.30.123"
Solution: Make sure these keywords are configured correctly.

Possible Reason #2: The port used by NFUI is not free
Steps to Diagnose & Command Used: Check the keyword used in Netforest.yml:
server.port:
Example:
server.port: 8000
Use the command: netstat -natp | grep 8000
The port must be free (the command should print nothing).
Solution: The port used for opening NFUI must be free.
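As a quick sketch of the check above (the PORT variable and the if/else wrapper are illustrative additions; set PORT to the server.port value from Netforest.yml):

```shell
# Check whether the configured NFUI port is already taken.
# PORT is a placeholder; set it to server.port from Netforest.yml.
PORT=8000
if netstat -nat 2>/dev/null | grep -q ":${PORT} .*LISTEN"; then
  echo "port ${PORT} is in use - pick another server.port or stop the other process"
else
  echo "port ${PORT} appears free"
fi
```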

Possible Reason #3: NFUI and NFDB host/port mismatch
Steps to Diagnose & Command Used: Check the keywords used in nfdb.yml:
network.host:, http.port:
Example:
network.host: 10.10.30.123
http.port: 9201
Solution: Configure these keywords correctly, and make sure the NFDB URL and port match in both nfdb.yml and Netforest.yml.
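As a side-by-side sketch (IPs and ports are placeholders), the NFDB address in nfdb.yml and the URL in Netforest.yml must agree:

```yaml
# nfdb.yml (NFDB side)
network.host: 10.10.30.123
http.port: 9201

# Netforest.yml (NFUI side) must point at the same host:port
nfdb.url: "http://10.10.30.123:9201"
```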

Possible Reason #4: The cluster may not be in a stable state (when there is more than one NFDB node in the cluster)
Steps to Diagnose & Command Used: Check the keywords used in nfdb.yml:
transport.tcp.port:, discovery.zen.ping.unicast.hosts:
Example:
transport.tcp.port: 7895
discovery.zen.ping.unicast.hosts: ["10.10.30.123:7895", "10.10.30.73:7895"]
Solution: NFUI may fail to open due to cluster misconfiguration, so configure these keywords in nfdb.yml correctly.

Possible Reason #5: Elasticsearch request timeout
Steps to Diagnose & Command Used:
1. The issue appears in the GUI while searching.
2. Check the request-timeout keyword in Netforest.yml: elasticsearch.requestTimeout

Example:

elasticsearch.requestTimeout: 300000

Solution: Increase the timeout value for Elasticsearch requests. The value is in milliseconds (300000 = 5 minutes).

Integrating NetForest with ND

Possible Reason #1: The keyword NetForest.integrate_nd: in Netforest.yml is missing or not filled correctly
Steps to Diagnose & Command Used: Fill the keyword NetForest.integrate_nd: with the ND machine's IP, port, and protocol.
Example:
NetForest.integrate_nd: {"host": "10.10.50.17", "port": "80", "protocol": "http"}
Solution: Fill this keyword with the IP and port of the ND instance you want to integrate.

NFDB Configuration Issues

Possible Reason #1: nfdb.yml is not correct
Steps to Diagnose & Command Used: Check the keywords used in nfdb.yml:
network.host:, http.port:, path.data:
Example:
network.host: 72.52.96.138
http.port: 7894
path.data: /pgdata/netforest/NFDB
Solution: Make sure these keywords are filled in correctly.

Possible Reason #2: vm.max_map_count is too low (in case of a shell installation)
Steps to Diagnose & Command Used: Check the current value with sysctl vm.max_map_count and increase it if needed.
Solution:

sudo sysctl -w vm.max_map_count=262144
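A quick check, assuming a Linux host where the limit can be read directly from /proc:

```shell
# Read the current limit from /proc; NFDB (Elasticsearch-based)
# needs vm.max_map_count to be at least 262144.
current=$(cat /proc/sys/vm/max_map_count)
echo "vm.max_map_count is ${current}"
if [ "${current}" -lt 262144 ]; then
  echo "too low - raise it with: sudo sysctl -w vm.max_map_count=262144"
fi
```

To persist the setting across reboots, the usual approach is to add the line vm.max_map_count=262144 to /etc/sysctl.conf.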

Possible Reason #3: Too little memory for the Java runtime or garbage collector
Steps to Diagnose & Command Used: Check the jvm.options file present in the NFDB config directory (alongside nfdb.yml).
Solution: Increase the two JVM heap options present in the NFDB config directory:

-Xms1g (maximum we can give is 30 GB)

-Xmx1g
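For example, to raise the heap to 4 GB, both values are typically set equal in jvm.options (4g is an illustrative size, not a recommendation for every host):

```conf
# jvm.options: initial (-Xms) and maximum (-Xmx) heap should match
-Xms4g
-Xmx4g
```

Keeping -Xms and -Xmx equal avoids heap-resizing pauses; heaps above roughly 30 GB lose compressed object pointers, which is why the guide caps the value there.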

Possible Reason #4: The port used for NFDB is not free, or NFDB is not working
Steps to Diagnose & Command Used: Check the nfdb.log file present in the NFDB (build) directory, or
use the command journalctl -u nfdb -f; this shows errors such as "port is already in use".
Solution: Use a free port in nfdb.yml and restart NFDB.

Possible Reason #5: Data is not coming from a specific path
Steps to Diagnose & Command Used: Check the path.data: /path/to/data keyword.
Solution: Make sure path.data points to the correct data directory and that the NFDB user has access to it.

Possible Reason #6: Long queries trip the circuit breaker
Steps to Diagnose & Command Used:

Check the breaker configuration in mapping.json:

"breakers": {
  "request": {
    "limit_size_in_bytes": 10187558092,
    "limit_size": "9.4gb",
    "estimated_size_in_bytes": 11343200256,
    "estimated_size": "10.5gb",
    "overhead": 1.0,
    "tripped": 250
  }
},

Solution: Make sure the configuration is correct. In the example above, estimated_size (10.5 GB) exceeds limit_size (9.4 GB) and the breaker has tripped 250 times, so either reduce the memory footprint of queries or raise the breaker limit.

NFDB status is red

Possible Reason #1: NFDB is out of the cluster
Steps to Diagnose & Command Used: Check the cluster health:

Command:

curl -XGET localhost:9200/_cluster/health?pretty

Solution: In this case, bring all the nodes back into the cluster and do not open the UI until cluster health reaches 100%. Command as below:

curl -XGET localhost:9200/_cat/health?pretty

If shards are unassigned, wait until the shards get allocated.
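As a sketch of reading the status field (green/yellow/red) from the health response, the sample JSON below stands in for the live curl output:

```shell
# Extract "status" from a /_cluster/health response.
# The sample stands in for: curl -s localhost:9200/_cluster/health
health='{"cluster_name":"nfdb","status":"red","number_of_nodes":1,"unassigned_shards":12}'
status=$(printf '%s' "$health" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
echo "cluster status: ${status}"
# prints: cluster status: red
```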


Error in cluster or master not discovered

Possible Reason #1: nfdb.yml is not correct
Steps to Diagnose & Command Used:
Check the keywords used in nfdb.yml:
network.host:, http.port:, path.data:
Add the keyword transport.tcp.port: to nfdb.yml, and use the same TCP port in the nfdb.yml of every NFDB node used in the cluster.
Example:
transport.tcp.port: 7895
Check the keyword discovery.zen.ping.unicast.hosts: present in nfdb.yml, and insert the host and TCP port of every node in every nfdb.yml.
Example:
discovery.zen.ping.unicast.hosts: ["10.10.30.123:7895", "10.10.30.73:7895"]

Solution: Make sure these keywords are configured correctly and that all NFDB nodes used in the cluster are up.
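A sketch of what a consistent nfdb.yml looks like on each node of a two-node cluster (all IPs and ports are placeholders):

```yaml
# nfdb.yml on every node: same transport port, and the discovery list
# names the transport address of every node in the cluster.
network.host: 10.10.30.123        # this node's own IP
http.port: 9201
transport.tcp.port: 7895
discovery.zen.ping.unicast.hosts: ["10.10.30.123:7895", "10.10.30.73:7895"]
```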

Possible Reason #2: Master not discovered
Steps to Diagnose & Command Used: Use the command curl nfdbhost:port/_cluster/health?pretty

If this shows the error "master not discovered", add the keyword node.master: true to the nfdb.yml of the NFDB node you want to act as master.

Example:
node.master: true

Solution: Make sure these keywords are configured correctly and that all NFDB nodes used in the cluster are up.

Cluster health is not 100%, shards remain in an unassigned state, and data may be stuck

Possible Reason #1: One or more nodes are out of the cluster
Steps to Diagnose: Find the node that is out of the cluster with the command

curl nfdbip:port/_cat/nodes?v

then do a rolling restart of the nodes.

Commands to validate

sudo su
Step 1: curl -X PUT "10.206.96.82:7894/_cluster/settings?master_timeout=3000s" -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'

Step 2: curl -X POST "10.206.96.82:7894/_flush/synced"

Step 3: Restart the NFDB service. (service nfdb restart)

Step 4: curl -X PUT "10.206.96.82:7894/_cluster/settings?master_timeout=3000s" -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'

Possible Reason #2: A node reaches a high JVM heap value
Steps to Diagnose:

If a node reaches a high JVM heap value, you can call the cache-clear API as an immediate node-level action to make Elasticsearch drop its caches. It will hurt performance, but it can save you from OOM (Out Of Memory).

Use the command:

curl -XPOST 'http://localhost:9200/_cache/clear'

Enable the NFAgent flag

Possible Reason #1: NFAgent is not working
Steps to Diagnose & Command Used: In cmon.env, enable the flag by setting it to 1. Below is the key:

CAV_MON_AGENT_OPTS="-F 1"

Solution: 1 means enabled, 0 means disabled.
NFAgent must work with cmon, so set the flag to 1 (enabled).


Grok parse error

Possible Reason #1: The filter file is not correct
Steps to Diagnose & Command Used: There may be an indentation error or some other human error in the filter file.
Solution: Check and make sure the Input.conf, filter.file.conf, and output.conf files are correct.


Grok parse failure

Possible Reason #1: The grok matcher in the filter is not correct, or not all logs are getting parsed
Steps to Diagnose & Command Used: Correct the grok pattern in the filter file, or use the correct parser for the different log formats.
Solution: Check and make sure the correct parser is used in the filter file and that there are no human errors.
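As an illustrative sketch, assuming the agent uses Logstash-style filter syntax (the pattern and field name are assumptions; adjust them to your actual log format):

```conf
filter {
  grok {
    # COMBINEDAPACHELOG parses standard Apache/Tomcat access-log lines;
    # a line that does not match gets tagged _grokparsefailure.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
```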


NFAgent is not able to send data from the server

Possible Reason #1: Incorrect config file
Steps to Diagnose & Command Used: Check the config files present in the path /home/cavisson/monitors/nf/nfagent/config/conf.d.files:

1. input.conf

2. filter.file.conf

3. output.conf

Solution: Check and make sure all the keywords in these files are correct, the parser in filter.file.conf is correct, and output.conf has the NFDB host and port.
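A sketch of an output.conf pointing at NFDB, assuming Logstash-style output syntax (host, port, and index are placeholders):

```conf
output {
  elasticsearch {
    hosts => ["10.206.96.98:7894"]       # NFDB host:port
    index => "cavisson-%{+YYYY.MM.dd}"   # matches INDEX_PREFIX in nf.env
  }
}
```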

Possible Reason #2: Incorrect nf.env file and cmon.env
Steps to Diagnose & Command Used:

Check the nf.env file present in /home/cavisson/monitors/sys and check the keywords below:

OUTPUT_HOST1, OUTPUT_PORT1, INDEX_PREFIX

Example:

SERVER=10.206.96.52
DC=Stress
ENV=Stress
INDEX_PREFIX=cavisson
# If it's false, only OUTPUT_HOST1 and OUTPUT_PORT1 are considered
#OUTPUT_MULTINODE=false
OUTPUT_HOST1=10.206.96.98
OUTPUT_PORT1=7894
AppName=work-NF
YML_CONFIG_NFDB=false

In cmon.env, check this keyword and the Tier field:

CAV_MON_AGENT_OPTS="-F 1"

Solution:

OUTPUT_HOST1 - this keyword must be filled with the NFDB host to which you want to send data.

OUTPUT_PORT1 - this keyword must be filled with the correct port of NFDB.

INDEX_PREFIX - this keyword must be filled with the index name you want to see in the NetForest UI.

In cmon.env, CAV_MON_AGENT_OPTS="-F 1" must be set to 1 to enable NFAgent; also define the Tier value in it.


::Errors::NotFound: [404] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"nfagent_config_yml","index_uuid":"_na_","index":"nfagent_config_yml"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"nfagent_config_yml","index_uuid":"_na_","index":"nfagent_config_yml"},"status":404}

Possible Reason #1: This error appears on the agent side in nfagentplain.log when the keyword YML_CONFIG_NFDB= present in nf.env is not configured correctly
Steps to Diagnose & Command Used: The keyword YML_CONFIG_NFDB= takes one of two values: true or false.
Solution:
If it is true, we have to dump the config.yml into NFDB.
If it is false, by default the agent goes through the log path present in Input.conf, e.g.:
path => "/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/*access_log*"
This is the log path from which NFAgent collects the logs for parsing.
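The corresponding input block would look roughly like this, assuming Logstash-style file-input syntax (only the path itself comes from this guide):

```conf
input {
  file {
    path => "/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/*access_log*"
  }
}
```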


Unable to reach host

Possible Reason #1: This error comes when NFDB is down
Steps to Diagnose & Command Used: Verify whether NFDB is running; if not, check the required NFDB configuration.
Solution: NFDB must be up and running; also check the connectivity between NFDB and the server.
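A minimal reachability check, assuming curl is available (HOST and PORT are placeholders for your NFDB address):

```shell
HOST=10.206.96.98
PORT=7894
# --max-time bounds the wait so the check fails fast when the host is down.
if curl -s --max-time 5 "http://${HOST}:${PORT}/" >/dev/null 2>&1; then
  echo "NFDB is reachable"
else
  echo "NFDB is unreachable - check that the service is up and the network path is open"
fi
```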


Data index not created

Possible Reason #1:

Either the path present in Input.conf does not give the user read permission, or the path is not correct:

path => "/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/*access_log*"

Steps to Diagnose & Command Used: Check the log path used in the Input.conf file; the user must be the same and must have read access.
Solution: Correct the log path in Input.conf and make sure the agent user has read access to it.
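A quick permission check for the path above (LOGDIR is the directory portion of the Input.conf path):

```shell
LOGDIR="/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs"
# -r tests read permission for the current user (the one running NFAgent).
if [ -r "$LOGDIR" ]; then
  echo "readable: $LOGDIR"
else
  echo "not readable: $LOGDIR - check ownership with: ls -ld $LOGDIR"
fi
```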