Troubleshooting NetForest

NFUI is not opening or is stuck on the login page

Possible Reasons #1: Netforest.yml is not correct
Steps to Diagnose & Commands Used: Check the keywords used in Netforest.yml, such as server.port:, server.host:, nfdb.url:, nfdb.envs:, NetForest.schedule_alert:
Example:
server.port: 8000
server.host: "10.10.30.123"
nfdb.url: "http://10.10.30.123:9201"
nfdb.envs: {"prod": "http://10.10.30.123:9201"}
NetForest.schedule_alert: "10.10.30.123"
Solution: Check and make sure these keywords are configured correctly.

Possible Reasons #2: Port used by NFUI is not free
Steps to Diagnose & Commands Used: Check the keyword server.port: in Netforest.yml.
Example:
server.port: 8000
Use the command:
netstat -natp | grep 8000
The port must be free.
Solution: The port used to open NFUI must be free.
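The port check above can be sketched as a small shell snippet (8000 is the example value from Netforest.yml; substitute your configured server.port):

```shell
# Check whether the configured NFUI port is already bound.
PORT=8000
if command -v ss >/dev/null 2>&1; then
  LISTENERS=$(ss -lnt 2>/dev/null | grep -c ":${PORT} " || true)
else
  LISTENERS=$(netstat -nat 2>/dev/null | grep -c ":${PORT} " || true)
fi
if [ "${LISTENERS}" -eq 0 ]; then
  echo "port ${PORT} is free"
else
  echo "port ${PORT} is in use"
fi
```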

Possible Reasons #3: NFUI and NFDB port mismatch
Steps to Diagnose & Commands Used: Check the keywords network.host: and http.port: in nfdb.yml.
Example:
network.host: 10.10.30.123
http.port: 9201
Solution: Configure these keywords correctly, and make sure the NFDB URL and port match in both nfdb.yml and Netforest.yml.

Possible Reasons #4: Cluster may not be in a stable state (when more than one DB participates in the cluster)
Steps to Diagnose & Commands Used: Check the keywords transport.tcp.port: and discovery.zen.ping.unicast.hosts: in nfdb.yml.
Example:
transport.tcp.port: 7895
discovery.zen.ping.unicast.hosts: ["10.10.30.123:7895", "10.10.30.73:7895"]
Solution: NFUI may fail to open due to cluster misconfiguration, so configure these keywords in nfdb.yml correctly.

Possible Reasons #5: Elasticsearch request timeout issue
Steps to Diagnose & Commands Used:

1. The issue appears in the GUI while searching.

2. Check the request-timeout keyword in Netforest.yml: elasticsearch.requestTimeout

Example:

elasticsearch.requestTimeout: 300000

Solution: Increase the Elasticsearch request timeout value.

Integrating NetForest with ND

Possible Reasons #1: The keyword NetForest.integrate_nd: in Netforest.yml is missing or not filled correctly
Steps to Diagnose & Commands Used: Fill the keyword NetForest.integrate_nd: with the ND machine's IP, port, and protocol.
Example:
NetForest.integrate_nd: {"host": "10.10.50.17", "port": "80", "protocol": "http"}
Solution: Fill this keyword with the IP and port of the ND you want to integrate with.

NFDB Configuration Issues

Possible Reasons #1: nfdb.yml is not correct
Steps to Diagnose & Commands Used: Check the keywords used in nfdb.yml, such as network.host:, http.port:, path.data:
Example:
network.host: 72.52.96.138
http.port: 7894
path.data: /pgdata/netforest/NFDB
Solution: Make sure these keywords are filled in correctly.

Possible Reasons #2: vm.max_map_count is too low (in case of shell)
Steps to Diagnose & Commands Used: Check vm.max_map_count and increase it with:

sudo sysctl -w vm.max_map_count=262144

Solution:
sudo sysctl -w vm.max_map_count=262144
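The check and the fix can be combined into one sketch, reading the current value from /proc before deciding whether to raise it:

```shell
# Compare the current vm.max_map_count with the 262144 minimum
# that the Elasticsearch-based NFDB expects.
REQUIRED=262144
CURRENT=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)
if [ "${CURRENT}" -lt "${REQUIRED}" ]; then
  echo "vm.max_map_count=${CURRENT} is too low"
  echo "raise it with: sudo sysctl -w vm.max_map_count=${REQUIRED}"
  echo "persist it by adding vm.max_map_count=${REQUIRED} to /etc/sysctl.conf"
else
  echo "vm.max_map_count=${CURRENT} is sufficient"
fi
```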

Possible Reasons #3: Too little memory for the Java runtime process or garbage collector
Steps to Diagnose & Commands Used: Check the jvm.options file present in the NFDB config directory.
Solution: Increase the two JVM heap options in the NFDB config file:

-Xms1g

-Xmx1g

(The maximum heap we can give is about 30 GB.)
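To see which heap values are currently in effect, the jvm.options file can be grepped; the path below is an assumption, so substitute your actual NFDB config directory:

```shell
# List the -Xms/-Xmx heap settings from NFDB's jvm.options (path assumed).
JVM_OPTS="${NFDB_CONFIG_DIR:-/opt/nfdb/config}/jvm.options"
HEAP=$(grep -E '^-Xm[sx]' "${JVM_OPTS}" 2>/dev/null || echo "jvm.options not found at ${JVM_OPTS}")
echo "${HEAP}"
```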

Possible Reasons #4: Port used by NFDB is not free, or NFDB is not working
Steps to Diagnose & Commands Used: Check the nfdb.log file present in the NFDB (build) directory, or use the command journalctl -u nfdb -f; it shows errors such as "port is already in use".
Solution: Use a free port in nfdb.yml and restart NFDB.

Possible Reasons #5: Data is not coming from a specific path
Steps to Diagnose & Commands Used: Check the path.data: /path/to/data keyword.
Solution: Make sure path.data points to the correct directory and that the NFDB user can read and write to it.

Possible Reasons #6: Query fails when a circuit breaker trips
Steps to Diagnose & Commands Used:

Check the configuration in mapping.json. In the statistics below, the estimated size (10.5 GB) exceeds the limit (9.4 GB), so the "request" breaker has tripped 250 times:

"breakers": {
  "request": {
    "limit_size_in_bytes": 10187558092,
    "limit_size": "9.4gb",
    "estimated_size_in_bytes": 11343200256,
    "estimated_size": "10.5gb",
    "overhead": 1.0,
    "tripped": 250
  }
}

Solution: Make sure the configuration is correct.
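Live breaker statistics can also be queried from a running NFDB node through the standard Elasticsearch node-stats API (NFDB_URL is a placeholder for your host:port):

```shell
# Fetch circuit-breaker statistics (limit, estimated size, trip count)
# from the node-stats API; harmless to run against a live node.
NFDB_URL="${NFDB_URL:-localhost:9200}"
BREAKERS=$(curl -s -m 5 "http://${NFDB_URL}/_nodes/stats/breaker?pretty" || true)
echo "${BREAKERS:-NFDB unreachable at ${NFDB_URL}}"
```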

NFDB status is red

Possible Reasons #1: NFDB is out of the cluster
Steps to Diagnose & Commands Used: Check cluster health.

Command:

curl -XGET localhost:9200/_cluster/health?pretty

Solution: Bring all the nodes back into the cluster, and open the UI only once cluster health reaches 100%. Check with:

curl -XGET localhost:9200/_cat/health?pretty

If shards are unassigned, wait until they are allocated.
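To see which shards are unassigned and why, the _cat/shards API can be filtered (NFDB_URL is a placeholder for your host:port):

```shell
# List unassigned shards with the reason column from the _cat/shards API.
NFDB_URL="${NFDB_URL:-localhost:9200}"
UNASSIGNED=$(curl -s -m 5 "http://${NFDB_URL}/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
  | grep UNASSIGNED || echo "no unassigned shards reported (or NFDB unreachable)")
echo "${UNASSIGNED}"
```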


Error in cluster or master not discovered

Possible Reasons #1: nfdb.yml is not correct
Steps to Diagnose & Commands Used:
Check the keywords used in nfdb.yml, such as network.host:, http.port:, path.data:. Add the keyword transport.tcp.port: to nfdb.yml, and use the same TCP port in the nfdb.yml of every NFDB in the cluster.
Example:
transport.tcp.port: 7895
Check the keyword discovery.zen.ping.unicast.hosts: in nfdb.yml, and list the host and TCP port of every node in every nfdb.yml.
Example:
discovery.zen.ping.unicast.hosts: ["10.10.30.123:7895", "10.10.30.73:7895"]
Solution: Configure these keywords correctly, and make sure all NFDB nodes in the cluster are up.

Possible Reasons #2: Master not discovered
Steps to Diagnose & Commands Used: Use the command curl nfdbhost:port/_cluster/health?pretty

If it shows the error "master not discovered", add the keyword node.master: true to the nfdb.yml of the node you want to act as master.

Example:
node.master: true

Solution: Configure this keyword correctly, and make sure all NFDB nodes in the cluster are up.

Cluster health is not 100%, shards remain in an unassigned state, or data is stuck

Possible Reasons #1:
One or more nodes are out of the cluster
Steps to Diagnose: Do a rolling restart of the nodes, and find the node that is out of the cluster with:

curl nfdbip:port/_cat/nodes?v

Commands to validate

Then, do the rolling restart:

sudo su
Step 1: curl -X PUT "10.206.96.82:7894/_cluster/settings?master_timeout=3000s" -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'

Step 2: curl -X POST "10.206.96.82:7894/_flush/synced"

Step 3: Restart the NFDB service (service nfdb restart).

Step 4: curl -X PUT "10.206.96.82:7894/_cluster/settings?master_timeout=3000s" -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
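The rolling-restart steps above can be sketched as shell functions (host and port are the example values from the steps; the calls themselves are left commented out so the sketch is safe to source):

```shell
# Rolling-restart helpers: disable shard allocation, flush, restart, re-enable.
NFDB_URL="10.206.96.82:7894"

disable_allocation() {
  curl -s -X PUT "http://${NFDB_URL}/_cluster/settings?master_timeout=3000s" \
    -H 'Content-Type: application/json' \
    -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'
}

enable_allocation() {
  curl -s -X PUT "http://${NFDB_URL}/_cluster/settings?master_timeout=3000s" \
    -H 'Content-Type: application/json' \
    -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
}

# disable_allocation
# curl -s -X POST "http://${NFDB_URL}/_flush/synced"
# service nfdb restart   # restart one node at a time
# enable_allocation
```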

Possible Reasons #2: Node reaches a high JVM memory value
Steps to Diagnose

If a node reaches a high JVM value, you can call the cache-clear API as an immediate node-level action to make Elasticsearch drop its caches. It will hurt performance, but it can save you from an OOM (Out Of Memory) condition.

Use the command:

curl -XPOST 'http://localhost:9200/_cache/clear'


Enable the NFAgent flag

Possible Reasons #1: nfagent is not working
Steps to Diagnose & Commands Used: In cmon.env, enable the flag by setting it to 1. The key is:

CAV_MON_AGENT_OPTS="-F 1"

Solution: 1 means enabled; 0 means disabled.
NFAgent must work with cmon, so set the flag to 1 (enabled).


Grok parse error

Possible Reasons #1: Filter file is not correct
Steps to Diagnose & Commands Used: There may be an indentation error or some other human error in the filter file.
Solution: Check and make sure the Input.conf, filter.file.conf, and output.conf files are correct.


Grok parse failure

Possible Reasons #1: The grok matcher in the filter is not correct, or not all logs are being parsed
Steps to Diagnose & Commands Used: Correct the grok pattern in the filter file, or use the correct parser for each log format.
Solution: Check and make sure the correct parser is used in the filter file, and that there are no human errors.
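As an illustration only (a hypothetical Logstash-style filter fragment, not the exact NetForest filter file), a grok match for Apache/Tomcat access logs typically looks like this; a mismatch between the pattern and the actual log line is what produces grok parse failures:

```
filter {
  grok {
    # COMBINEDAPACHELOG is a stock grok pattern for Apache-style access logs;
    # if a log line does not match it, the event is tagged _grokparsefailure.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
```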


NFAgent not able to send data from the server

Possible Reasons #1: Incorrect config files
Steps to Diagnose & Commands Used: Check the config files present in the path /home/cavisson/monitors/nf/nfagent/config/conf.d:

1. input.conf

2. filter.file.conf

3. output.conf

Solution: Check and make sure all keywords in these files are correct, the parser in filter.file.conf is correct, and output.conf has the NFDB host and port.

Possible Reasons #2: Incorrect nf.env and cmon.env files
Steps to Diagnose & Commands Used:

Check the nf.env file present in /home/cavisson/monitors/sys, and check the keywords OUTPUT_HOST1, OUTPUT_PORT1, and INDEX_PREFIX.

Example:

SERVER=10.206.96.52
DC=Stress
ENV=Stress
INDEX_PREFIX=cavisson
# If it's false, only OUTPUT_HOST1 and OUTPUT_PORT1 are considered
#OUTPUT_MULTINODE=false
OUTPUT_HOST1=10.206.96.98
OUTPUT_PORT1=7894
AppName=work-NF
YML_CONFIG_NFDB=false

In cmon.env, check the keyword CAV_MON_AGENT_OPTS="-F 1" and the Tier field.

Solution:

OUTPUT_HOST1: must be filled with the NFDB host to which you want to send data.

OUTPUT_PORT1: must be filled with the correct NFDB port.

INDEX_PREFIX: must be filled with the index name you want to see in the NetForest UI.

In cmon.env, CAV_MON_AGENT_OPTS="-F 1" must be set to 1 to enable nfagent; also define the Tier value in it.
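A quick way to confirm the required keys are present in nf.env (the path is the one given above):

```shell
# Verify that the required nf.env keys are defined.
NF_ENV="/home/cavisson/monitors/sys/nf.env"
MISSING=""
for KEY in OUTPUT_HOST1 OUTPUT_PORT1 INDEX_PREFIX; do
  grep -q "^${KEY}=" "${NF_ENV}" 2>/dev/null || MISSING="${MISSING} ${KEY}"
done
if [ -n "${MISSING}" ]; then
  echo "missing or unreadable keys:${MISSING}"
else
  echo "all required nf.env keys present"
fi
```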


::Errors::NotFound: [404] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"nfagent_config_yml","index_uuid":"_na_","index":"nfagent_config_yml"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"nfagent_config_yml","index_uuid":"_na_","index":"nfagent_config_yml"},"status":404}

Possible Reasons #1: This error appears on the agent side in nfagentplain.log when the keyword YML_CONFIG_NFDB= present in nf.env is not configured correctly
Steps to Diagnose & Commands Used: The keyword YML_CONFIG_NFDB= takes one of two values, true or false.
Solution:
If it is true, the config.yml must be dumped into NFDB.
If it is false, by default nfagent reads the log path present in Input.conf, for example path => "/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/*access_log*"; this is the log path from which nfagent collects logs for parsing.


Unable to reach host

Possible Reasons #1: This error comes when NFDB is down
Steps to Diagnose & Commands Used: Verify whether NFDB is running; if it is not, check the required NFDB configuration.
Solution: Verify that NFDB is up and running, and check connectivity between NFDB and the server.
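The connectivity check can be sketched as follows (host and port are placeholders; use the values from output.conf / nf.env):

```shell
# Probe the NFDB HTTP port; "000" means the host could not be reached.
NFDB_HOST="${NFDB_HOST:-10.206.96.98}"
NFDB_PORT="${NFDB_PORT:-7894}"
STATUS=$(curl -s -o /dev/null -m 5 -w "%{http_code}" "http://${NFDB_HOST}:${NFDB_PORT}" || true)
if [ "${STATUS:-000}" = "000" ]; then
  echo "unable to reach NFDB at ${NFDB_HOST}:${NFDB_PORT}"
else
  echo "NFDB responded with HTTP ${STATUS}"
fi
```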


Data index not created

Possible Reasons #1:

Either the path present in Input.conf does not grant read permission to the user, or the path is not correct.

path => "/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/*access_log*"

Steps to Diagnose & Commands Used: Check the log path used in Input.conf; the user must be the same and must have read access.
Solution: Correct the log path in Input.conf and make sure the user running nfagent has read access to it.
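The path-and-permission check can be sketched as below (the glob is the example path from Input.conf; run it as the user that runs nfagent):

```shell
# Check that the Input.conf log glob matches files readable by this user.
LOG_GLOB="/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/*access_log*"
MATCHED=0
for F in ${LOG_GLOB}; do
  [ -e "${F}" ] || continue
  MATCHED=$((MATCHED + 1))
  if [ -r "${F}" ]; then echo "readable: ${F}"; else echo "NOT readable: ${F}"; fi
done
if [ "${MATCHED}" -eq 0 ]; then
  echo "no files match the glob; check the path and permissions"
fi
```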