Troubleshooting NetForest
UI is not opening or is stuck at the NFUI login page
Possible Reasons #1 | Netforest.yml is not correct
Steps to Diagnose & Command Used | Check the keywords used in Netforest.yml, such as server.port:, server.host:, nfdb.url:, nfdb.envs:, and NetForest.schedule_alert:. Example:
server.port: 8000
server.host: "10.10.30.123"
nfdb.url: "http://10.10.30.123:9201"
nfdb.envs: {"prod":"http://10.10.30.123:9201"}
NetForest.schedule_alert: "10.10.30.123"
Solution | Check these keywords and make sure they are configured correctly
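As a quick follow-up check (a minimal sketch, assuming the example values above), confirm the UI port answers from the NetForest machine:
curl -s -o /dev/null -w "%{http_code}\n" http://10.10.30.123:8000
A 200 or 302 response means the UI process is listening; no response points back to Netforest.yml.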
Possible Reasons #2 | The port used by NFUI is not free
Steps to Diagnose & Command Used | Check the server.port: keyword in Netforest.yml. Example:
server.port: 8000
Use the command: netstat -natp | grep 8000
The port must be free.
Solution | The port used for opening NFUI must be free
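If the port is taken, the netstat output names the owning process (port 8000 assumed from the example above):
netstat -natp | grep 8000
Either stop the process holding the port or choose a different server.port in Netforest.yml.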
Possible Reasons #3 | NFUI and NFDB port mismatch
Steps to Diagnose & Command Used | Check the keywords used in nfdb.yml, such as network.host: and http.port:. Example:
network.host: 10.10.30.123
http.port: 9201
Solution | Configure these keywords correctly, and make sure the NFDB URL and port match in both nfdb.yml and Netforest.yml
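For reference, a sketch of matching entries in the two files, using the example values above; the NFDB listen address and port must reappear verbatim in the NFUI-side URL:
In nfdb.yml:
network.host: 10.10.30.123
http.port: 9201
In Netforest.yml:
nfdb.url: "http://10.10.30.123:9201"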
Possible Reasons #4 | Cluster may not be in a stable state (if there is more than one DB in the cluster)
Steps to Diagnose & Command Used | Check the keywords used in nfdb.yml, such as transport.tcp.port: and discovery.zen.ping.unicast.hosts:. Example:
transport.tcp.port: 7895
discovery.zen.ping.unicast.hosts: ["10.10.30.123:7895", "10.10.30.73:7895"]
Solution | Sometimes NFUI may not open due to cluster configuration, so configure these keywords in nfdb.yml correctly
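To confirm the cluster actually formed, query the standard Elasticsearch-style health API on any node (host and port assumed from the example above):
curl -XGET "http://10.10.30.123:9201/_cluster/health?pretty"
"status" should not be red, and "number_of_nodes" should equal the number of NFDB nodes listed in discovery.zen.ping.unicast.hosts.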
Possible Reasons #5 | Elasticsearch request timeout issue
Steps to Diagnose & Command Used | 1. The issue shows up in the GUI while searching.
2. Check the request-timeout keyword in Netforest.yml: elasticsearch.requestTimeout. Example:
elasticsearch.requestTimeout: 300000
Solution | Increase the timeout value for the Elasticsearch request time
For integration of NetForest with ND
Possible Reasons #1 | The NetForest.integrate_nd: keyword present in Netforest.yml may be empty or not filled correctly
Steps to Diagnose & Command Used | Fill the NetForest.integrate_nd: keyword with the ND machine IP and port. Example:
NetForest.integrate_nd: {"host":"10.10.50.17","port":"80","protocol":"http"}
Solution | Fill this keyword with the ND IP and port you want to integrate with
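A quick reachability check for the ND machine from the NetForest host (values assumed from the example above):
curl -s -o /dev/null -w "%{http_code}\n" http://10.10.50.17:80
Any HTTP status code in the output confirms the ND host and port are reachable.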
NFDB Configuration Issue
Possible Reasons #1 | nfdb.yml is not correct
Steps to Diagnose & Command Used | Check the keywords used in nfdb.yml, such as network.host:, http.port:, and path.data:. Example:
network.host: 72.52.96.138
http.port: 7894
path.data: /pgdata/netforest/NFDB
Solution | Make sure these keywords are filled correctly
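To verify NFDB is actually serving on the configured address (example values from above), hit its root endpoint:
curl -XGET "http://72.52.96.138:7894"
A JSON reply with cluster and version details confirms the network.host and http.port settings took effect.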
Possible Reasons #2 | Low vm.max_map_count (in case of shell)
Steps to Diagnose & Command Used | Check vm.max_map_count and increase it using the command:
sudo sysctl -w vm.max_map_count=262144
Solution | sudo sysctl -w vm.max_map_count=262144
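The sysctl command above lasts only until reboot. To make the setting persistent, append it to /etc/sysctl.conf, reload, and verify:
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
sysctl vm.max_map_count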
Possible Reasons #3 | Insufficient memory for the Java runtime process or garbage collector
Steps to Diagnose & Command Used | Check the jvm.options file present in the NFDB config directory
Solution | Increase the two JVM heap options present in the NFDB config directory:
-Xms1g
-Xmx1g
(the maximum heap we can give is 30 GB)
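For reference, a sketch of the relevant jvm.options lines; the 4 GB figure is illustrative, not a recommendation from this guide. Keep -Xms and -Xmx equal so the heap is not resized at runtime, and stay at or below the 30 GB ceiling mentioned above:
-Xms4g
-Xmx4g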
Possible Reasons #4 | The port used for NFDB is not free, or NFDB is not working
Steps to Diagnose & Command Used | Check the nfdb.log file present in the NFDB (build) directory, or use the command journalctl -u nfdb -f; it shows errors such as "port is already in use"
Solution | Use a free port in nfdb.yml and restart NFDB
Possible Reasons #5 | Data is not coming from a specific path
Steps to Diagnose & Command Used | Check the path.data: /path/to/data keyword
Solution | Make sure path.data: points to the correct directory and the user running NFDB has read/write permission on it
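A quick permission check on the data path (path from the example above; assuming NFDB runs as the cavisson user):
ls -ld /pgdata/netforest/NFDB
sudo -u cavisson test -w /pgdata/netforest/NFDB && echo "writable"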
Possible Reasons #6 | Log queries fail when the circuit breaker trips
Steps to Diagnose & Command Used | Check the breaker configuration in mapping.json:
"breakers": {
  "request": {
    "limit_size_in_bytes": 10187558092,
    "limit_size": "9.4gb",
    "estimated_size_in_bytes": 11343200256,
    "estimated_size": "10.5gb",
    "overhead": 1.0,
    "tripped": 250
  }
}
Solution | Make sure the configuration is correct.
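The live breaker counters can also be read from the standard node stats endpoint (localhost:9200 assumed here):
curl -XGET "http://localhost:9200/_nodes/stats/breaker?pretty"
A "tripped" count that keeps growing means the request breaker limit is too low for the current query load.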
NFDB status is red
Possible Reasons #1 | NFDB is out of the cluster
Steps to Diagnose & Command Used | Check cluster health:
Command: curl -XGET localhost:9200/_cluster/health?pretty
Solution | In this case, bring all the nodes back into the cluster, then open the UI once cluster health reaches 100%. Command as below:
curl -XGET localhost:9200/_cat/health?pretty
If shards are unassigned, wait until they get allocated.
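To see exactly which shards are still unassigned while waiting (standard _cat API; localhost assumed):
curl -XGET "http://localhost:9200/_cat/shards?v" | grep UNASSIGNED
Red status means at least one primary shard is unassigned; yellow means only replicas are.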
Error in cluster or master not discovered
Possible Reasons #1 | nfdb.yml is not correct
Steps to Diagnose & Command Used | Check the keywords used in nfdb.yml, such as network.host:, http.port:, and path.data:. Add the transport.tcp.port: keyword to nfdb.yml, and use the same TCP port in the nfdb.yml of every NFDB used in the cluster. Example:
transport.tcp.port: 7895
Check the discovery.zen.ping.unicast.hosts: keyword present in nfdb.yml and insert the host and TCP port in every nfdb.yml. Example:
discovery.zen.ping.unicast.hosts: ["10.10.30.123:7895", "10.10.30.73:7895"]
Solution | Make sure to configure these keywords correctly, and also that all NFDBs used in the cluster are up
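To confirm every configured node has actually joined the cluster (host and port assumed from your nfdb.yml):
curl -XGET "http://10.10.30.123:9201/_cat/nodes?v"
Every host listed under discovery.zen.ping.unicast.hosts should appear in this output.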
Possible Reasons #2 | Master not discovered
Steps to Diagnose & Command Used | Use the command curl nfdbhost:port/_cluster/health?pretty
If it shows the error "master not discovered", add the keyword node.master: true to the nfdb.yml of the NFDB you want to act as master. Example:
node.master: true
Solution | Make sure to configure these keywords correctly, and also that all NFDBs used in the cluster are up
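After configuring a master-eligible node and restarting, confirm which node won the election (standard _cat API):
curl -XGET "http://nfdbhost:port/_cat/master?v"
This prints the id, host, and name of the elected master; an error here means the election is still failing.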
Cluster health not 100%, or shards remain in unassigned state and data may get stuck
Possible Reasons #1 | One or more nodes are out of the cluster
Steps to Diagnose | Do a rolling restart of the nodes, and find the node that is out of the cluster with the command:
curl nfdbip:port/_cat/nodes?v
Commands to validate | Then do the rolling restart:
Step 1: sudo su
Step 2: curl -X POST "10.206.96.82:7894/_flush/synced"
Step 3: Restart the NFDB service (service nfdb restart)
Step 4: curl -X PUT "10.206.96.82:7894/_cluster/settings?master_timeout=3000s" -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
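Step 4 re-enables shard allocation, which implies allocation was disabled before the restart; a sketch of that preparatory call, mirroring the settings API used above (host and port from the example; this step is an assumption inferred from Step 4):
curl -X PUT "10.206.96.82:7894/_cluster/settings" -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'
Run it before stopping the service so shards are not shuffled around while the node is down.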
Possible Reasons #2 | A node reaches a high JVM value
Steps to Diagnose | If a node reaches a high JVM value, you can call the cache-clear API as an immediate node-level action to make Elasticsearch drop its caches. It will hurt performance, but it can save you from OOM (Out Of Memory).
Commands to validate | curl -XPOST 'http://localhost:9200/_cache/clear'
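To watch heap usage before and after the cache clear (standard node stats endpoint; localhost assumed):
curl -XGET "http://localhost:9200/_nodes/stats/jvm?pretty"
If "heap_used_percent" stays high even after clearing caches, a larger heap in jvm.options is usually the longer-term fix.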
Enable NFAgent flag
Possible Reasons #1 | nfagent is not working
Steps to Diagnose & Command Used | In cmon.env, enable the flag by setting it to 1; the key is:
CAV_MON_AGENT_OPTS="-F 1"
Solution | 1 means enabled, 0 means disabled. NFAgent must work with cmon, so set the flag to 1 (enabled)
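After setting the flag (a restart of cmon may be needed for it to take effect), a simple way to confirm the agent process came up:
ps -ef | grep -i nfagent | grep -v grep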
Grok parse error
Possible Reasons #1 | Filter file is not correct
Steps to Diagnose & Command Used | There may be an indentation error or some other human error in the filter file
Solution | Check and make sure the input.conf, filter.file.conf, and output.conf files are correct
Grok parse failure
Possible Reasons #1 | The grok matcher in the filter is not correct, or not all logs are getting parsed
Steps to Diagnose & Command Used | Correct the grok in the filter file, or use the correct parser for parsing the different logs
Solution | Check and make sure the correct parser is used in the filter file and that there is no human error
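For reference, a minimal sketch of a grok block as it might appear in filter.file.conf, assuming the Logstash-style syntax these conf files follow and an Apache-style access log as input; COMBINEDAPACHELOG is a stock pattern, and events that do not match get tagged _grokparsefailure:
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}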
NFAgent not able to send data from the server
Possible Reasons #1 | Incorrect config file
Steps to Diagnose & Command Used | Check the config files present in the path /home/cavisson/monitors/nf/nfagent/config/conf.d.files:
1. input.conf
2. filter.file.conf
3. output.conf
Solution | Check and make sure these files are correct with respect to all the keywords, the parser in filter.file.conf is correct, and output.conf has the NFDB host and port
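For reference, a sketch of the minimum output.conf typically needs, assuming Logstash-style syntax; the host, port, and index name here are taken from the nf.env example below and are illustrative:
output {
  elasticsearch {
    hosts => ["10.206.96.98:7894"]
    index => "cavisson-%{+YYYY.MM.dd}"
  }
}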
Possible Reasons #2 | Incorrect nf.env and cmon.env files
Steps to Diagnose & Command Used | Check the nf.env file present in /home/cavisson/monitors/sys and check the keywords OUTPUT_HOST1, OUTPUT_PORT1, and INDEX_PREFIX. Example:
SERVER=10.206.96.52
DC=Stress
ENV=Stress
INDEX_PREFIX=cavisson
# If it's false, only OUTPUT_HOST1 and OUTPUT_PORT1 are considered
#OUTPUT_MULTINODE=false
OUTPUT_HOST1=10.206.96.98
OUTPUT_PORT1=7894
AppName=work-NF
YML_CONFIG_NFDB=false
In cmon.env, check the CAV_MON_AGENT_OPTS="-F 1" keyword and the Tier field.
Solution |
OUTPUT_HOST1 — must be filled with the NFDB host to which you want to dump data
OUTPUT_PORT1 — must be filled with the correct port of NFDB
INDEX_PREFIX — must be filled with the index name you want to see
In cmon.env, CAV_MON_AGENT_OPTS="-F 1" should be set to 1 to enable nfagent; also define the Tier value in it
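A quick way to review the relevant keys without opening the files (assuming cmon.env sits in the same directory as nf.env):
grep -E "OUTPUT_HOST1|OUTPUT_PORT1|INDEX_PREFIX|YML_CONFIG_NFDB" /home/cavisson/monitors/sys/nf.env
grep CAV_MON_AGENT_OPTS /home/cavisson/monitors/sys/cmon.env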
::Errors::NotFound: [404] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"nfagent_config_yml","index_uuid":"_na_","index":"nfagent_config_yml"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"nfagent_config_yml","index_uuid":"_na_","index":"nfagent_config_yml"},"status":404}
Possible Reasons #1 | This error appears on the agent side in nfagentplain.log when the YML_CONFIG_NFDB= keyword present in nf.env is not configured correctly
Steps to Diagnose & Command Used | The YML_CONFIG_NFDB= keyword takes one of two values: true or false
Solution |
If it is true, we have to dump the config.yml into NFDB.
If it is false, the agent goes by default through the log path present in input.conf, e.g.:
filepath => "/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/*access_log*"
This is the log path from which nfagent collects the logs for parsing.
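To check from the NFDB side whether the index named in the error actually exists (standard _cat API; localhost:9200 assumed):
curl -XGET "http://localhost:9200/_cat/indices/nfagent_config_yml?v"
A 404 here matches the agent error: the config was never dumped into NFDB, so either dump config.yml or set YML_CONFIG_NFDB=false.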
Unable to reach host
Possible Reasons #1 | This error comes when NFDB is down
Steps to Diagnose & Command Used | Verify whether NFDB is running; if not, check the required NFDB configuration
Solution | NFDB must be up and in a running state; also check the connectivity between NFDB and the server
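A quick connectivity check from the agent machine (NFDB host and port assumed from the nf.env example above):
curl -XGET "http://10.206.96.98:7894/_cluster/health?pretty"
If this hangs or fails, confirm NFDB is up and that no firewall blocks the port.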
Data index not created
Possible Reasons #1 | Either the path present in input.conf does not have read permission for the user, or the path is not correct, e.g.:
path => "/home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/*access_log*"
Steps to Diagnose & Command Used | Check the log path used in input.conf; the user must be the same and must have read access
Solution | Correct the log path in input.conf and make sure the user running nfagent has read access to it
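A quick permission check on the log path (path taken from the example above; assuming nfagent runs as the cavisson user):
ls -l /home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs/
sudo -u cavisson test -r /home/cavisson/NDE_CLIENT/apps/apache-tomcat-7.0.59/logs && echo "readable"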