How best to monitor logstash?
I've seen this question on the mailing list a few times but haven't had a satisfactory answer.
How best to monitor that the pipeline isn't stuck? Clients -> logstash -> elasticsearch.
Logstash and especially elasticsearch are prone to resource starvation. They are both fantastic at picking up where they left off but how, exactly, are people watching their watchers?
Opinions welcome.
Solution 1:
Personally, I check that Redis is still being dequeued on the central logging host, which is upstream of LS+ES.
That is, I check that the result of: redis-cli llen logstash
is less than some fixed number.
This does not confirm that logs are arriving in Redis at all (an empty list passes the check), but that could be checked too, I guess.
Something like checking that the value reported by: redis-cli info | grep total_commands_processed
keeps increasing, maybe?
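The two checks above could be wrapped up roughly as follows. This is only a sketch: the function names and the threshold in the usage comments are hypothetical, and only the list key "logstash" and the two redis-cli commands come from the answer itself.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# queue_ok LENGTH MAX
# Succeeds when the queue length is below the fixed threshold.
queue_ok() {
    [ "$1" -lt "$2" ]
}

# parse_commands_processed
# Reads `redis-cli info` output on stdin and prints total_commands_processed.
# INFO lines end in CRLF, so the carriage returns are stripped first.
parse_commands_processed() {
    tr -d '\r' | awk -F: '$1 == "total_commands_processed" { print $2 }'
}

# counter_advanced PREVIOUS CURRENT [MIN_INCREASE]
# Succeeds when the counter grew by at least MIN_INCREASE (default 1).
counter_advanced() {
    [ $(( $2 - $1 )) -ge "${3:-1}" ]
}

# Example use against a local Redis (10000 is an arbitrary threshold):
#   len=$(redis-cli llen logstash)
#   queue_ok "$len" 10000 || echo "logstash queue is backing up: $len"
#
#   now=$(redis-cli info | parse_commands_processed)
#   counter_advanced "$previous" "$now" 5 || echo "redis looks idle"
```

As far as I know, total_commands_processed counts every command, including the monitor's own INFO and LLEN calls, so a minimum increase larger than the polling overhead is probably a safer test than "greater than last time".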
Solution 2:
I use Zabbix in my environment, but I suppose this method could work in other setups as well. I have configured the following command as a user parameter that Zabbix is allowed to run:
UserParameter=elasticsearch.commits,/usr/bin/curl -s 'localhost:9200/_cat/count?v' | /bin/sed -n '2p' | /bin/awk '{print $3}'
This returns the total number of Elasticsearch records committed. I take the change in this value since the previous sample and divide it by the number of seconds between samples (I check every minute); if the resulting rate drops below an arbitrary limit, I can alert on it. I also use Zabbix to check whether the Logstash PID has died, and alert on that as well. In addition, I run the following command:
UserParameter=elasticsearch.health,/usr/bin/curl -s 'http://localhost:9200/_cluster/health?pretty=true' | /bin/sed -n '3p' | /bin/awk -F'\"' '{print $4}' | /bin/sed s/yellow/0/ | /bin/sed s/green/0/ | /bin/sed s/red/1/
This will return 1 if cluster health has gone red (yellow and green are okay), which I can also alert off.
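For setups without Zabbix, the rate calculation and the health mapping described in this answer might look something like the sketch below. The function names are hypothetical, and returning 2 for an unrecognised status is an assumption added here rather than something the commands above do.

```shell
#!/bin/sh
# Sketch only: function names are made up, not part of Zabbix or Elasticsearch.

# rate_per_second PREV_COUNT CURR_COUNT ELAPSED_SECONDS
# Prints the change in record count per second, to two decimal places.
rate_per_second() {
    LC_ALL=C awk -v p="$1" -v c="$2" -v s="$3" \
        'BEGIN { if ((s + 0) <= 0) exit 1; printf "%.2f\n", (c - p) / s }'
}

# rate_below RATE LIMIT
# Succeeds when RATE is lower than LIMIT (the alert condition).
rate_below() {
    awk -v r="$1" -v l="$2" 'BEGIN { exit !((r + 0) < (l + 0)) }'
}

# status_from_health
# Reads pretty-printed _cluster/health JSON on stdin and prints the
# value of the "status" field, wherever that line appears.
status_from_health() {
    awk -F'"' '$2 == "status" { print $4; exit }'
}

# health_flag STATUS
# Prints 0 for green or yellow and 1 for red, as in the answer.
# Printing 2 for anything else (for example an empty reply) is an
# assumption of this sketch.
health_flag() {
    case "$1" in
        green|yellow) echo 0 ;;
        red)          echo 1 ;;
        *)            echo 2 ;;
    esac
}

# Example use against a local node:
#   status=$(curl -s 'http://localhost:9200/_cluster/health?pretty=true' | status_from_health)
#   health_flag "$status"
#
#   rate=$(rate_per_second "$previous_count" "$current_count" 60)
#   rate_below "$rate" 1 && echo "indexing rate is low: $rate/s"
```

Matching on the "status" key rather than on the third line of output is a small design choice so that the check does not depend on the order of fields in the response.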