Nagios "CRITICAL - Socket timeout after 10 seconds" problems with service and host checks
We have a problem with the Nagios system in our office that has only recently started appearing.
What I would really like to know is the best way to resolve it. I've done some reading, and there seem to be many different ways to solve it.
At random points throughout the day, on random hosts and services, a CRITICAL alert is flagged saying that something is not behaving as it should. When we investigate, nine times out of ten this is the error message:
"SERVICE ALERT: SERVERNAME ;NSClient++ Version;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds"
This indicates that the service or host check has timed out. Where do I set the timeouts so this stops? I've read that some of the plugin timeouts are as low as 10 seconds.
Thanks, Kris
Solution 1:
With any service you will occasionally get these if the server is too busy to respond, there is a hiccup in the network, and so on. Check whether the server is under load when these alerts arrive.
I think the main thing to look at is the max_check_attempts directive associated with the service or the service's template, so that you don't get an alert until the check has been in a failed/critical state a couple of times in a row. You can also adjust the timeout value of the check_nt plugin with the -t switch:
-t, --timeout=INTEGER
Seconds before connection attempt times out (default: 10)
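As a rough sketch of how the two suggestions above might fit together in the object configuration (assuming Nagios 3 or later and the stock sample configs): the command name check_nt_timeout is made up for illustration, generic-service is the template from the sample configuration, 12489 is the usual NSClient++ port, and the values 30 and 3 are arbitrary starting points rather than recommendations.

```
# Hypothetical command: same as the sample check_nt command,
# but with the plugin timeout raised from the default 10 to 30 seconds.
define command {
    command_name    check_nt_timeout
    command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -t 30 -v $ARG1$ $ARG2$
}

# Service from the alert above. With max_check_attempts set to 3,
# a single timeout only produces a SOFT state; notifications go out
# once the check has failed 3 times in a row (HARD state).
define service {
    use                     generic-service
    host_name               SERVERNAME
    service_description     NSClient++ Version
    check_command           check_nt_timeout!CLIENTVERSION
    max_check_attempts      3
    retry_interval          1
}
```

If you raise -t, keep it below service_check_timeout in nagios.cfg (60 seconds by default), otherwise Nagios will kill the plugin before its own timeout is reached.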
Solution 2:
I would also recommend checking the NSClient on the monitored host.