Throttle Nagios alerts if host loses connectivity
We use Nagios to monitor our server farm, and generally it works great. From time to time, though, the host where Nagios runs loses connectivity for a couple of minutes, which makes Nagios believe that all the servers and services it monitors are down. The result is hundreds of alert mails, shortly followed by hundreds of recovery mails.
Is there any way to configure Nagios so that it tests its own connectivity before releasing an avalanche of alert mails?
Yes, you can define parent/child relationships between hosts. If a parent is down, Nagios marks its children as UNREACHABLE rather than DOWN, so (as long as "u" is left out of the children's notification_options) no notification about the children is given. You do need to set the check timings properly, though (in generic-service and generic-host, or whatever templates you use): when the services stop responding, Nagios must already have decided that the parent is down before it would send out notifications for those services.
What I did is this:
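One way to get those timings right is to let hosts reach a hard state in fewer check attempts than services, so the parent/child verdict is settled before any service notification goes out. The fragment below is a minimal sketch under assumptions: the template names example-host and example-service are hypothetical, check-host-alive is the command from the Nagios sample commands.cfg, and the interval values (in minutes with the default interval_length of 60) are illustrative, not tuned numbers.

```
# Hypothetical host template: hosts go hard DOWN/UNREACHABLE after 3 attempts.
define host {
    name                  example-host
    check_command         check-host-alive   ; from the sample commands.cfg
    max_check_attempts    3
    check_interval        5                  ; minutes between regular checks
    retry_interval        1                  ; minutes between retries
    notification_options  d,r                ; no "u": no UNREACHABLE mails
    register              0                  ; template only, not a real host
}

# Hypothetical service template: services need more attempts than hosts,
# so the host state is known before a service would notify.
define service {
    name                  example-service
    max_check_attempts    5
    check_interval        5
    retry_interval        1
    register              0
}
```

With these values a host settles its state after roughly two minutes of retries, while a service needs about four, which leaves room for the parent check to mark the children UNREACHABLE first.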
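# ISP gateway (first hop in traceroute)
define host {
    host_name            kpn-gateway
    alias                KPN Gateway
    address              1.2.3.4
    use                  generic-host
    notification_period  never    ; never send notifications about this host itself
    parents              experia  ; 'experia' is another host defined elsewhere (not shown)
}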
# Gateway in the datacenter
define host {
    host_name       duocast-gateway
    alias           Duocast gateway
    address         5.6.7.8
    use             generic-host
    parents         kpn-gateway
    contact_groups  bla
}
# One of the hosts in the datacenter
define host {
    host_name       brick
    alias           host.example.com
    address         a.b.c.d
    use             generic-linux-host
    parents         duocast-gateway
    contact_groups  geborsteldstaal
}
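Services are then attached to the leaf hosts as usual; the parent chain (experia, then kpn-gateway, then duocast-gateway, then brick) is what lets Nagios tell "brick is down" apart from "the path to brick is down". The service below is a hypothetical sketch: the SSH check and its description are placeholders, and it assumes the generic-service template and a check_ssh command like the ones in the Nagios sample configuration.

```
# Hypothetical service on 'brick' (placeholder check, not part of the original setup).
# If duocast-gateway or kpn-gateway is DOWN, brick becomes UNREACHABLE and
# Nagios does not send notifications for this service.
define service {
    use                  generic-service   ; assumed: template from the sample config
    host_name            brick
    service_description  SSH
    check_command        check_ssh         ; assumed: command from the sample commands.cfg
    contact_groups       geborsteldstaal
}
```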