Is Nagios "monitoring" over WAN ideal?
Monitoring over a WAN is possible, but generally not ideal. If the WAN link goes down or blips, every check fails and you are blind to what is happening at the remote location. The added latency also makes the results less useful as a measure of performance as seen from the remote LAN. That said, if you go this way, you probably want to set up dependencies so you don't get flooded with alerts when the WAN link has issues.
The most common setup I have seen for communication between a monitoring system and the services it monitors is a site-to-site VPN tunnel. Communication then works the same as on the local network. Also, Nagios is usually pull-based (although it doesn't have to be): Nagios contacts the servers and services it monitors, not the other way around.
Lastly, a better solution is a distributed monitoring setup. One option for Nagios is described at http://nagios.sourceforge.net/docs/3_0/distributed.html.
It depends on what you are going to be monitoring over the WAN. For the most part, if you are only doing ping checks, service checks, disk checks and so on, and you stick to Nagios's default 5-minute check interval, I can't see it causing you an issue.
Again, what you are checking determines which protocol the checks use. If you are checking Windows hosts, you can just use WMI queries and don't even need an agent running on the box.
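To put rough numbers on the claim that a 5-minute interval is unlikely to strain a WAN link, here is a back-of-the-envelope sketch. The function name and the per-check byte count are assumptions for illustration only; measure your own checks before relying on any figure.

```python
def estimate_wan_load_bps(num_checks, bytes_per_check, interval_s=300):
    """Average bits per second used by active checks over the WAN.

    num_checks:      how many checks cross the WAN each interval
    bytes_per_check: request plus response size (hypothetical, measure it)
    interval_s:      check interval in seconds (300 = 5 minutes)
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return num_checks * bytes_per_check * 8 / interval_s


# Hypothetical site: 200 checks at roughly 2000 bytes each, every 5 minutes.
load_bps = estimate_wan_load_bps(num_checks=200, bytes_per_check=2000)
print(f"average load: {load_bps / 1000:.1f} kbit/s")
```

This is an average; checks scheduled close together will produce short bursts above it.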
This is certainly possible, via several different methods.
If the "distributed setup" is out of the question, then you need to do at least one of the following:
- Have every box at the remote site push check results to Nagios (see NSCA)
- Poke firewall holes so that Nagios can reach every box at every remote site
- Designate a single box at each site to be a sort of "Nagios proxy"
I would suggest the third option (the proxy), because it requires the least firewall hole-poking and also simplifies configuration. It's sort of a slimmed-down version of the distributed setup, in that it doesn't require a full Nagios instance at each site.
To do this, you can set up NRPE (or use check_by_ssh) on the proxy and have it run all of the checks against the other hosts on its network. An added benefit is that the performance data you get back is measured from the proxy, so it isn't affected by WAN lag.
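As a rough sketch of the proxy idea, the central server only ever talks to the proxy and names a command that the proxy runs locally. The helper below just builds the check_nrpe command line. The plugin path, host name and command name are hypothetical, and the nrpe.cfg line in the comment is an example of what the proxy side might define.

```python
def nrpe_proxy_command(proxy_host, remote_command, timeout_s=30,
                       plugin="/usr/lib/nagios/plugins/check_nrpe"):
    """Build the argv the central Nagios server would run against the proxy.

    The plugin path is an assumption; it differs between distributions.
    """
    return [
        plugin,
        "-H", proxy_host,        # the one box at the remote site we can reach
        "-c", remote_command,    # command name defined in the proxy's nrpe.cfg
        "-t", str(timeout_s),    # seconds to wait, padded for WAN lag
    ]


# On the proxy, nrpe.cfg would carry a matching (hypothetical) entry such as:
#   command[check_web01_http]=/usr/lib/nagios/plugins/check_http -H 10.20.0.15
argv = nrpe_proxy_command("site-b-proxy.example.com", "check_web01_http",
                          timeout_s=45)
print(" ".join(argv))
```

Only the proxy's NRPE port needs to be reachable from the central server; the check against 10.20.0.15 never crosses the WAN.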
Also, you can then use parent/child setups to make every host at the remote site a child of its proxy, to reduce false-positive notifications. You might also want to make all of the services dependent on a check_nrpe (or check_ssh) service of the proxy. See the network reachability docs for more info.
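To illustrate why the parent/child setup cuts down on noise, here is a simplified model of the reachability logic as I understand it from the Nagios docs: a failing host counts as DOWN only if it has no parents or at least one parent is UP, otherwise it is UNREACHABLE. The host names are hypothetical and this is not Nagios's actual code.

```python
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def classify(host, parents, responding):
    """Classify a host from its parents and the set of hosts answering checks.

    parents:    dict mapping host -> list of parent hosts (empty = no parents)
    responding: set of hosts whose host check succeeded
    """
    if host in responding:
        return UP
    host_parents = parents.get(host, [])
    if not host_parents:
        return DOWN  # nothing sits between Nagios and this host
    if any(classify(p, parents, responding) == UP for p in host_parents):
        return DOWN  # the path is fine, so the host itself is the problem
    return UNREACHABLE  # every parent is failing, so blame the path


# Hypothetical remote site: two servers behind one proxy.
parents = {
    "site-proxy": [],
    "remote-web": ["site-proxy"],
    "remote-db": ["site-proxy"],
}

# WAN link drops: nothing at the site answers.
for name in parents:
    print(name, classify(name, parents, responding=set()))
```

With notifications limited to DOWN states, the outage above pages once for the proxy instead of once per host.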
No matter which method you go with, it's very important that you adjust default timeouts appropriately, to account for the added lag of going across the WAN links.
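As a sketch of what "adjusting timeouts" means at the plugin level, the check below takes its timeout and thresholds as parameters, so they can be set higher for hosts reached across the WAN. It follows the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and parameters are my own and it is not a replacement for check_tcp.

```python
import socket
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, warn_s, crit_s, timeout_s):
    """Time a TCP connect and return (exit_code, message).

    timeout_s should be comfortably above crit_s, and both should allow
    for the round-trip time of the WAN link.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            elapsed = time.monotonic() - start
    except socket.timeout:
        return CRITICAL, f"CRITICAL - no response within {timeout_s:.1f}s"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - connection failed: {exc}"

    if elapsed >= crit_s:
        return CRITICAL, f"CRITICAL - connect took {elapsed:.3f}s"
    if elapsed >= warn_s:
        return WARNING, f"WARNING - connect took {elapsed:.3f}s"
    return OK, f"OK - connect took {elapsed:.3f}s"
```

For a host across the WAN you might call it as `check_tcp("192.0.2.10", 22, warn_s=1.0, crit_s=3.0, timeout_s=10.0)` (the address is a placeholder), then print the message and exit with the returned code.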