Is it reasonable to use Nagios to check that a service is NOT available?

Suppose I have a server with a private interface and a public interface. The public one might have things like HTTP(S) servers; the private one might have MySQL and SSH.

Obviously Nagios is useful to check that the services are running on their respective interfaces. But is it a good idea to build checks that explicitly test that the MySQL and SSH ports are not open on the public interface? The idea is to catch inadvertent misconfigurations that have opened up services that should be private, and alert appropriately.

Part of me suspects this wouldn't scale terribly well. Imagine, for example, that there is an iptables DROP rule: the check would have to wait for the check timeout to expire before it could complete and move on. But that timeout would have to be high enough to differentiate a blocked service from an open one that's really bogged down.
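To illustrate the outcomes being distinguished here, this is a minimal Python sketch (not a Nagios plugin; the function name probe_port and the 3-second default timeout are hypothetical) that classifies a TCP port as open, closed (actively refused, either because nothing is listening or because a REJECT rule answered) or filtered (no reply before the timeout, as with a DROP rule). Only the filtered case costs the full timeout, and a listener so overloaded that it never completes the handshake would look exactly the same, which is the ambiguity described above.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port as 'open', 'closed', 'filtered' or 'error'.

    Hypothetical helper for illustration; the default timeout is an assumption.
    """
    try:
        # A completed three-way handshake means something is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # RST or ICMP port-unreachable: no listener, or a REJECT rule.
        return "closed"
    except socket.timeout:
        # No answer at all within the timeout: typically a DROP rule.
        return "filtered"
    except OSError:
        # Anything else, e.g. host unreachable or an ICMP host-prohibited reject.
        return "error"
```

Probed from outside, probe_port("203.0.113.10", 3306) would be expected to return "filtered" if the public interface silently drops MySQL traffic, and "closed" if it rejects it.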

Is this a practical idea? Is Nagios the right tool? I haven't even looked into the feasibility of negating the result from the TCP check plugins, but I'm sure it's doable...


Solution 1:

Yes, of course. The job of a monitoring system is to ensure that the business requirements are currently being met by the IT infrastructure, whatever those requirements are.

My gut feeling is that there's no easy limit (well, 65535) to the number of ports you'd have to monitor to make sure none of them suddenly becomes open, and that the best way to achieve this control is tight source control plus strong, aggressive file system monitoring (e.g., Tripwire) on the server.

But if it's absolutely business-critical that certain ports are never exposed, then yes, by all means put a specific check in place for them. You may want to look into the Nagios negate plugin, which ships with most major distributions and is used to do exactly what you suggest.

Solution 2:

You can wrap any check in the negate plugin to invert its logic. By default it turns CRITICAL into OK and OK into CRITICAL, and you can also remap OK, WARNING, CRITICAL, and UNKNOWN to other states. See the --help output for more info.
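The state mapping negate performs can be sketched in a few lines of Python. This only illustrates the idea and is not the plugin's source: the default swaps OK and CRITICAL and leaves WARNING and UNKNOWN alone, and the optional overrides dictionary is a hypothetical stand-in for the remapping options listed by negate --help. The values 0 to 3 are the standard Nagios plugin exit codes.

```python
from __future__ import annotations

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Default inversion: OK <-> CRITICAL, WARNING and UNKNOWN unchanged.
DEFAULT_MAP = {OK: CRITICAL, WARNING: WARNING, CRITICAL: OK, UNKNOWN: UNKNOWN}


def negate_status(status: int, overrides: dict[int, int] | None = None) -> int:
    """Map a wrapped check's exit status to the state that gets reported.

    overrides is a hypothetical way to remap individual states.
    """
    mapping = dict(DEFAULT_MAP)
    if overrides:
        mapping.update(overrides)
    # Anything outside 0-3 is not a valid plugin result; report UNKNOWN.
    return mapping.get(status, UNKNOWN)
```

Applied to a wrapped check_tcp against the public MySQL port, an open port (OK) would become CRITICAL, while a refused or timed-out connection, which check_tcp normally reports as CRITICAL, would become OK. An override such as negate_status(UNKNOWN, {UNKNOWN: CRITICAL}) shows how you might choose to alert when the wrapped check itself cannot decide.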

If you're concerned about DROP policies increasing the check time, you can just shorten the timeout. For a check like this, you probably don't need to run it every 5 minutes either; we have some similar checks that run hourly.
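If you would rather write a dedicated check than wrap an existing one, the logic is small. The following Python sketch is a hypothetical plugin (the names check_ports_closed and main, the argument order, and the 2-second default timeout are all assumptions) that reports CRITICAL if any listed port accepts a connection, OK if every one is refused or silently dropped, and UNKNOWN if a port could not be tested at all. Because ports are probed one after another, the worst-case run time is roughly the timeout multiplied by the number of dropped ports, which stays small for a handful of business-critical ports checked hourly.

```python
from __future__ import annotations

import socket

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def check_ports_closed(host: str, ports: list[int],
                       timeout: float = 2.0) -> tuple[int, str]:
    """Return (exit status, status line) for 'these ports must not be open'."""
    exposed, errors = [], []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                exposed.append(port)  # handshake completed: port is open
        except (ConnectionRefusedError, socket.timeout):
            pass  # refused (no listener / REJECT) or dropped: the desired state
        except OSError as exc:
            errors.append(f"{port}: {exc}")  # undecidable, e.g. host unreachable
    if exposed:
        return CRITICAL, f"CRITICAL - ports open on {host}: {exposed}"
    if errors:
        return UNKNOWN, f"UNKNOWN - could not check {host}: {'; '.join(errors)}"
    return OK, f"OK - ports {ports} not reachable on {host}"


def main(argv: list[str]) -> int:
    """Hypothetical entry point; argv is HOST PORT [PORT ...]."""
    if len(argv) < 2:
        print("UNKNOWN - usage: HOST PORT [PORT ...]")
        return UNKNOWN
    status, message = check_ports_closed(argv[0], [int(p) for p in argv[1:]])
    print(message)  # Nagios shows the first line of plugin output
    return status
```

To use it as a plugin, the script would end with sys.exit(main(sys.argv[1:])) and be run against the public address, for example with arguments 203.0.113.10 22 3306, on an hourly schedule.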