Network monitoring [duplicate]

Possible Duplicate:
What tool do you use to monitor your servers?

What software would you recommend for monitoring a network? We have a main server which acts as a DNS server, among other services, and we would like to monitor network activity: which protocols are being used, bandwidth, etc.

It's kind of a Big Brother thing: we want to know when a user tries to log in to a personal mail account on Gmail, Hotmail, etc., or uses an external IM account, since these are not permitted under the company's rules. If possible, we would also like to block such access (or at least detect it, so that the appropriate disciplinary action can be taken).

I've read that Nagios is a monitoring service. Is this the solution we are looking for? What other open source alternatives are there?


Solution 1:

Nagios is a solid open source solution. Its plugin architecture means that Nagios basically provides a framework in which monitoring takes place, and you plug in exactly the checks you need. Since Nagios is fairly popular, a huge number of monitoring plugins already exist.

Nagios is mostly a real-time monitor, not a reporting tool. I know there are ways to pass the real-time Nagios data on to reporting apps like Cacti or Munin, which produce some lovely graphs.
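To make the plugin idea concrete, here is a minimal sketch of a Nagios-style check written in Python. It relies only on the standard plugin conventions (exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one line of status text, optional performance data after a "|"); the TCP-connect check itself, the function names and the default thresholds are illustrative choices, and Nagios already ships a proper check_tcp plugin for real use.

```python
import socket
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(elapsed, warn, crit):
    """Map a measured connect time in seconds to a Nagios state."""
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def check_tcp(host, port, warn=0.5, crit=1.0, timeout=5.0):
    """Try a TCP connect and return (state, status_line)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:  # refused, unreachable or timed out
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable: {exc}"
    elapsed = time.monotonic() - start
    state = evaluate(elapsed, warn, crit)
    # Text before '|' is for humans; text after it is performance data.
    return state, (f"TCP {LABELS[state]} - {host}:{port} answered in {elapsed:.3f}s"
                   f" | time={elapsed:.3f}s;{warn};{crit}")


def main(argv):
    """Print one status line and return the exit code Nagios expects."""
    if len(argv) != 3 or not argv[2].isdigit():
        print("TCP UNKNOWN - usage: check_tcp_sketch.py HOST PORT")
        return UNKNOWN
    state, line = check_tcp(argv[1], int(argv[2]))
    print(line)
    return state
```

To run it as an actual plugin, you would end the file with sys.exit(main(sys.argv)) and point a Nagios command definition at it, for example to watch port 53 on the DNS server.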

Solution 2:

It depends on how big your network is, and what you're trying to monitor. For small to medium networks, Nagios seems to be the platform of choice.

Once you get over a certain size, though, 'monitoring' gets split up into a variety of different functions, which may or may not be handled by the same tools. The three that I was taught are:

  • Fault management
  • Performance management
  • Forensics/correlation

Fault management is about catching any events in the environment that require immediate action to fix: link failures, hardware failures, loss of WAN circuits, etc. This is normally tied into your alerting system. I've heard Nagios does this quite well.

Performance management covers things that aren't an immediate issue but could become one unless they're given attention. This basically covers anything to do with monitoring utilisation and trending it over time: LAN/WAN bandwidth, router/switch CPU, and error and discard counters on interfaces. This is the kind of data you look at when you're planning your purchases and projects for the coming year; it tells you which parts of the network need attention and which are ticking along happily. I'm a fan of Cacti; it handles data gathering and presentation in one package, with built-in support for SNMP polling of devices.
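Under the hood, bandwidth graphs of this kind are essentially arithmetic on SNMP interface counters: poll an octet counter such as IF-MIB's ifInOctets twice, take the difference (allowing for the counter wrapping round), and divide by the polling interval and the link speed. A minimal Python sketch of just that calculation, assuming the polling itself is done elsewhere and using made-up sample values:

```python
COUNTER32_MODULUS = 2 ** 32  # ifInOctets/ifOutOctets are 32-bit counters


def counter_delta(prev, curr, modulus=COUNTER32_MODULUS):
    """Octets counted between two polls, allowing for one counter wrap."""
    return (curr - prev) % modulus


def utilisation_pct(prev_octets, curr_octets, interval_s, if_speed_bps,
                    modulus=COUNTER32_MODULUS):
    """Average link utilisation (percent) over the polling interval."""
    bits = counter_delta(prev_octets, curr_octets, modulus) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# Invented sample: a 100 Mbit/s link polled every 300 s.
print(utilisation_pct(0, 1_875_000_000, 300, 100_000_000))  # 50.0
```

A 32-bit counter on a busy gigabit link can wrap in well under a minute, which is why IF-MIB also defines 64-bit counters (ifHCInOctets/ifHCOutOctets); with those you would pass modulus=2**64.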

Forensics/correlation is for those cases where a one-off incident has occurred and been resolved, and you need to look at historical data. This can be either to get a better idea of what actually happened and what the consequences were, or to look for similar failures in the past. Either way, it generally requires a single repository of as much log data as you can feasibly retain, indexed and readily searchable. Splunk is absolutely fantastic in this regard, even the free edition, and you can feed your server logs into it as well. As long as everything is NTP-synced, you have one repository which shows you what your applications, servers and network infrastructure saw at various points during an incident.
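The reason NTP matters is that correlation is essentially putting events from many sources onto one timeline. Splunk does this at scale; purely as a toy illustration of the idea, here is a Python sketch that merges already time-ordered logs from several sources. It assumes each line starts with an ISO 8601 timestamp including a UTC offset; the log format and source names are invented.

```python
import heapq
from datetime import datetime


def events(source, lines):
    """Yield (timestamp, source, message) from lines like '<ISO-8601> <message>'."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        stamp, _, message = line.partition(" ")
        yield datetime.fromisoformat(stamp), source, message


def timeline(sources):
    """Merge per-source logs (each already in time order) into one ordered stream.

    Only meaningful if all sources agree on the time, i.e. everything is NTP-synced.
    """
    streams = [events(name, lines) for name, lines in sources.items()]
    return heapq.merge(*streams, key=lambda event: event[0])


firewall = ["2011-03-02T10:15:03+00:00 DENY tcp 10.0.0.5 -> 203.0.113.7:5222"]
webserver = ["2011-03-02T10:15:01+00:00 GET /login 500",
             "2011-03-02T10:15:05+00:00 GET /login 200"]
for when, source, message in timeline({"firewall": firewall, "webserver": webserver}):
    print(when.isoformat(), source, message)
```

heapq.merge consumes the sources lazily, so the same approach works on log files far larger than memory, provided each one is already sorted.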

The other things you're looking to audit fall under network security rather than network monitoring. For example, monitoring and restricting user browsing is easily accomplished with a properly configured proxy. Attempts to connect to external IM services should be blocked by your Internet firewall; again, firewall logs can be exported to an analyser and reports run looking for suspicious traffic patterns. In fact, if you can, try to avoid letting user workstations access the Internet directly, by ensuring that the Internet is not routable from inside your LAN. This forces all Internet traffic to go through a proxy of your choosing, ensuring that you have full control of all inbound and outbound traffic.
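As a rough illustration of the kind of report you can run against proxy logs, here is a Python sketch that flags requests to mail/IM sites on a blocklist. It assumes Squid's native access.log layout (Unix timestamp in field 1, client address in field 3, URL in field 7, with HTTPS appearing as a CONNECT to host:port); the blocklist and sample lines are hypothetical. In practice you would also enforce the block in the proxy's own ACLs rather than only reporting on it.

```python
from urllib.parse import urlsplit

# Hypothetical policy list: these domains and all their subdomains are flagged.
BLOCKED = {"mail.google.com", "hotmail.com", "live.com"}


def host_of(url):
    """Hostname from a full URL or from a CONNECT target like 'host:443'."""
    if "://" not in url:
        url = "//" + url  # let urlsplit treat 'host:port' as a network location
    return (urlsplit(url).hostname or "").lower()


def is_blocked(host, blocked=BLOCKED):
    """True if host is a blocked domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in blocked)


def violations(log_lines, blocked=BLOCKED):
    """Yield (timestamp, client_ip, host) for each request to a blocked host."""
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        host = host_of(fields[6])
        if is_blocked(host, blocked):
            yield fields[0], fields[2], host


sample = [
    "1299061201.123    250 10.0.0.5 TCP_TUNNEL/200 1520 CONNECT mail.google.com:443 - HIER_DIRECT/203.0.113.10 -",
    "1299061202.456     90 10.0.0.8 TCP_MISS/200 800 GET http://example.com/ - HIER_DIRECT/198.51.100.2 text/html",
    "1299061203.789    120 10.0.0.9 TCP_MISS/200 900 GET http://www.hotmail.com/ - HIER_DIRECT/198.51.100.3 text/html",
]
for hit in violations(sample):
    print(hit)
```

Matching on a leading "." means a lookalike such as evilhotmail.com is not flagged, while www.hotmail.com is.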

Solution 3:

To answer the part of your question about monitoring activity (as opposed to checking whether your servers are running): any decent enterprise firewall should include a real-time view of the flows passing through it, as well as retrospective reports on those flows.
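Whatever product you use, the "which protocols, how much bandwidth" view is essentially an aggregation of flow records. A minimal Python sketch of that aggregation, assuming the firewall can export flows with protocol, destination port and byte count; the record layout and the short port-to-service table are illustrative, not any vendor's format.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """A minimal, hypothetical flow record exported by a firewall."""
    src: str
    dst: str
    proto: str      # "tcp" or "udp"
    dst_port: int
    octets: int


# Illustrative port-to-service labels; real tools ship much larger tables.
SERVICES = {("tcp", 25): "smtp", ("udp", 53): "dns", ("tcp", 80): "http",
            ("tcp", 443): "https", ("tcp", 5222): "xmpp"}


def bytes_by_service(flows: Iterable[Flow]) -> Counter:
    """Total bytes per service, falling back to 'proto/port' for unknown ports."""
    totals: Counter = Counter()
    for f in flows:
        label = SERVICES.get((f.proto, f.dst_port), f"{f.proto}/{f.dst_port}")
        totals[label] += f.octets
    return totals


flows = [
    Flow("10.0.0.5", "203.0.113.7", "tcp", 5222, 1200),
    Flow("10.0.0.5", "198.51.100.2", "tcp", 443, 5000),
    Flow("10.0.0.8", "198.51.100.2", "tcp", 443, 3000),
    Flow("10.0.0.9", "192.0.2.1", "udp", 123, 90),
]
for service, total in bytes_by_service(flows).most_common():
    print(f"{service:10} {total:>8} bytes")
```

Port numbers are only a hint, since many IM clients can fall back to tunnelling over port 443.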

At my last job we supplied Stonesoft firewalls to customers and the management interface for those provided both of those features.