A strategy for network monitoring to improve network health?

The goal

is to find out which parts of the network should be upgraded or reorganized to improve network health (reliability and performance).

What methods are there to monitor network traffic and detect bottlenecks, latency and packet loss? Can I do it from one point in the network, or do I have to plug my computer in at specific places?

What would be the best strategy to check network load, detect bottlenecks, etc.?

There are many tools now, but is installing one such as ntop or cacti and running it anywhere on the network enough? I've googled around and found this description:

ntop is a network traffic probe that shows the network usage, similar to what the popular top Unix command does.

But I cannot use it until I understand how to gather the data, so I'm asking this question.

Background

Our network is built entirely on cheap network switches, but it has expanded considerably and now includes computers, network cameras and some other hardware with network interfaces. All of the network devices are 100 Mbit/s; none are 1 Gbit/s.

  • Cameras are recorded on a few PCs at distant locations and viewed on other PCs at distant locations.
  • There are no more than 50 computers, but a few are quite distant (it is a large area); some are 300 meters away, etc. A radio link and fibre optic connect those places. These computers usually connect to a database application.

Network map

Network diagram

  • Red lines are Fibre optic
  • Black lines are Ethernet cables
  • Boxes are physical locations

Solution 1:

In an environment that supports it, we've found NetFlow Analyser to be a very useful tool for capacity planning, identifying bottlenecks, and monitoring the health of the network. You can even use it to check if backups are operating as expected (did x amount of data flow across Y link between hours A and B?) or to monitor disk performance on an iSCSI network (tap into the ports on your storage controllers and monitor throughput). However, it requires switch support to operate correctly, which you likely don't have given your comment about cheap switches.
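The backup check mentioned above ("did x amount of data flow across Y link between hours A and B?") can be sketched roughly as follows. This is only an illustration, assuming flow records have already been exported and reduced to a start time, source, destination and byte count. The `FlowRecord` layout, the addresses and the expected backup size are all hypothetical and not taken from any particular NetFlow product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FlowRecord:
    """One exported flow, reduced to the fields this check needs (hypothetical layout)."""
    start: datetime
    src: str
    dst: str
    byte_count: int


def bytes_between(flows, src, dst, window_start, window_end):
    """Sum bytes of flows from src to dst that started in [window_start, window_end)."""
    return sum(
        f.byte_count
        for f in flows
        if f.src == src and f.dst == dst and window_start <= f.start < window_end
    )


# Hypothetical sample data: a file server sending to a backup server overnight.
flows = [
    FlowRecord(datetime(2024, 1, 10, 1, 5), "10.0.0.20", "10.0.0.5", 40_000_000_000),
    FlowRecord(datetime(2024, 1, 10, 2, 30), "10.0.0.20", "10.0.0.5", 25_000_000_000),
    FlowRecord(datetime(2024, 1, 10, 9, 0), "10.0.0.20", "10.0.0.5", 1_000_000),
]

moved = bytes_between(
    flows, "10.0.0.20", "10.0.0.5",
    datetime(2024, 1, 10, 1, 0), datetime(2024, 1, 10, 5, 0),
)
expected_minimum = 60_000_000_000  # assumed size of a normal nightly backup
print("backup looks complete" if moved >= expected_minimum else "backup looks short")
```

The flow that started at 09:00 falls outside the window and is not counted, so only the overnight transfer is compared against the expected size.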

Based on what you've posted I'm assuming your main driver for this is that the video viewing performance is a problem? You didn't mention whether viewing/serving the video is isolated to a limited number of machines, or whether it could take place between potentially any of the machines on the network. The answer to this would completely change the approach you need to take to this problem. Do the green lines indicate video flow?

Your diagram shows 12 buildings and your description hints that they're all in close proximity. What kind of environment are we looking at here? I'm guessing a school, college or hospital?

My core advice here, given what you've posted, is kinda tangential to your request: take a look at how much you're paid per hour and how busy you are. Look at the cost of purchasing a few Cisco/Juniper 24-port Gigabit switches for the core network at head office. Chances are it's more cost-effective, and a better use of your time, to upgrade the core network than to diagnose the existing obsolete infrastructure and attempt to tweak it.

Solution 2:

Network monitoring is quite a broad term. NetFlow/sFlow can be used to monitor the types of traffic passing through various devices; ntop provides this functionality, as does some network gear. Network latency monitoring can be done with something like smokeping (http://oss.oetiker.ch/smokeping/): hosts run it at various ends of the network links and ping each other, showing the response times and/or packet loss.
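To illustrate the idea of hosts pinging each other and recording latency and loss, here is a minimal sketch. It is not how smokeping itself works internally; it simply shells out to the system `ping` and parses the summary. It assumes the Linux iputils `ping` output format, and the sample text below is a made-up example of that format.

```python
import re
import subprocess


def parse_ping_summary(output):
    """Extract packet loss (%) and average RTT (ms) from Linux iputils ping output."""
    loss = re.search(r"([\d.]+)% packet loss", output)
    rtt = re.search(r"= [\d.]+/([\d.]+)/", output)  # the min/avg/max/mdev line
    return {
        "loss_pct": float(loss.group(1)) if loss else None,
        "avg_rtt_ms": float(rtt.group(1)) if rtt else None,
    }


def probe(host, count=5):
    """Ping a host `count` times and return its loss and average latency."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True,
    )
    return parse_ping_summary(result.stdout)


# Made-up sample of the summary lines, so the parser can be tried offline.
sample = (
    "5 packets transmitted, 4 received, 20% packet loss, time 4005ms\n"
    "rtt min/avg/max/mdev = 0.345/0.412/0.498/0.054 ms\n"
)
summary = parse_ping_summary(sample)
print(summary)
```

Running `probe("192.168.1.10")` (a hypothetical address) from a machine at each end of a link every few minutes, and logging the results, gives a rough picture of where latency or loss appears.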

Also, cacti can be used to gather raw stats from any of your network gear that supports it (traffic rates, plus actual frame errors and whatnot).
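The traffic rates such tools graph are typically derived from two readings of an interface octet counter (for example SNMP `ifInOctets`) taken some seconds apart. A minimal sketch of that arithmetic, assuming a 100 Mbit/s link and a 32-bit counter; the function name and the sample values are hypothetical:

```python
def link_utilization(octets_before, octets_after, interval_s,
                     link_bps=100_000_000, counter_bits=32):
    """Return link utilization as a fraction (0.0 to 1.0) from two counter samples."""
    # The modulo handles a single counter wrap between the two samples.
    delta_octets = (octets_after - octets_before) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_s
    return bits_per_second / link_bps


# 300,000,000 octets in 300 s on a 100 Mbit/s link.
normal = link_utilization(1_000_000, 301_000_000, 300)

# Counter wrapped between samples: 6,000,000 octets in 60 s.
wrapped = link_utilization(4_294_000_000, 5_032_704, 60)

print(f"{normal:.1%} and {wrapped:.1%}")
```

A 32-bit octet counter on a saturated 100 Mbit/s link wraps after roughly 343 seconds, so the polling interval needs to stay below that (or a 64-bit counter such as `ifHCInOctets` should be used where available) for the wrap handling to be valid.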

Hope that helps some.

Solution 3:

When you say you have cheap switches: if they are unmanaged switches (no SNMP or other instrumentation), then you are out of luck. All you can do (maybe) is put monitoring on your gateway(s) and WAN links to measure those. As for internal traffic, you can't put together an actual monitoring solution with taps or laptop sniffers; those are generally only good for troubleshooting specific problems in one place.
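If the gateway happens to be a Linux box (an assumption; the question does not say), its per-interface byte counters can be read from `/proc/net/dev` without any switch support. A rough sketch of parsing that file, using a made-up sample so it can be tried anywhere:

```python
def parse_proc_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


# Made-up sample in the /proc/net/dev layout.
sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000    4000    0    0    0     0          0         0  2000000    3000    0    0    0     0       0          0
"""

counters = parse_proc_net_dev(sample)
print(counters["eth0"])
```

On a real gateway the text would come from `open("/proc/net/dev").read()`; sampling it at a fixed interval and taking the difference between readings gives the throughput of the WAN-facing interface.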