Troubleshooting a "slow" network

tcpdump and Wireshark are your friends.

I find that watching packets on the wire of a 'slow' network vs a 'good' network is usually what pinpoints a problem.

There are many types of 'slow'.

You can track latency to local and Internet sites using a tool like SmokePing, which can be configured to track ICMP latency as well as the latency of TCP services.
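
If you want a quick spot check before setting up SmokePing, the TCP half of that idea can be sketched in a few lines of Python. This is only an illustration, not how SmokePing works internally; the function names are made up for this example, and the host and port in the usage comment are placeholders.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, timeout=2.0):
    """Time one TCP handshake in milliseconds; return None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return None


def summarize(samples):
    """Summarize latency samples; a None entry counts as a lost probe."""
    ok = [s for s in samples if s is not None]
    return {
        "sent": len(samples),
        "lost": len(samples) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


# Usage (example.com:80 is a placeholder target):
#   samples = [tcp_connect_latency_ms("example.com", 80) for _ in range(5)]
#   print(summarize(samples))
```

Run it from both the 'slow' and the 'good' network and compare the medians and the loss counts.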

Your switches should track broadcast packets vs unicast packets. Graph that ratio.
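
Assuming you poll cumulative per-port packet counters (for example the IF-MIB ifHCInUcastPkts and ifHCInBroadcastPkts objects over SNMP), the ratio for each polling interval comes from the counter deltas. A minimal sketch; the dict layout and function name are assumptions for illustration, and multicast is ignored:

```python
def broadcast_ratio(prev, curr):
    """Fraction of packets seen between two polls that were broadcast.

    prev and curr are dicts holding cumulative 'ucast' and 'bcast' counters
    (hypothetical layout; fill them from whatever your poller returns).
    """
    d_ucast = curr["ucast"] - prev["ucast"]
    d_bcast = curr["bcast"] - prev["bcast"]
    if d_ucast < 0 or d_bcast < 0:
        # A counter went backwards: reset or wrap. Skip this sample.
        return None
    total = d_ucast + d_bcast
    return d_bcast / total if total else 0.0
```

What counts as a "bad" ratio depends on the network, so graph it for a while to learn the baseline before alerting on it.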

I also like to monitor traceroutes (checking the domain names of the ISP hops between myself and 'important' sites).
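
One way to make monitored traceroutes useful is to keep a known-good hop list and diff new runs against it. A rough sketch, assuming you have already parsed each traceroute into a list of hop names (the function name and the hostnames in the usage comment are invented):

```python
def path_changes(baseline, current):
    """Return (hop_number, old_hop, new_hop) for every hop that differs.

    Hop numbers start at 1. A missing hop on either side is reported as None.
    """
    changes = []
    for i in range(max(len(baseline), len(current))):
        old = baseline[i] if i < len(baseline) else None
        new = current[i] if i < len(current) else None
        if old != new:
            changes.append((i + 1, old, new))
    return changes


# Usage with made-up hop names:
#   path_changes(["gw.example.net", "core1.isp-a.example"],
#                ["gw.example.net", "core7.isp-b.example"])
```

A change in the ISP hops around the time users started complaining is a good hint about where to look next.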

I hope these comments help.


It is hard to give specific answers, since 90% of this job is experience, which teaches you where to look for which kind of problem, and the other 90% is knowing where to look on Google for hints on where to start.

I usually try the paper-bag stuff like getting the customer to demonstrate the problem (mostly to rule out finger-problems and any issues the customer may have describing his problem), then trying to duplicate the problem on another computer. Doing that often gives you insight into where to look.

Don't forget the corrective power of a reboot, especially for Windows systems, even today. It used to help so often that I would ask people "Have you rebooted? Well try that and let me know if the problem persists" -- this fixed a very large percentage of the issues I was asked about.

There's frequently also low-hanging fruit in DNS resolution problems and basic connectivity (ACLs on routers, air-gaps in the network, pings/traceroutes/mtrs to remote sites, etc).
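
For the DNS part, timing a lookup through the system resolver is often enough to tell "DNS is slow" apart from "the network is slow". A small sketch using Python's standard socket module; the function name and result layout are just for this example, and a local resolver cache can make repeat lookups look faster than they really are:

```python
import socket
import time


def time_dns_lookup(name):
    """Resolve name through the system resolver and report how long it took."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(name, None)
        error = None
    except socket.gaierror as exc:
        infos = []
        error = str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return {"name": name, "ok": error is None, "ms": elapsed_ms,
            "addresses": addresses, "error": error}


# Usage (the name is a placeholder):
#   print(time_dns_lookup("www.example.com"))
```

If lookups take seconds while pings to the resolved address come back in milliseconds, look at the resolver configuration first.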

For services you have direct control over, running Nagios or something similar to ensure the service is actually running can frequently prompt you to fix problems before customers tell you about them. You probably also want to be gathering stats, either directly through Munin or something similar, or via SNMP to something like Cacti.

I usually try to have Cacti running against at least all my core switches and firewalls; where possible, I run Cacti against everything I can. In these cases I am usually looking for things like port error counts or excessive traffic. Firewall graphs from some devices can show you CPU usage and concurrent sessions; you'll get to learn at what thresholds your firewall device starts to have issues.

Your firewall may be able to log to a syslog device; if so, log everything you can and look through those logs for hints. This will be easier if you run something like syslog-ng, rsyslog, or Splunk that lets you divide your logs somewhat rather than dealing with one monolithic file.
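
syslog-ng and rsyslog can do this split in their own configuration. If you are stuck with one monolithic file in the meantime, here is a sketch of the idea, assuming classic BSD-style lines ("Mmm dd hh:mm:ss host message"). The function name and the "_unparsed" bucket are made up for this example:

```python
import re
from collections import defaultdict

# Matches e.g. "Oct 11 22:14:15 fw1 kernel: ..." and captures the host field.
LINE_RE = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s(?P<host>\S+)\s(?P<rest>.*)$"
)


def split_by_host(lines):
    """Group syslog lines by sending host; unparseable lines go to '_unparsed'."""
    by_host = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        host = match.group("host") if match else "_unparsed"
        by_host[host].append(line)
    return dict(by_host)


# Usage (the path is a placeholder):
#   with open("/var/log/messages") as fh:
#       for host, entries in split_by_host(fh).items():
#           print(host, len(entries))
```

Devices that send a different timestamp format will land in the "_unparsed" bucket, which is a hint that the pattern needs adjusting.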

I also try to run NfSen against at least the inside of my firewall, and the uplink to the internet provider where possible. This lets you go back in time and look at sessions to see who was doing what; this can sometimes catch interesting behaviors.
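
NfSen does this kind of aggregation for you. Just to show the "who was doing what" idea, here is a sketch that totals bytes per source address. It assumes flow records already exported into plain dicts; that layout is hypothetical and is not NfSen's own format:

```python
from collections import Counter


def top_talkers(flows, n=5):
    """Return the n source addresses that sent the most bytes, largest first.

    flows is an iterable of dicts with at least 'src' and 'bytes' keys
    (an assumed layout for this example).
    """
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals.most_common(n)
```

Restricting the input to the time window in which users reported slowness usually narrows things down quickly.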


Here are a couple of useful tools for troubleshooting latency and other network issues:

  • the OSI model - start from the bottom and work your way up
  • ping - check your RTT (i.e. latency)
  • HTTP ping - useful if your firewall blocks normal ICMP
  • ping -r 9 (the Windows record-route option) - useful for identifying asymmetric routing situations
  • traceroute - how are my packets getting there and how are the routers along the way responding? Be aware that routers often process these packets at a low priority, so real performance may be better.
  • Wireshark - takes some expertise, but you can't get much lower-level
  • SpeedGuide.net TCP/IP Analyzer - check your PC's TCP settings
  • SG TCP Optimizer - (Windows only) suggests ways to optimize your NIC settings
  • IP Chicken - what is your public (post-NAT) IP address?
  • http://downforeveryoneorjustme.com/ - maybe it is you...
  • Bandwidth speed test - check your download / upload speeds
  • Network tools - run tools/tests from outside your network
  • check your network ports for errors/CRCs/etc.
  • check your network for overutilization (bandwidth monitors) & broadcast storms
  • check for unicast flooding - use Wireshark and monitor for unicast traffic that is not destined for your workstation.
  • verify your spanning-tree root bridge is placed properly
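
For the unicast-flooding check in the list above, the test is simple once you have the destination MAC addresses out of a capture: a frame is unicast when the lowest bit of the first octet is 0, and on a healthy switched port you should see very few unicast frames addressed to anyone but you. A sketch, assuming the addresses were exported from a promiscuous-mode capture as colon-separated strings (function names are made up for this example):

```python
from collections import Counter


def is_unicast(mac):
    """True if the I/G bit (lowest bit of the first octet) is 0."""
    first_octet = int(mac.split(":")[0], 16)
    return first_octet & 1 == 0


def foreign_unicast(dst_macs, my_mac):
    """Count unicast frames whose destination is not my_mac.

    Broadcast and multicast destinations are ignored, since seeing
    those on every port is normal.
    """
    mine = my_mac.lower()
    counts = Counter()
    for mac in dst_macs:
        mac = mac.lower()
        if is_unicast(mac) and mac != mine:
            counts[mac] += 1
    return counts
```

A handful of hits right after a topology change can be normal; a steady stream toward the same few addresses is what points at flooding.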