How do I diagnose and visualize high ping times to wifi router?

I'm seeing erratic and sometimes very long ping times to my wifi router that's just one hop away. Pinging 192.168.1.1 sometimes gives stretches of 400-800ms latencies.

There are plenty of things to try (firmware, router placement, AP channel, etc.), but I would like to attack this problem a bit more methodically:

  • First, how can I visualize the performance of my network?
  • Then, how can I benchmark the performance of a given configuration, so that I can compare reliably after making adjustments?

Solution 1:

This Server Fault answer has good high-level guidance on what to do, so start with that. The last step is a real doozy though: presumably you (I mean, me) don't want to invest in dedicated hardware for this...

Below are some good tools, first for understanding connectivity health within the local wifi network, and then to an internet endpoint.

Wifi Tools

NetSpot (for mac)

It tracks the local WiFi APs and provides basic data such as SNR, channel, and signal strength. It can also do a basic site survey of a physical space, showing signal strength and interference. In AP discovery mode, you can also chart signal strength over time, which lets you test placements and rule out likely sources of interference.

Wifi Speed Test for Android

Very helpful. You run a simple Python server on your machine, and the app tests a few scenarios while giving you real-time speed feedback.

Wifi Analyzer, another great Android app, has a few valuable views of which channels the nearby APs are using. It might be the best free tool for choosing an AP channel without doing a lot of work.

iPerf

Well respected tool for understanding local network performance. You need two boxes, one as server, one as client. You can set up a number of parameters, run a test, and see the results for bandwidth and jitter. I prefer using it with the jPerf GUI for charting results and tweaking parameters.

brew install iperf
iperf -s # on server, next one on client
iperf -c 192.168.1.XXX -P 1 -i 1 -p 5001 -f m -t 60

Internet Connectivity Health

mtr (ping & traceroute combined)

Pings all your traceroute hops. Provides trend data. Crazy awesome.

brew install mtr
mtr 8.8.4.4

speedtest-cli

The CLI version of the common Ookla speedtest.net thing. The project maintainer warns that its results aren't consistent, but it's still handy for gauging large differences.

wget -O speedtest-cli https://raw.github.com/sivel/speedtest-cli/master/speedtest_cli.py
chmod +x speedtest-cli
speedtest-cli --list | head # and choose a server near the top (sorted by distance)
speedtest-cli --server 2761 # re-use the same server

NPAD : Network Path and Application Diagnosis

Automatic diagnostic server for troubleshooting end-systems and last-mile network problems. After running a battery of tests, gives a Result Summary page like this. I recommend using this NPAD server redirect link to find the closest NPAD server (they're all over) and using that hostname for your tests.

wget http://netspeed.usc.edu:8000/diag-client.c
cc diag-client.c -o diag-client
# ./diag-client <server_name> <port> <target_RTT> <target_data_rate_in_MB/S>
./diag-client ps.psc.xsede.org 8001 30 5

My personal results:

I spent a good few hours doing all this, trying different things (switching from DD-WRT to Tomato firmware) and reading. Turns out it wasn't a network-layer problem at all but good old RF interference, mostly from Bluetooth! I had my computer, a Bluetooth mouse, and a keyboard within 5 feet of the router. (And the old router was still on 2.4 GHz, where Wi-Fi and Bluetooth clash.)

For this, I got the most out of Wifi Speed Test for Android, running that regularly while I moved things around in the apartment. Since it reports updates every 200ms or so, it clearly communicated when interference was dropping my packets.
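
To make a before/after comparison like this less of a judgment call, one option is to log round-trip times for each placement and compare the median and 95th percentile rather than the average, since a handful of long stalls can dominate a mean. The following is only a minimal sketch of that idea: the sample values are made up for illustration (not real measurements), and the helper names (percentile, mean, summarize) are hypothetical.

```rust
/// Nearest-rank percentile (p in 0..=100) of a non-empty slice of RTTs in ms.
fn percentile(samples: &[f64], p: f64) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("RTTs must not be NaN"));
    // Nearest rank: the smallest value with at least p% of samples at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Arithmetic mean of a non-empty slice of RTTs in ms.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Print one summary line per configuration so runs can be compared side by side.
fn summarize(label: &str, samples: &[f64]) {
    println!(
        "{:<14} mean {:>8.1} ms  median {:>6.1} ms  p95 {:>8.1} ms",
        label,
        mean(samples),
        percentile(samples, 50.0),
        percentile(samples, 95.0)
    );
}

fn main() {
    // Hypothetical RTT logs (ms) for two placements; replace with your own data.
    let near_bluetooth = [1.8, 2.1, 1.9, 640.0, 2.0, 1.7, 410.0, 1.9, 2.2, 1.8];
    let moved_away = [1.8, 2.0, 1.9, 2.4, 2.0, 1.7, 3.1, 1.9, 2.2, 1.8];
    summarize("near bluetooth", &near_bluetooth);
    summarize("moved away", &moved_away);
}
```

With data like this, the medians of the two runs are nearly identical while the mean and the 95th percentile differ sharply, which is the kind of gap worth tracking between adjustments.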

I definitely recommend reading the Common Sources of Interference guide from Metageek. (They also make InSSIDer and other Wifi analysis tools that seem good.)

One tool I didn't have was a physical spectrum analysis meter. Phones and laptops can only detect Wifi APs, but can't pick up on interference from Bluetooth or other RF-based technologies. Metageek has some nice solutions in this space (Wi-Spy and inSSIDer Office) and hopefully we see more tools emerge like AirShark.

Solution 2:

As noted in my comment above: tools commonly used to diagnose Wi-Fi issues can actually cause this problem. When scanning for Wi-Fi networks, the radio has to go off channel. Typically it tells the AP to buffer frames for it so it can 'sleep', then switches channels to scan.

Additionally, since AirDrop was introduced, iOS and OS X take the Wi-Fi radio off channel to look for other AirDrop services, and since Yosemite they also go off channel periodically to support Handoff.

Solution 3:

So I had these Wi-Fi ping fluctuations to the router too.

PING 192.168.0.1 (192.168.0.1): 56 data bytes
64 bytes from 192.168.0.1: icmp_seq=0 ttl=63 time=2.334 ms
64 bytes from 192.168.0.1: icmp_seq=1 ttl=63 time=1.813 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=63 time=2749.664 ms
64 bytes from 192.168.0.1: icmp_seq=3 ttl=63 time=1748.912 ms
64 bytes from 192.168.0.1: icmp_seq=4 ttl=63 time=748.162 ms
64 bytes from 192.168.0.1: icmp_seq=5 ttl=63 time=1.796 ms
64 bytes from 192.168.0.1: icmp_seq=6 ttl=63 time=1.806 ms
64 bytes from 192.168.0.1: icmp_seq=7 ttl=63 time=1.991 ms
64 bytes from 192.168.0.1: icmp_seq=8 ttl=63 time=1.797 ms
64 bytes from 192.168.0.1: icmp_seq=9 ttl=63 time=1.832 ms
64 bytes from 192.168.0.1: icmp_seq=10 ttl=63 time=1.713 ms
64 bytes from 192.168.0.1: icmp_seq=11 ttl=63 time=1.819 ms
64 bytes from 192.168.0.1: icmp_seq=12 ttl=63 time=1.616 ms
64 bytes from 192.168.0.1: icmp_seq=13 ttl=63 time=1.748 ms
64 bytes from 192.168.0.1: icmp_seq=14 ttl=63 time=1.677 ms
64 bytes from 192.168.0.1: icmp_seq=15 ttl=63 time=3427.213 ms
64 bytes from 192.168.0.1: icmp_seq=16 ttl=63 time=2426.371 ms
64 bytes from 192.168.0.1: icmp_seq=17 ttl=63 time=1425.634 ms
64 bytes from 192.168.0.1: icmp_seq=18 ttl=63 time=424.834 ms
64 bytes from 192.168.0.1: icmp_seq=19 ttl=63 time=1.829 ms
64 bytes from 192.168.0.1: icmp_seq=20 ttl=63 time=1.691 ms
64 bytes from 192.168.0.1: icmp_seq=21 ttl=63 time=2.038 ms
64 bytes from 192.168.0.1: icmp_seq=22 ttl=63 time=1.679 ms
^C--- 192.168.0.1 ping statistics ---
23 packets transmitted, 23 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.616/564.346/3427.213/1015.102 ms
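
One detail in the output above is worth a closer look: within each slow stretch, consecutive round-trip times fall by almost exactly one second (2749.664, 1748.912, 748.162 ms). With pings sent one second apart, that is consistent with the replies being held up somewhere and then all arriving at nearly the same moment. The sketch below checks for that pattern automatically. It assumes ping's default one-second interval and the output format shown above; the function names (parse_rtts, find_stalls) and the 5 ms tolerance are made up for illustration.

```rust
/// Extract (icmp_seq, rtt_ms) pairs from ping output lines such as
/// "64 bytes from 192.168.0.1: icmp_seq=2 ttl=63 time=2749.664 ms".
fn parse_rtts(output: &str) -> Vec<(u32, f64)> {
    let mut samples = Vec::new();
    for line in output.lines() {
        let mut seq = None;
        let mut rtt = None;
        for field in line.split_whitespace() {
            if let Some(v) = field.strip_prefix("icmp_seq=") {
                seq = v.parse::<u32>().ok();
            } else if let Some(v) = field.strip_prefix("time=") {
                rtt = v.parse::<f64>().ok();
            }
        }
        // Lines without both fields (headers, statistics) are skipped.
        if let (Some(s), Some(r)) = (seq, rtt) {
            samples.push((s, r));
        }
    }
    samples
}

/// Group consecutive replies that arrived at (nearly) the same instant.
/// Arrival time is estimated as send time (seq * interval) plus RTT.
/// Returns (first_seq, last_seq) for every group of two or more replies.
fn find_stalls(samples: &[(u32, f64)], interval_ms: f64, tolerance_ms: f64) -> Vec<(u32, u32)> {
    let mut stalls = Vec::new();
    let mut start: Option<(u32, f64)> = None; // first seq and arrival of the current group
    let mut last_seq = 0;
    for &(seq, rtt) in samples {
        let arrival = seq as f64 * interval_ms + rtt;
        match start {
            Some((_, first_arrival)) if (arrival - first_arrival).abs() <= tolerance_ms => {
                last_seq = seq; // same burst: extend the current group
            }
            _ => {
                if let Some((first_seq, _)) = start {
                    if last_seq > first_seq {
                        stalls.push((first_seq, last_seq));
                    }
                }
                start = Some((seq, arrival));
                last_seq = seq;
            }
        }
    }
    // Flush the final group, if it contained more than one reply.
    if let Some((first_seq, _)) = start {
        if last_seq > first_seq {
            stalls.push((first_seq, last_seq));
        }
    }
    stalls
}

fn main() {
    let output = "\
64 bytes from 192.168.0.1: icmp_seq=1 ttl=63 time=1.813 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=63 time=2749.664 ms
64 bytes from 192.168.0.1: icmp_seq=3 ttl=63 time=1748.912 ms
64 bytes from 192.168.0.1: icmp_seq=4 ttl=63 time=748.162 ms
64 bytes from 192.168.0.1: icmp_seq=5 ttl=63 time=1.796 ms";
    let samples = parse_rtts(output);
    for (first, last) in find_stalls(&samples, 1000.0, 5.0) {
        println!("replies {}..={} arrived together", first, last);
    }
}
```

On the excerpt in main, this reports replies 2 through 4 as one burst. If most of the slow pings in a capture fall into bursts like this, the link is more likely stalling and then flushing than being uniformly slow.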

I switched the router (from TL-WR743ND to DIR-815), tried several Wi-Fi USB adapters (mostly TP-LINKs, though I think I had the issue with the D-Link DWA-160 too), went from 2.4 GHz to 5 GHz and scoured the channels. No luck, the problem persisted.

Then I noticed that when I run a network speed test or a BitTorrent client, the ping is all right. It only fluctuates when the network is idle.

It might be a Windows 7 issue or a thing with my TP-LINK adapters, but when I put a bit of load on the Wi-Fi, the fluctuation vanishes and the network works all right.

So far I've made a little Rust program to keep my Wi-Fi network up.

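// Need a constant wifi load in order not to have the ping drops.
// Uses hyper's old synchronous client API (pool::Config, Client::with_pool_config);
// the exact hyper version is an assumption, it is not stated here.
use std::thread::sleep;
use std::time::Duration;

fn wifi_load() {
  // This *might* be useful if the router suddenly supports Keep-Alive.
  // Not the case with DIR-815 though, we'll keep making new connections to it.
  let config = hyper::client::pool::Config { max_idle: 1 };
  let client = hyper::client::Client::with_pool_config(config);
  loop {
    let url = "http://192.168.0.1/css/init.css";
    if let Err(err) = client.get(url).send() {
      // eprintln! stands in for the original custom log! macro, which isn't shown.
      eprintln!("[wifi_load] Error fetching {}: {}", url, err);
      // Back off for a while after a failure before retrying.
      sleep(Duration::from_secs(9));
    }
    // Pause briefly between requests (about ten per second).
    sleep(Duration::from_millis(100));
  }
}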