How to measure and minimize UDP packet loss

One of the main culprits of UDP loss, especially on LANs, is buffer overflow. This can happen in the switch, or on the sending or receiving server. On Linux, one way to check a host for dropped packets is to run the following command:

watch -n 1 -d 'cat /proc/net/udp'

This produces output similar to the following, where the last column (drops) is the number of packets dropped for each socket:

Every 1.0s: cat /proc/net/udp                                                                                                                                 Mon Sep 28 15:01:00 2015

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
11362: 00000000:3443 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 18224 2 ffff880808040000 0
19543: 00000000:D438 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 3809742 2 ffff8808060c8400 0
30819: 00000000:0044 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 12644 2 ffff88100f2b0400 0
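If you would rather log or alert on these counters than watch them by eye, the same file can be parsed from a script. The following is a minimal sketch, assuming the column layout shown above (port and queue sizes in hex, drops as the last decimal field); the names UdpSocketStats and parse_proc_net_udp are made up for illustration.

```python
from typing import List, NamedTuple


class UdpSocketStats(NamedTuple):
    local_port: int  # decoded from the hex port in local_address
    tx_queue: int    # bytes queued for sending
    rx_queue: int    # bytes waiting to be read by the application
    drops: int       # packets dropped for this socket


def parse_proc_net_udp(text: str) -> List[UdpSocketStats]:
    """Parse the contents of /proc/net/udp into per-socket statistics."""
    stats = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header (and any 'watch' banner): data rows start with "<slot>:".
        if len(fields) < 13 or not fields[0].rstrip(":").isdigit():
            continue
        local_port = int(fields[1].split(":")[1], 16)
        tx_hex, rx_hex = fields[4].split(":")
        stats.append(
            UdpSocketStats(
                local_port=local_port,
                tx_queue=int(tx_hex, 16),
                rx_queue=int(rx_hex, 16),
                drops=int(fields[-1]),
            )
        )
    return stats


def sockets_with_drops(text: str) -> List[UdpSocketStats]:
    """Return only the sockets whose drops counter is non-zero."""
    return [s for s in parse_proc_net_udp(text) if s.drops > 0]
```

On a Linux host you would feed it open("/proc/net/udp").read(); sampling it twice and comparing the drops values tells you whether a socket is losing packets right now or only did so in the past.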

You can then try several things to address the drops (again using Linux as an example):

  • ensure that the app consuming the data has enough CPU available,
  • ensure that the threads doing the I/O are as close to the network device as possible,
  • ensure that the UDP buffer sizes are all large enough to accommodate the data (again, you can check the output of the watch command above to see whether the tx_queue or rx_queue columns grow), and if they are not, increase the UDP buffers using sudo sysctl -w 'net/ipv4/udp_mem=xxx yyy zzz', sudo sysctl -w 'net/core/rmem_default=????', or sudo sysctl -w 'net/core/wmem_default=????' (note: xxx, yyy and zzz are the min, pressure and max thresholds of udp_mem, measured in memory pages).
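The sysctl settings above change system-wide defaults. An application can also ask for a larger receive buffer on its own socket, which is often the less invasive fix. Here is a minimal sketch using the standard SO_RCVBUF socket option; the function name make_udp_socket is hypothetical, and the size you request is something you would have to choose from your own traffic rates.

```python
import socket
from typing import Tuple


def make_udp_socket(requested_rcvbuf: int) -> Tuple[socket.socket, int]:
    """Create a UDP socket and request a receive buffer of the given size.

    Returns the socket and the buffer size the kernel actually granted,
    which may differ from the request.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_rcvbuf)
    # Always read the value back: the kernel is free to adjust the request.
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, effective
```

On Linux the value read back is normally double the request (the kernel reserves room for bookkeeping), and the request is silently capped by net.core.rmem_max, so if the effective size comes back smaller than expected, that limit is the one to raise.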

As a shameless plug, I've created a product called Pontus Vision Thread Manager that continuously tunes this automatically.


What could be the general reason for UDP packet loss

Congestion (too many packets) combined with a lack of QoS (packets are dropped at random and VoIP is not given priority), and/or faulty equipment (poor line quality, etc.). For the former, get QoS-capable equipment; for the latter, check the lines (hardware, switches, whatever) for faults.

For an internet connection, you need QoS routers on both ends, which you won't have (unless VoIP is offered by your provider, in which case they likely have the infrastructure in place). That said, since your down channel is typically a lot bigger than the up channel, a local router prioritizing only the down channel is normally "good enough".
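QoS equipment has to be able to tell which packets deserve priority, and one common way for an application to signal that is to set the DSCP field on its outgoing packets. The following is only a sketch: it assumes a Linux host, uses the standard IP_TOS socket option, and picks DSCP 46 (Expedited Forwarding, the class usually used for voice) as an example. Whether any router along the path honours the marking depends entirely on how that router is configured.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, commonly used for VoIP media


def make_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # DSCP occupies the upper six bits of the IP TOS byte, hence the shift by 2.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock
```

Marking packets does not create priority by itself; it only gives a QoS-capable router something to match on.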

Bad line quality is a hard problem to handle, though.