Extreme UDP packet loss at 300Mbit (14%), but TCP > 800Mbit w/o retransmits

I have a Linux box I use as the iperf3 client, testing two identically equipped Windows Server 2012 R2 boxes with Broadcom BCM5721 1Gb adapters (two ports each, but only one used for the test). All machines are connected via a single 1Gb switch.

Testing UDP at e.g. 300Mbit

iperf3 -uZVc 192.168.30.161 -b300m -t5 --get-server-output -l8192

results in the loss of 14% of all packets sent (for the other server box, with the exact same hardware but older NIC drivers, loss is around 2%). Loss occurs even at 50Mbit, albeit less severely. TCP performance using equivalent settings:

iperf3 -ZVc 192.168.30.161 -t5 --get-server-output -l8192

yields transmission speeds north of 800Mbit, with no reported retransmissions.

The server is always started up using the following options:

iperf3 -sB192.168.30.161

Who's to blame?

  1. The Linux client box (hardware? drivers? settings?)? Edit: I just ran the test from one Windows server box to the other, and the UDP packet loss at 300Mbit was even higher, at 22%.
  2. The Windows server boxes (hardware? driver? settings?)?
  3. The (single) switch that connects all test machines?
  4. Cables?

Edit:

Now I tried the other direction, Windows -> Linux. Result: packet loss is always 0, while throughput maxes out at around

  • 840Mbit for -l8192, i.e. fragmented IP packets
  • 250Mbit for -l1472, unfragmented IP packets

I guess flow control caps the throughput and prevents packet loss. The latter (unfragmented) figure in particular is nowhere near TCP throughput (unfragmented TCP yields figures similar to fragmented TCP), but in terms of packet loss it's a huge improvement over Linux -> Windows.
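The gap between the -l8192 and -l1472 runs comes down to IP fragmentation. The sketch below assumes a standard 1500-byte Ethernet MTU and IPv4 headers without options (assumptions, not stated above): an 8192-byte UDP payload must be split into several fragments, whereas 1472 = 1500 - 20 (IP header) - 8 (UDP header) fits exactly into one frame. Losing any single fragment discards the whole datagram, so per-frame loss is amplified. The loss formula further assumes frame losses are independent, which real, bursty drops often are not.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ip_fragments(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    ip_payload = udp_payload + UDP_HEADER
    # Fragment offsets are expressed in 8-byte units, so every fragment
    # except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def datagram_loss(frame_loss: float, fragments: int) -> float:
    """Probability that a datagram is lost if any one of its fragments is lost
    (assumes independent per-frame losses)."""
    return 1 - (1 - frame_loss) ** fragments


for size in (8192, 1472):
    n = ip_fragments(size)
    print(f"-l{size}: {n} fragment(s), "
          f"datagram loss at 1% frame loss = {datagram_loss(0.01, n):.1%}")
```

Under these assumptions, -l8192 needs 6 fragments per datagram, so a 1% frame loss turns into roughly 5.9% datagram loss, while -l1472 stays at 1 fragment and 1%.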

And how to find out?

I did follow the usual advice for driver settings on the server to maximize performance and tried to enable/disable/maximize/minimize/change

  • Interrupt Moderation
  • Flow Control
  • Receive Buffers
  • RSS
  • Wake-on-LAN

All offload features are enabled.

Edit: I also tried to enable/disable

  • Ethernet@Wirespeed
  • The various offload features
  • Priority&VLAN

All with similar loss rates.
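On the "how to find out" question: one way to tell whether datagrams are dropped at the receiving socket or further down (NIC, driver, switch) is to compare the receiver's UDP counters before and after a run. The sketch below assumes a Linux receiver (as in the Windows -> Linux test), where the kernel exposes these counters in /proc/net/snmp. A RcvbufErrors increase that tracks iperf's lost count points at the socket receive buffer, while NIC-level drops show up instead in tools such as `ip -s link` or `ethtool -S`. On the Windows receivers, `netstat -s -p udp` reports a comparable "Receive Errors" counter.

```python
import os


def udp_counters(snmp_text: str) -> dict:
    """Parse the two 'Udp:' lines (header + values) of /proc/net/snmp."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]  # 'UdpLite:' lines are skipped
    header, values = rows[0], rows[1]
    return dict(zip(header[1:], (int(v) for v in values[1:])))


def counter_delta(before: dict, after: dict) -> dict:
    """Per-counter increase between two snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}


def read_snapshot(path: str = "/proc/net/snmp") -> dict:
    with open(path) as f:
        return udp_counters(f.read())


if __name__ == "__main__" and os.path.exists("/proc/net/snmp"):
    # Print the current counters; take one snapshot before and one after
    # an iperf3 run, then compare them with counter_delta().
    print(read_snapshot())
```

In practice you would call read_snapshot() once before and once after the iperf3 test and look at InDatagrams, InErrors and RcvbufErrors in counter_delta(before, after).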


The full output of a UDP run:

$ iperf3 -uZVc 192.168.30.161 -b300m -t5 --get-server-output -l8192
iperf 3.0.7
Linux mybox 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) x86_64 GNU/Linux
Time: Wed, 13 May 2015 13:10:39 GMT
Connecting to host 192.168.30.161, port 5201   
      Cookie: mybox.1431522639.098587.3451f174
[  4] local 192.168.30.202 port 50851 connected to 192.168.30.161 port 5201
Starting Test: protocol: UDP, 1 streams, 8192 byte blocks, omitting 0 seconds, 5 second test
[ ID] Interval           Transfer     Bandwidth       Total Datagrams
[  4]   0.00-1.00   sec  33.3 MBytes   279 Mbits/sec  4262
[  4]   1.00-2.00   sec  35.8 MBytes   300 Mbits/sec  4577
[  4]   2.00-3.00   sec  35.8 MBytes   300 Mbits/sec  4578
[  4]   3.00-4.00   sec  35.8 MBytes   300 Mbits/sec  4578
[  4]   4.00-5.00   sec  35.8 MBytes   300 Mbits/sec  4577
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-5.00   sec   176 MBytes   296 Mbits/sec  0.053 ms  3216/22571 (14%)
[  4] Sent 22571 datagrams
CPU Utilization: local/sender 4.7% (0.4%u/4.3%s), remote/receiver 1.7% (0.8%u/0.9%s)

Server output:
-----------------------------------------------------------
Accepted connection from 192.168.30.202, port 44770
[  5] local 192.168.30.161 port 5201 connected to 192.168.30.202 port 50851
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-1.01   sec  27.2 MBytes   226 Mbits/sec  0.043 ms  781/4261 (18%)
[  5]   1.01-2.01   sec  30.0 MBytes   252 Mbits/sec  0.058 ms  734/4577 (16%)
[  5]   2.01-3.01   sec  29.0 MBytes   243 Mbits/sec  0.045 ms  870/4578 (19%)
[  5]   3.01-4.01   sec  32.1 MBytes   269 Mbits/sec  0.037 ms  469/4579 (10%)
[  5]   4.01-5.01   sec  32.9 MBytes   276 Mbits/sec  0.053 ms  362/4576 (7.9%)
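The client and server reports are consistent with each other. A quick check of the arithmetic, using only the numbers in the output above: at -b300m with 8192-byte datagrams, iperf3 has to send about 4578 datagrams per second (matching the ~4577 per interval), and 3216 lost out of 22571 is the reported 14%.

```python
def datagrams_per_second(target_bps: float, payload_bytes: int) -> float:
    """Datagrams per second needed to reach target_bps with the given -l size."""
    return target_bps / (payload_bytes * 8)


def loss_percent(lost: int, total: int) -> float:
    return 100.0 * lost / total


# Figures taken from the iperf3 UDP run above.
print(f"{datagrams_per_second(300e6, 8192):.1f} datagrams/s")
print(f"{loss_percent(3216, 22571):.1f}% lost overall")
print(f"{loss_percent(781, 4261):.1f}% lost in the first interval")
```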

TCP run:

$ iperf3 -ZVc 192.168.30.161 -t5 --get-server-output -l8192
iperf 3.0.7
Linux mybox 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) x86_64 GNU/Linux
Time: Wed, 13 May 2015 13:13:53 GMT
Connecting to host 192.168.30.161, port 5201   
      Cookie: mybox.1431522833.505583.4078fcc1
      TCP MSS: 1448 (default)
[  4] local 192.168.30.202 port 44782 connected to 192.168.30.161 port 5201
Starting Test: protocol: TCP, 1 streams, 8192 byte blocks, omitting 0 seconds, 5 second test
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   109 MBytes   910 Mbits/sec    0   91.9 KBytes       
[  4]   1.00-2.00   sec  97.3 MBytes   816 Mbits/sec    0   91.9 KBytes       
[  4]   2.00-3.00   sec  97.5 MBytes   818 Mbits/sec    0   91.9 KBytes       
[  4]   3.00-4.00   sec  98.0 MBytes   822 Mbits/sec    0   91.9 KBytes       
[  4]   4.00-5.00   sec  97.6 MBytes   819 Mbits/sec    0   91.9 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-5.00   sec   499 MBytes   837 Mbits/sec    0             sender
[  4]   0.00-5.00   sec   498 MBytes   836 Mbits/sec                  receiver
CPU Utilization: local/sender 3.5% (0.5%u/3.0%s), remote/receiver 4.5% (2.0%u/2.5%s)

Server output:
-----------------------------------------------------------
Accepted connection from 192.168.30.202, port 44781
[  5] local 192.168.30.161 port 5201 connected to 192.168.30.202 port 44782
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   105 MBytes   878 Mbits/sec                  
[  5]   1.00-2.00   sec  97.5 MBytes   818 Mbits/sec                  
[  5]   2.00-3.00   sec  97.6 MBytes   819 Mbits/sec                  
[  5]   3.00-4.00   sec  97.8 MBytes   820 Mbits/sec                  
[  5]   4.00-5.00   sec  97.7 MBytes   820 Mbits/sec                  
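For context on the TCP numbers: with the reported MSS of 1448 bytes, every full-sized segment occupies a 1500-byte IP packet plus 38 bytes of Ethernet overhead on the wire (14-byte header, 4-byte FCS, 8-byte preamble/SFD and 12-byte inter-frame gap). A sketch of the resulting ceiling, assuming all segments are full-sized on a 1000 Mbit/s link:

```python
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap


def max_tcp_goodput_mbps(mss: int, mtu: int = 1500, link_mbps: float = 1000.0) -> float:
    """Upper bound on TCP payload throughput when every segment is full-sized."""
    wire_bytes_per_segment = mtu + ETHERNET_OVERHEAD
    return link_mbps * mss / wire_bytes_per_segment


ceiling = max_tcp_goodput_mbps(1448)
print(f"ceiling ~{ceiling:.0f} Mbit/s; measured 836 Mbit/s = {836 / ceiling:.0%} of it")
```

So the Linux -> Windows TCP run reaches roughly 89% of what the link can carry, while the UDP runs lose datagrams at about a third of that rate.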

There is no problem. This is normal and expected behaviour.

The reason for the packet loss is that UDP doesn't have any congestion control. In TCP, when congestion control kicks in, it tells the sending end to slow down in order to maximise throughput and minimise loss.

So this is entirely normal behaviour for UDP. UDP doesn't guarantee delivery: if the receive queue is overloaded, packets are simply dropped. If you want higher UDP transmit rates without loss, you need to increase your receive buffer.
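To make the receive-buffer argument concrete, here is a toy model (a deliberate simplification, not how any particular kernel or driver schedules work): datagrams arrive at a fixed rate with no feedback to the sender, as with UDP, and a bounded receive queue drops anything that arrives while it is full. The receiver drains twice as fast as the arrival rate but is assumed to stall for 10 ms at the start, standing in for, e.g., a scheduling or interrupt delay. Only the buffer size differs between the two calls.

```python
def simulate(arrive_per_ms: int, drain_per_ms: int, buffer_pkts: int,
             stall_ms: int, total_ms: int) -> tuple:
    """Toy drop-tail receive queue fed at a constant rate with no feedback.

    The receiver does not drain at all during the first stall_ms milliseconds,
    then drains up to drain_per_ms datagrams per millisecond.
    Returns (delivered, dropped).
    """
    queue = delivered = dropped = 0
    for t in range(total_ms):
        # Arrivals: the sender never slows down, so a full queue means drops.
        for _ in range(arrive_per_ms):
            if queue < buffer_pkts:
                queue += 1
            else:
                dropped += 1
        # Service: the application reads from the socket once the stall is over.
        if t >= stall_ms:
            served = min(queue, drain_per_ms)
            queue -= served
            delivered += served
    return delivered, dropped


# Same traffic and same 10 ms stall; only the buffer size differs.
print(simulate(5, 10, buffer_pkts=25, stall_ms=10, total_ms=100))  # small buffer
print(simulate(5, 10, buffer_pkts=60, stall_ms=10, total_ms=100))  # large buffer
```

With the small buffer, 30 of the 500 datagrams are dropped even though the receiver is, on average, twice as fast as the sender; a buffer large enough to hold the backlog from the stall (roughly arrival rate × stall time) delivers all 500.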

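The -l or --len iperf option may help, as may the target bandwidth setting -b on the client. Note that -l sets the length of the application's read/write buffer (for UDP, the datagram size); the kernel socket buffer itself is set with -w or --window.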

-l, --len n[KM] set length read/write buffer to n (default 8 KB)

8 KB? That's a little on the small side when there is no congestion control.

E.g. on the server side:

~$ iperf -l 1M -U -s
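For reference, the receive buffer discussed above is the kernel's per-socket buffer, which an application requests with the SO_RCVBUF socket option (iperf's -w/--window option sets this socket buffer size); -l only changes the size of each read and write. Below is a minimal Python sketch that requests a larger UDP receive buffer and reads back what the kernel actually granted. On Linux the request is capped at net.core.rmem_max and the value read back is double the (capped) request, to allow for bookkeeping overhead, so it will rarely equal the number you asked for.

```python
import socket


def request_udp_rcvbuf(nbytes: int) -> tuple:
    """Ask for an nbytes receive buffer on a UDP socket.

    Returns (default_size, granted_size) as reported by getsockopt.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        default = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
        granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return default, granted


default, granted = request_udp_rcvbuf(4 * 1024 * 1024)  # ask for 4 MiB
print(f"default {default} bytes, granted {granted} bytes")
```

If the granted size on Linux is far below the request, raising net.core.rmem_max with sysctl is the usual next step.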

This is what I get Linux to Linux over TCP:

Client connecting to ostore, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.107 port 35399 connected with 192.168.0.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.10 GBytes   943 Mbits/sec

But for UDP, using the default settings, I get only:

~$ iperf -u -c ostore 
------------------------------------------------------------
Client connecting to ostore, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.107 port 52898 connected with 192.168.0.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec
[  3] Sent 893 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec   0.027 ms    0/  893 (0%)

WT?
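The 1.05 Mbits/sec result above is consistent with iperf2's default UDP target bandwidth (-b) of 1 Mbit/s: the client simply paced itself at that rate, so the network was never really exercised. The arithmetic below uses only the numbers in that report, plus the datagram rates a 1 Gbit/s target would need at the default 1470-byte and at 8192-byte datagrams:

```python
def achieved_mbps(datagrams: int, payload_bytes: int, seconds: float) -> float:
    """Payload throughput in Mbit/s (1 Mbit = 10**6 bits, as in iperf's rates)."""
    return datagrams * payload_bytes * 8 / seconds / 1e6


def datagrams_per_second(target_bps: float, payload_bytes: int) -> float:
    return target_bps / (payload_bytes * 8)


# Default run above: 893 datagrams of 1470 bytes in 10 s.
print(f"{achieved_mbps(893, 1470, 10.0):.2f} Mbit/s")
# Datagram rate needed for a 1 Gbit/s target at two -l sizes.
for size in (1470, 8192):
    print(size, round(datagrams_per_second(1e9, size)), "datagrams/s")
```

Larger datagrams cut the number of send and receive operations per second by more than a factor of five for the same bit rate, which is one plausible reason setting both -l and -b helped.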

After some experimentation, I found I had to set both the length and the bandwidth target.

~$ iperf -u -c ostore -l 8192 -b 1G
------------------------------------------------------------
Client connecting to ostore, UDP port 5001
Sending 8192 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.107 port 60237 connected with 192.168.0.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.12 GBytes   958 Mbits/sec
[  3] Sent 146243 datagrams
[  3] WARNING: did not receive ack of last datagram after 10 tries.

Server side:

~$ iperf -s -u -l 5M 
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 5242880 byte datagrams
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.10 port 5001 connected with 192.168.0.107 port 36448
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0-10.1 sec  1008 KBytes   819 Kbits/sec   0.018 ms    0/  126 (0%)
[  4] local 192.168.0.10 port 5001 connected with 192.168.0.107 port 60237
[  4]  0.0-10.0 sec  1.12 GBytes   958 Mbits/sec   0.078 ms    0/146242 (0%)
[  4]  0.0-10.0 sec  1 datagrams received out-of-order
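The 958 Mbits/sec in this run is essentially the ceiling for 8192-byte UDP payloads on gigabit Ethernet. A sketch of that ceiling, assuming a 1500-byte MTU, IPv4 headers without options, and 38 bytes of per-frame Ethernet overhead (header, FCS, preamble/SFD, inter-frame gap), where each datagram becomes six IP fragments, each with its own IP header and Ethernet framing:

```python
import math

IPV4_HEADER = 20
UDP_HEADER = 8
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap


def max_udp_goodput_mbps(payload: int, mtu: int = 1500, link_mbps: float = 1000.0) -> float:
    """Upper bound on UDP payload throughput, counting IP fragmentation."""
    ip_payload = payload + UDP_HEADER
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8  # fragment offsets are in 8-byte units
    fragments = math.ceil(ip_payload / per_fragment)
    wire_bytes = ip_payload + fragments * (IPV4_HEADER + ETHERNET_OVERHEAD)
    return link_mbps * payload / wire_bytes


print(f"{max_udp_goodput_mbps(8192):.1f} Mbit/s")  # -l 8192
print(f"{max_udp_goodput_mbps(1472):.1f} Mbit/s")  # largest unfragmented payload
```

Both sizes come out at roughly 957 to 958 Mbit/s, so framing overhead alone cannot account for the 840 vs 250 Mbit gap between -l8192 and -l1472 reported in the question.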

To demonstrate packet loss with small buffers (which, to be honest, isn't as extreme as I was expecting), see the run below. Where can I find a reliable iperf3 source to test against between Linux and Windows?

~$ iperf -u -c ostore -l 1K -b 1G
------------------------------------------------------------
Client connecting to ostore, UDP port 5001
Sending 1024 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.107 port 45061 connected with 192.168.0.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   674 MBytes   565 Mbits/sec
[  3] Sent 689777 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec   670 MBytes   562 Mbits/sec   0.013 ms 3936/689776 (0.57%)
[  3]  0.0-10.0 sec  1 datagrams received out-of-order

Server side:

~$ iperf -s -u -l 1K 
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1024 byte datagrams
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.10 port 5001 connected with 192.168.0.107 port 45061
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0-10.0 sec   670 MBytes   562 Mbits/sec   0.013 ms 3936/689776 (0.57%)
[  3]  0.0-10.0 sec  1 datagrams received out-of-order
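As a sanity check on how iperf reports these figures: the byte totals are binary megabytes (MiB, 2^20 bytes) while the rates are decimal megabits (10^6 bits/s). The sketch below reproduces both sides of the report from the datagram counts alone, and shows that 565 Mbit/s of 1024-byte datagrams is almost 69,000 datagrams per second.

```python
def summarize(datagrams: int, payload: int, seconds: float, lost: int = 0) -> dict:
    """Recompute iperf-style totals from datagram counts."""
    received = datagrams - lost
    total_bytes = received * payload
    return {
        "MBytes (MiB)": total_bytes / 2**20,
        "Mbit/s": total_bytes * 8 / seconds / 1e6,
        "datagrams/s": received / seconds,
        "loss %": 100.0 * lost / datagrams,
    }


sent = summarize(689777, 1024, 10.0)                 # client side
received = summarize(689776, 1024, 10.0, lost=3936)  # server report
for name in sent:
    print(f"{name:>13}: sent {sent[name]:9.2f}   received {received[name]:9.2f}")
```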

Have you also looked at the iperf3 GitHub page README?

Known Issues

UDP performance: Some problems have been noticed with iperf3 on the ESnet 100G testbed at high UDP rates (above 10Gbps). The symptom is that on any particular run of iperf3 the receiver reports a loss rate of about 20%, regardless of the -b option used on the client side. This problem appears not to be iperf3-specific, and may be due to the placement of the iperf3 process on a CPU and its relation to the inbound NIC. In some cases this problem can be mitigated by an appropriate use of the CPU affinity (-A) option. (Issue #55)

You're using a slower NIC, but I wonder if it's related.
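On the CPU-affinity idea in the quoted README: on Linux, pinning a process to a CPU is done with the sched_setaffinity system call, which Python exposes as os.sched_setaffinity, and iperf3's -A option sets the same kind of affinity for the iperf3 process. The Linux-only sketch below pins the current process to the first CPU it is allowed to run on and reads the mask back. Which CPU is actually best (for example, one close to the CPU handling the NIC's interrupts) is hardware-specific and is left as an assumption here.

```python
import os


def pin_to_cpu(cpu: int) -> set:
    """Restrict the calling process to one CPU (Linux only) and return the new mask."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed CPUs:", allowed)
    print("now pinned to:", pin_to_cpu(allowed[0]))
```

The iperf3 equivalent would be something like `iperf3 -s -A 1` on a receiver where the option is supported; the CPU number 1 is only an example.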