Extreme UDP packet loss at 300Mbit (14%), but TCP > 800Mbit w/o retransmits
I have a Linux box that I use as the iperf3 client, testing two identically equipped Windows 2012 R2 server boxes with Broadcom BCM5721 1Gb adapters (2 ports, but only 1 is used for the test). All machines are connected via a single 1Gb switch.
Testing UDP at e.g. 300Mbit
iperf3 -uZVc 192.168.30.161 -b300m -t5 --get-server-output -l8192
results in the loss of 14% of all packets sent (on the other server box, which has the exact same hardware but older NIC drivers, the loss is around 2%). Loss occurs even at 50Mbit, albeit less severely. TCP performance using equivalent settings:
iperf3 -ZVc 192.168.30.161 -t5 --get-server-output -l8192
yields transmission speeds north of 800Mbit, with no reported retransmissions.
The server is always started up using the following options:
iperf3 -sB192.168.30.161
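(For reference: at -b300m with -l8192, the client has to push 300,000,000 / (8 * 8192) ≈ 4578 datagrams per second, which matches the per-second datagram counts in the iperf3 output further below.)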
Who's to blame?
- The Linux client box (hardware? drivers? settings?)? Edit: I just ran the test from one Windows server box to the other, and the UDP packet loss at 300Mbit was even higher, at 22%.
- The Windows server boxes (hardware? driver? settings?)?
- The (single) switch that connects all test machines?
- Cables?
Edit:
Now I tried the other direction, Windows -> Linux. Result: packet loss is always 0, while throughput maxes out at around
- 840Mbit for -l8192, i.e. fragmented IP packets
- 250Mbit for -l1472, i.e. unfragmented IP packets (1472 bytes of payload plus 8 bytes of UDP header and 20 bytes of IP header fill the 1500-byte Ethernet MTU exactly)
I guess flow control caps the throughput and prevents packet loss. Especially the latter, unfragmented figure is nowhere near TCP throughput (unfragmented TCP yields figures similar to fragmented TCP), but it is a huge improvement over Linux -> Windows in terms of packet loss.
And how to find out?
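One thing I figure I could do is compare the OS-level drop counters on both ends before and after a run; a rough sketch, with the interface name being just a placeholder:

# on the Linux client: NIC counters and UDP error counters
$ ethtool -S eth0
$ netstat -su

# on the Windows receiver: "Receive Errors" under the UDP statistics
> netstat -s -p udp

If the receiver's error counters grow by roughly the number of datagrams iperf3 reports as lost, the drops happen in the receiving host rather than in the switch or the cables.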
I did follow the usual advice for driver settings on the server to maximize performance and tried to enable/disable/maximize/minimize/change
- Interrupt Moderation
- Flow Control
- Receive Buffers
- RSS
- Wake-on-LAN
All offload features are enabled.
Edit: I also tried to enable/disable
- Ethernet@Wirespeed
- The various offload features
- Priority & VLAN
All with similar loss rates.
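For what it's worth, the rough Linux-side counterparts of these driver settings can be inspected with ethtool; a quick sketch, assuming the client interface is eth0 (adjust to the actual NIC):

$ ethtool -c eth0   # interrupt coalescing (roughly: interrupt moderation)
$ ethtool -a eth0   # pause frame / flow control settings
$ ethtool -g eth0   # RX/TX ring sizes (roughly: receive buffers)
$ ethtool -k eth0   # offload features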
The full output of a UDP run:
$ iperf3 -uZVc 192.168.30.161 -b300m -t5 --get-server-output -l8192
iperf 3.0.7
Linux mybox 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) x86_64 GNU/Linux
Time: Wed, 13 May 2015 13:10:39 GMT
Connecting to host 192.168.30.161, port 5201
Cookie: mybox.1431522639.098587.3451f174
[ 4] local 192.168.30.202 port 50851 connected to 192.168.30.161 port 5201
Starting Test: protocol: UDP, 1 streams, 8192 byte blocks, omitting 0 seconds, 5 second test
[ ID] Interval Transfer Bandwidth Total Datagrams
[ 4] 0.00-1.00 sec 33.3 MBytes 279 Mbits/sec 4262
[ 4] 1.00-2.00 sec 35.8 MBytes 300 Mbits/sec 4577
[ 4] 2.00-3.00 sec 35.8 MBytes 300 Mbits/sec 4578
[ 4] 3.00-4.00 sec 35.8 MBytes 300 Mbits/sec 4578
[ 4] 4.00-5.00 sec 35.8 MBytes 300 Mbits/sec 4577
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-5.00 sec 176 MBytes 296 Mbits/sec 0.053 ms 3216/22571 (14%)
[ 4] Sent 22571 datagrams
CPU Utilization: local/sender 4.7% (0.4%u/4.3%s), remote/receiver 1.7% (0.8%u/0.9%s)
Server output:
-----------------------------------------------------------
Accepted connection from 192.168.30.202, port 44770
[ 5] local 192.168.30.161 port 5201 connected to 192.168.30.202 port 50851
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 5] 0.00-1.01 sec 27.2 MBytes 226 Mbits/sec 0.043 ms 781/4261 (18%)
[ 5] 1.01-2.01 sec 30.0 MBytes 252 Mbits/sec 0.058 ms 734/4577 (16%)
[ 5] 2.01-3.01 sec 29.0 MBytes 243 Mbits/sec 0.045 ms 870/4578 (19%)
[ 5] 3.01-4.01 sec 32.1 MBytes 269 Mbits/sec 0.037 ms 469/4579 (10%)
[ 5] 4.01-5.01 sec 32.9 MBytes 276 Mbits/sec 0.053 ms 362/4576 (7.9%)
TCP run:
$ iperf3 -ZVc 192.168.30.161 -t5 --get-server-output -l8192
iperf 3.0.7
Linux mybox 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) x86_64 GNU/Linux
Time: Wed, 13 May 2015 13:13:53 GMT
Connecting to host 192.168.30.161, port 5201
Cookie: mybox.1431522833.505583.4078fcc1
TCP MSS: 1448 (default)
[ 4] local 192.168.30.202 port 44782 connected to 192.168.30.161 port 5201
Starting Test: protocol: TCP, 1 streams, 8192 byte blocks, omitting 0 seconds, 5 second test
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 109 MBytes 910 Mbits/sec 0 91.9 KBytes
[ 4] 1.00-2.00 sec 97.3 MBytes 816 Mbits/sec 0 91.9 KBytes
[ 4] 2.00-3.00 sec 97.5 MBytes 818 Mbits/sec 0 91.9 KBytes
[ 4] 3.00-4.00 sec 98.0 MBytes 822 Mbits/sec 0 91.9 KBytes
[ 4] 4.00-5.00 sec 97.6 MBytes 819 Mbits/sec 0 91.9 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-5.00 sec 499 MBytes 837 Mbits/sec 0 sender
[ 4] 0.00-5.00 sec 498 MBytes 836 Mbits/sec receiver
CPU Utilization: local/sender 3.5% (0.5%u/3.0%s), remote/receiver 4.5% (2.0%u/2.5%s)
Server output:
-----------------------------------------------------------
Accepted connection from 192.168.30.202, port 44781
[ 5] local 192.168.30.161 port 5201 connected to 192.168.30.202 port 44782
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 105 MBytes 878 Mbits/sec
[ 5] 1.00-2.00 sec 97.5 MBytes 818 Mbits/sec
[ 5] 2.00-3.00 sec 97.6 MBytes 819 Mbits/sec
[ 5] 3.00-4.00 sec 97.8 MBytes 820 Mbits/sec
[ 5] 4.00-5.00 sec 97.7 MBytes 820 Mbits/sec
There is no problem. This is normal and expected behaviour.
The reason for the packet loss is that UDP doesn't have any congestion control. With TCP, when the congestion control algorithms kick in, the transmit end is told to slow down its sending in order to maximise throughput and minimise loss.
So this is entirely normal behaviour for UDP. UDP doesn't guarantee delivery: if the receive queue is overloaded, packets are simply dropped. If you want higher transmit rates for UDP, you need to increase your receive buffer.
The -l or --len iperf option should do the trick, possibly together with the target bandwidth setting -b on the client.
-l, --len n[KM] set length read/write buffer to n (default 8 KB)
8 KB? That's a little on the small side when there is no congestion control.
E.g. on the server side:
~$ iperf -l 1M -U -s
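If the kernel's socket receive buffer turns out to be the limit, it can also be raised independently of the datagram length; a minimal sketch for a Linux receiver, with purely illustrative values (-w sets the socket buffer size explicitly):

$ sudo sysctl -w net.core.rmem_max=8388608   # allow larger receive buffers (bytes)
$ iperf -s -u -l 8192 -w 4M                  # UDP server with an 8 KB read size and a 4 MB socket buffer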
This is what I get Linux to Linux over TCP:
Client connecting to ostore, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.107 port 35399 connected with 192.168.0.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 943 Mbits/sec
But for UDP using the default settings I get only
~$ iperf -u -c ostore
------------------------------------------------------------
Client connecting to ostore, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.107 port 52898 connected with 192.168.0.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec
[ 3] Sent 893 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec 0.027 ms 0/ 893 (0%)
WT?
After some experimentation I found I had to set both the length, and the bandwidth target.
~$ iperf -u -c ostore -l 8192 -b 1G
------------------------------------------------------------
Client connecting to ostore, UDP port 5001
Sending 8192 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.107 port 60237 connected with 192.168.0.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.12 GBytes 958 Mbits/sec
[ 3] Sent 146243 datagrams
[ 3] WARNING: did not receive ack of last datagram after 10 tries.
Server side:
~$ iperf -s -u -l 5M
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 5242880 byte datagrams
UDP buffer size: 224 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.10 port 5001 connected with 192.168.0.107 port 36448
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-10.1 sec 1008 KBytes 819 Kbits/sec 0.018 ms 0/ 126 (0%)
[ 4] local 192.168.0.10 port 5001 connected with 192.168.0.107 port 60237
[ 4] 0.0-10.0 sec 1.12 GBytes 958 Mbits/sec 0.078 ms 0/146242 (0%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
Finally, here is a run with small buffers to demonstrate packet loss, which to be honest isn't as extreme as I was expecting. Where is a reliable source for iperf3 I can test against between Linux/Windows?
~$ iperf -u -c ostore -l 1K -b 1G
------------------------------------------------------------
Client connecting to ostore, UDP port 5001
Sending 1024 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.107 port 45061 connected with 192.168.0.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 674 MBytes 565 Mbits/sec
[ 3] Sent 689777 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 670 MBytes 562 Mbits/sec 0.013 ms 3936/689776 (0.57%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
Server side:
~$ iperf -s -u -l 1K
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1024 byte datagrams
UDP buffer size: 224 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.10 port 5001 connected with 192.168.0.107 port 45061
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-10.0 sec 670 MBytes 562 Mbits/sec 0.013 ms 3936/689776 (0.57%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
Have you also looked at the README on the iperf3 GitHub page?
Known Issues
UDP performance: Some problems have been noticed with iperf3 on the ESnet 100G testbed at high UDP rates (above 10Gbps). The symptom is that on any particular run of iperf3 the receiver reports a loss rate of about 20%, regardless of the -b option used on the client side. This problem appears not to be iperf3-specific, and may be due to the placement of the iperf3 process on a CPU and its relation to the inbound NIC. In some cases this problem can be mitigated by an appropriate use of the CPU affinity (-A) option. (Issue #55)
You're using a slower NIC, but I wonder if it's related.
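If you want to rule that out, the affinity can be set from the client for both ends; a hedged example based on your original command line, with the core numbers as placeholders:

$ iperf3 -uZVc 192.168.30.161 -b300m -t5 -l8192 -A 2,2 --get-server-output

-A 2,2 pins the local (client) process to CPU 2 and asks the server to pin itself to CPU 2 for this test.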