1.73 Gbps at best on an Amazon EC2 10 Gigabit instance?
Solution 1:
AWS Support admitted that 10 GbE speeds can only be achieved between instances on the private network. It requires using the private IP rather than the public IP; over the public IP, throughput in my case always maxes out at 1.73 Gbps. That might vary by availability zone and region. If you see different results, please post them here.
This means that when it comes to external throughput, the c3.8xlarge (or similar 10 GbE instances) offers terrible value compared to smaller instances with "High" network capability. A c1.medium instance costs 1/16 the price of a c3.8xlarge, yet it allows for over half the throughput (~0.95 Gbps) of a c3.8xlarge 10 GbE instance (~1.7 Gbps).
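A quick check of the value claim, using only the figures above. Here P is a placeholder for the c3.8xlarge price (the answer gives only the 1/16 ratio, not actual prices). External throughput per unit of price compares as:

```latex
\frac{0.95\,\text{Gbps} \,/\, (P/16)}{1.7\,\text{Gbps} \,/\, P}
  = \frac{0.95 \times 16}{1.7}
  = \frac{15.2}{1.7}
  \approx 8.9
```

So, on these numbers, the c1.medium delivers roughly nine times more external throughput per dollar.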
See this thread on the Wowza forums for AWS Support's answers.
Solution 2:
Because of the virtualization layer, the network stack can't use DMA directly, so the CPU has to copy data back and forth and spends its time in softirq processing. When you are moving a lot of packets, you need to tell the kernel to use more than one CPU core for that work.
You can monitor this by running watch -n1 cat /proc/softirqs and looking at the NET_RX row.
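The watch command shows raw cumulative counters. The following is a minimal POSIX sh sketch that instead prints how many NET_RX softirqs each CPU handled over one second, which makes it easier to see whether a single core is doing all the receive work. It assumes the standard /proc/softirqs layout (a header row of CPU names, then one "NAME: count count ..." row per softirq type). The function names are hypothetical.

```shell
#!/bin/sh
# Print "CPUn count" for the NET_RX row of a softirqs file.
# Assumes the /proc/softirqs layout: a header of CPU names, then rows
# like "NET_RX: <count for CPU0> <count for CPU1> ...".
net_rx_counts() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; next }
    $1 == "NET_RX:" { for (i = 2; i <= NF; i++) print cpu[i - 1], $i }
  ' "${1:-/proc/softirqs}"
}

# Read two snapshots separated by a "--" line on stdin and print the
# per-CPU difference (second snapshot minus first).
net_rx_delta() {
  awk '
    $0 == "--" { second = 1; next }
    !second { prev[$1] = $2; next }
    { print $1, $2 - prev[$1] }
  '
}

# Take two snapshots one second apart: per-CPU NET_RX softirqs per second.
net_rx_per_second() {
  { net_rx_counts; echo "--"; sleep 1; net_rx_counts; } | net_rx_delta
}
```

Run net_rx_per_second a few times during a transfer (for example, while true; do net_rx_per_second; echo; done). If one CPU shows nearly all of the increments, that is the single-core bottleneck described above.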
Fortunately, there is a feature called packet steering, which lets us use more CPU cores for receiving and transmitting packets.
To let more than one core process received packets (Receive Packet Steering, RPS), run as root: echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
For transmitting (Transmit Packet Steering, XPS), run as root: echo f0 > /sys/class/net/eth0/queues/tx-0/xps_cpus
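This way the first 4 cores (CPUs 0-3) are used for receiving and the next 4 (CPUs 4-7) for transmitting. Each mask is a hexadecimal bitmap in which bit N stands for CPU N:
f (hex) => 1+2+4+8 = 15, i.e. bits 0-3 set, CPUs 0-3
f0 (hex) => 16+32+64+128 = 240, i.e. bits 4-7 set, CPUs 4-7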
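Working out these masks by hand gets error-prone on larger instances. Below is a small POSIX sh helper (hypothetical, not part of any tool) that builds the mask for a contiguous range of CPUs. It is a sketch that assumes fewer than 32 CPUs, because larger masks are written as comma-separated 32-bit hex groups.

```shell
#!/bin/sh
# cpu_mask FIRST LAST: print the hex bitmask with bits FIRST..LAST set,
# i.e. allowing CPUs FIRST through LAST. Bit N corresponds to CPU N.
# Sketch only: assumes LAST < 32 (bigger masks need comma-separated
# 32-bit groups, e.g. "ff,ffffffff" for CPUs 0-39).
cpu_mask() {
  local cpu=$1 last=$2 mask=0
  while [ "$cpu" -le "$last" ]; do
    mask=$(( mask | (1 << cpu) ))
    cpu=$(( cpu + 1 ))
  done
  printf '%x\n' "$mask"
}

cpu_mask 0 3   # CPUs 0-3, prints f  (1+2+4+8 = 15)
cpu_mask 4 7   # CPUs 4-7, prints f0 (16+32+64+128 = 240)
```

The result can be written as root, for example cpu_mask 0 3 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus. Note that sudo echo f > ... does not work, because the redirection is performed by your unprivileged shell, not by sudo.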
Solution 3:
We've wondered about EC2's true public-facing throughput for a while, so I hope this helps. We just finished running several Wowza Edge instances on c4.8xlarge instances and had no issues at 6+ Gbps per instance. Per http://www.aerospike.com/blog/boosting-amazon-ec2-network-for-high-throughput/, the benchmarks below seem to be very accurate:
*Network Bandwidth: Amazon offers a range of instance types with varying amounts of memory and CPU. What is not well “documented”, however, is network capability, which is simply categorized as Low, Moderate, High, and 10Gb. Based on our experiments running Aerospike servers and iperf on AWS, we were able to better define these categories with the following numbers:
- Low – Up to 100 Mbps
- Moderate – 100 Mbps to 300 Mbps
- High – 100 Mbps to 1.86 Gbps
- 10Gb – up to 8.86 Gbps*
Solution 4:
I am not sure how you are running iperf for your tests, but sometimes it needs to be run with multiple parallel threads (its -P option) to produce results that better reflect the actual maximum throughput of the underlying network stack. I have sometimes had to raise the thread count as high as 96 to reach what appeared to be close to the optimal throughput.
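The following is a minimal sketch of such a sweep in POSIX sh. It assumes iperf version 2, where -P starts that many parallel client threads, and a server already started with iperf -s on the other instance. SERVER is a placeholder you must set, ideally to the server's private IP per the first answer. The thread counts and the 30-second duration are arbitrary choices, and DRY_RUN is a hypothetical switch that only prints the commands.

```shell
#!/bin/sh
# Sweep iperf (v2) parallel client thread counts against a server that is
# already running "iperf -s". Compare the aggregate [SUM] line of each run
# to see where adding threads stops increasing throughput.
THREAD_COUNTS="1 4 8 16 32 64 96"   # arbitrary sweep points
DURATION=30                          # seconds per run

# Run a command, or just print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# iperf_sweep SERVER: one run per thread count; -f g reports Gbits/sec.
iperf_sweep() {
  for threads in $THREAD_COUNTS; do
    echo "== $threads parallel threads =="
    run iperf -c "$1" -P "$threads" -t "$DURATION" -f g
  done
}

if [ -n "${SERVER:-}" ]; then
  iperf_sweep "$SERVER"
else
  echo "Set SERVER to the iperf server's IP (e.g. its private IP) to run." >&2
fi
```

Run it as SERVER=<private IP> sh sweep.sh (the file name is arbitrary) and look for the thread count after which the [SUM] rate levels off.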