TCP acks are paused, then resumed, then paused again. Why?
I would like some help finding the reason for the reduced data transfer rate in my application.
I have 12 embedded systems and a Linux server. The embedded systems send data to the server over TCP on an Ethernet link through a switch. The following is a TCP StreamGraph made from a Wireshark capture of the traffic from one board.
As you can see, the data transfer runs at around 5.8 Mbit/s until about 0.25 seconds into the capture. This is as fast as I can expect the embedded system to go. After this, delays are inserted in the transfer. The following shows a closeup of the graph:
The staircase-shaped curve at the bottom labeled ACK shows how much data has been ACKed by the server at any given time. The corresponding curve labeled RWIN shows how much more data the receive buffers on the server could accept. The smaller vertical segments labeled SENT DATA are the actual packets sent.
At point A, the server ACKs the data as fast as it is sent, but then the server sends no ACKs for 23 ms. The embedded system is allowed to send up to RWIN without waiting for an ACK, but it does not do so because it must keep sent data around until it is ACKed (in case it needs to be retransmitted) and the send buffer space is limited.
Then, at point B, all received data is ACKed at once and normal acking and sending resumes for 2.5ms before another pause happens.
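One way to check whether these pauses follow a pattern is to list every gap between consecutive ACKs in the capture. The following is a minimal sketch, assuming the ACK timestamps have already been exported from the capture as a list of seconds. The function name, the 5 ms threshold and the sample timestamps are all made up for illustration, not taken from the real capture.

```python
def find_ack_gaps(ack_times, threshold=0.005):
    """Return (start_time, duration) for each gap between consecutive
    ACKs that is longer than `threshold` seconds."""
    gaps = []
    for prev, cur in zip(ack_times, ack_times[1:]):
        delta = cur - prev
        if delta > threshold:
            gaps.append((prev, delta))
    return gaps


# Hypothetical ACK timestamps (seconds): regular ACKs, one long pause,
# then regular ACKs again.
sample = [0.2500, 0.2502, 0.2504, 0.2734, 0.2736, 0.2738]

for start, duration in find_ack_gaps(sample):
    print(f"gap of {duration * 1000:.1f} ms starting at {start:.4f} s")
```

If the gaps turn out to be evenly spaced or all of similar length, that would point towards something periodic on the server rather than random load.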
The Wireshark capture was made from a different PC which was connected to a port on the switch that was set up to mirror all data sent and received on the port to which the embedded system was connected.
The Linux server runs a Java application which processes the data and stores them on disk. It shows no signs of having maxed out the CPU. The operating system is Ubuntu Server 12.04 with default network settings.
I can see that I could probably benefit from allocating more send buffer space in the embedded system to match the amount of receive window space in the Linux server, but this does not seem to be the limiting factor here.
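As a rough estimate of how much send buffer it would take to keep transmitting through one such pause, the arithmetic can be sketched as below. It uses only the 5.8 Mbit/s rate and the 23 ms pause from the graphs, ignores round-trip time and protocol overhead, and the function name is just for illustration. It does not explain why the pause happens.

```python
def bytes_sent_during(rate_bits_per_s, duration_s):
    """Bytes produced at a given bit rate over a given duration."""
    return rate_bits_per_s / 8 * duration_s


rate = 5.8e6    # observed transfer rate, bits per second
stall = 0.023   # observed pause in ACKs, seconds

needed = bytes_sent_during(rate, stall)
print(f"{needed:.0f} bytes (~{needed / 1024:.1f} KiB) of unacked data")
```

So a send buffer in the region of 16 to 17 KB would be needed just to ride out a single 23 ms pause at full rate.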
My questions are:
- What could be the reason for the Linux server pausing the ACKs even though it is obviously able to receive everything just fine?
- How can I go about debugging this?
Solution 1:
Try turning off Ethernet PAUSE frames (flow control) with ethtool -A devname autoneg off rx off tx off, where devname is the name of the network interface.
If that doesn't help, it could be a TCP window scaling problem and/or an IRQ storm on the sending or receiving host. You can investigate both problems by trying different settings with ethtool and the sysctl entries that regulate TCP traffic.
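Before changing anything, it can help to record the current values of the relevant sysctl entries. Here is a minimal sketch that reads a few of them from /proc/sys on the Linux server. The helper name and the selection of entries are my own choice, not a complete list of what matters.

```python
import os


def read_sysctl(name, root="/proc/sys"):
    """Read a sysctl value by its dotted name, e.g. 'net.ipv4.tcp_rmem'.
    Returns None if the entry does not exist or cannot be read."""
    path = os.path.join(root, *name.split("."))
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


# A few entries related to TCP windows and receive buffers.
NAMES = [
    "net.ipv4.tcp_window_scaling",
    "net.ipv4.tcp_rmem",
    "net.core.rmem_max",
]

for name in NAMES:
    print(name, "=", read_sysctl(name))
```

Keeping a copy of this output makes it easier to revert if an experiment makes things worse.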
Without more information, it is quite difficult to tell what is happening here.