Why is there packet loss AFTER tcpdump has logged the packet?
We found the root cause of this: we had an acceptCount of 25 in our Tomcat server.xml.
acceptCount is documented like this:
acceptCount
The maximum queue length for incoming connection requests when all possible request processing threads are in use. Any requests received when the queue is full will be refused. The default value is 100.
But this is not the whole story about acceptCount. In short: acceptCount is the backlog parameter passed when the listening socket is opened. So this value sets the listen backlog even when not all threads are busy. It matters whenever requests come in faster than Tomcat can accept them and hand them off to waiting threads. The default acceptCount is 100, which is still too small to absorb a sudden peak in requests.
We checked the same thing with Apache and nginx and saw the same strange packet loss, only at higher concurrency values. The corresponding setting in Apache is ListenBacklog, which defaults to 511.
BUT on Debian (and other Linux-based OSes) the default maximum value for the backlog parameter is 128 (kernels from 5.4 onward ship with a default of 4096):
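One way to see which backlog a listening socket actually got is `ss -ltn`: on Linux, for sockets in LISTEN state, the Send-Q column holds the backlog limit and Recv-Q the number of connections currently waiting to be accepted. A minimal sketch, assuming Tomcat listens on port 8080 (an assumption, adjust to your connector port); the helper name listen_queue is made up for illustration:

```shell
# listen_queue PORT: reads `ss -ltn` output on stdin and prints
# "waiting=<Recv-Q> limit=<Send-Q>" for each listener on PORT.
# For LISTEN sockets, Recv-Q is the current accept queue length and
# Send-Q is the backlog limit the kernel applied.
listen_queue() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            print "waiting=" $2 " limit=" $3
        }'
}

# Usage (port 8080 is an assumption):
#   ss -ltn | listen_queue 8080
# Kernel-wide listen queue overflow counters, if netstat is installed:
#   netstat -s | grep -i listen
```

If "waiting" regularly sits close to "limit" during a peak, the accept queue is probably the place where connections are being dropped.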
$ sysctl -a | grep somaxc
net.core.somaxconn = 128
So whatever you set in acceptCount or ListenBacklog, the effective backlog will not exceed 128 until you raise net.core.somaxconn. The kernel silently caps the value passed to listen() at this limit.
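The capping rule can be sketched as a small helper that takes the smaller of the requested backlog and net.core.somaxconn. The function name effective_backlog is made up for illustration, and it assumes a Linux system where /proc/sys/net/core/somaxconn is readable:

```shell
# effective_backlog REQUESTED [SOMAXCONN]
# Prints the backlog the kernel will actually use: the smaller of the value
# the application asked for and net.core.somaxconn.
effective_backlog() {
    requested=$1
    # Fall back to the live kernel setting when no second argument is given.
    limit=${2:-$(cat /proc/sys/net/core/somaxconn)}
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

effective_backlog 1000 128   # prints 128
effective_backlog 25 128     # prints 25
```

With our original acceptCount of 25 the kernel limit was not even the problem yet; it only becomes the bottleneck once acceptCount is raised above it.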
For a very busy webserver 128 is not enough. You should change it to something like 500, 1000 or 3000, depending on your needs.
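A sketch of how the kernel-side limit could be raised and made persistent. It assumes root privileges and that the distribution reads /etc/sysctl.conf at boot (as Debian does); the helper name set_sysctl_line is made up for illustration, and 1000 is just one of the example values above:

```shell
# set_sysctl_line KEY VALUE FILE
# Ensures FILE ends up with exactly one "KEY = VALUE" line, replacing any
# existing uncommented assignment of KEY and keeping all other lines.
# Note: dots in KEY are treated as regex wildcards, which is harmless here.
set_sysctl_line() {
    key=$1 value=$2 file=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # grep exits non-zero when nothing is left over; that is not an error.
        grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp" || true
    fi
    echo "$key = $value" >> "$tmp"
    # Write back through the existing file so its permissions are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage, as root:
#   sysctl -w net.core.somaxconn=1000                          # effective immediately
#   set_sysctl_line net.core.somaxconn 1000 /etc/sysctl.conf   # survives a reboot
```

Because the backlog is fixed when the socket is opened, the web server has to be restarted afterwards so that it calls listen() again under the new limit.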
After setting acceptCount to 1000 and net.core.somaxconn to 1000 we no longer had those dropped packets. (Now we have a bottleneck somewhere else, but that is another story.)