Lots of packets pruned and packets collapsed because of socket buffer overrun / low socket buffer
I've set up a test machine (Debian Squeeze, kernel 2.6.32, on a Linode 2048) that interacts with an API returning large chunks of JSON. It calls the API 3000 times per minute asynchronously, and the API returns payloads of ~450kb. There's also an HTTP server on the box to display the results of the calls.
Running netstat -s (uptime is 20 days) shows:
254329 packets pruned from receive queue because of socket buffer overrun
50678438 packets collapsed in receive queue due to low socket buffer
This didn't sound good to me, so I followed these tutorials to tweak the TCP parameters:
http://fasterdata.es.net/fasterdata/host-tuning/linux/test-measurement-host-tuning/
and
http://www.acc.umu.se/~maswan/linux-netperf.txt
but it doesn't seem to help.
Any advice/tutorial/explanation about socket buffers that might help me understand and fix the problem?
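For reference, this is how I check whether the sysctl values from those tutorials actually took effect on the running kernel (a minimal Python sketch, assuming a Linux /proc filesystem; the sysctl names are the standard net.core/net.ipv4 buffer settings those pages change):

```python
# Minimal sketch: read the kernel's current TCP buffer settings from /proc
# to confirm the tuning from the tutorials was actually applied.
SYSCTLS = [
    "net/core/rmem_max",
    "net/core/wmem_max",
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
]

for name in SYSCTLS:
    try:
        with open("/proc/sys/" + name) as f:
            print(name.replace("/", "."), "=", f.read().strip())
    except IOError:
        print(name.replace("/", "."), "is not available on this kernel")
```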
thanks
Solution 1:
It sounds like you are reaching the maximum network traffic your VPS can handle. Tweaking TCP parameters isn't magic - it can help a little, but probably not enough. Some tweaks may even be negated by running in a virtual machine - the traffic still gets passed through the hypervisor's real network card and is affected by its settings.
You say the incoming payload is 450kb per request. Is that in kilobits or kilobytes? Most tools measure the size in bytes, but I'll do both calculations.
Assuming kilobits:
- 3000 requests/minute = 50 requests/second
- 50*450kbit = 22,500kbit/s = approx 22Mbit/s
Assuming kilobytes, it's approx 176Mbit/s.
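To make the arithmetic explicit, here is the same back-of-the-envelope calculation as a small Python sketch (the kilobit and kilobyte interpretations are both shown, since the units haven't been confirmed):

```python
# Estimate sustained inbound bandwidth for 3000 requests/minute at ~450kb each.
requests_per_second = 3000 / 60.0            # 50 requests/s

scenarios = {
    "kilobits": 450,                         # payload already in kilobits
    "kilobytes": 450 * 8,                    # convert kilobytes to kilobits
}

for label, kbits_per_request in scenarios.items():
    mbit_per_s = requests_per_second * kbits_per_request / 1000.0
    print("Assuming %s: ~%.0f Mbit/s sustained inbound" % (label, mbit_per_s))
# Prints ~22 Mbit/s and ~180 Mbit/s; the ~176 Mbit/s figure above comes from
# using 1024-based rather than 1000-based unit conversion.
```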
If it's kilobytes, you aren't going to be able to consistently do that on most VPS servers. Each server is going to have at least 10-20 VPSs on it. Linode uses two gigabit bonded connections to each server. That means your "fair share" on a full server would be around 100Mbit/s at best.
Even if it is kilobits, 22Mbit is a fair bit for most VPSs.
By doing so many requests so fast, you are probably doing the equivalent of DoSing your own server. Checking your actual incoming network traffic should give you an idea of how much you are really using. If you need a real 100Mbit or even gigabit of throughput, you may need to look at a dedicated server. Otherwise, you need to slow down the requests until the rate is low enough that the server can handle it.
You also need to check your memory and CPU usage. If either of those is maxed out, your server will start dropping packets because it simply doesn't have the resources to handle them. Start by looking at top and ntop to watch your CPU, memory and network usage for a while.
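If you prefer to script that check, here is a minimal sketch that samples inbound throughput, CPU and memory over a ten-second window. It assumes the third-party psutil package is installed (pip install psutil); top/ntop give the same picture interactively.

```python
# Sample inbound network throughput, CPU and memory usage over 10 seconds.
import psutil

before = psutil.net_io_counters()
cpu = psutil.cpu_percent(interval=10)        # blocks for 10 seconds while measuring
after = psutil.net_io_counters()

recv_mbit_per_s = (after.bytes_recv - before.bytes_recv) * 8 / 10.0 / 1e6
mem = psutil.virtual_memory()

print("Inbound traffic: %.1f Mbit/s" % recv_mbit_per_s)
print("CPU usage:       %.1f %%" % cpu)
print("Memory used:     %.1f %%" % mem.percent)
```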
Solution 2:
A socket buffer overrun means that incoming data does not fit into the memory buffer assigned to each connection. All the data coming from the network interface is put into such a buffer, and your application reads from it. Once the application has read the data, it is flushed from the buffer. Basically, you should expect the application to read data as soon as it is available and then be free to process it. But if you don't have enough performance - whether the CPU is saturated or the application is blocked (which is quite common with nodejs) - the data keeps coming, while the buffer is not large enough to hold it all.
Even if you have enormous buffers, they will still be pruned and data discarded if your application cannot process everything in time. So I'd suggest tuning the application's performance first.
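To illustrate what "the buffer assigned to each connection" means, here is a minimal Python sketch that inspects and requests a per-socket receive buffer. The kernel caps whatever you request at net.core.rmem_max, and on Linux the reported value is typically double what you asked for (the extra is bookkeeping overhead); as noted above, a bigger buffer only buys time if the application still can't keep up.

```python
# Inspect and request a per-connection receive buffer (SO_RCVBUF).
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

default_rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("Default receive buffer: %d bytes" % default_rcvbuf)

# Ask for a 4 MB buffer; the kernel may silently clamp this to net.core.rmem_max.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
print("Granted receive buffer: %d bytes" %
      sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

sock.close()
```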