Packet loss rate with iperf and tcpdump
I've experienced significant data loss with iPerf in UDP mode as a result of the CPU not being able to keep up. For some reason, iPerf with UDP seems to be much more CPU intensive than iPerf with TCP. Do you see the same loss percentage when you set iPerf to half the rate?
To answer your second question about how much packet loss is acceptable: it really depends on what application you are running and how much traffic you've got. Really, there shouldn't be any loss if you are under your bandwidth limit. For most things, I probably wouldn't complain too much about 0.25%, but that is still a lot of lost packets if you are running at really high rates.
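To put a number on "a lot of loss at high rates", here is a rough back-of-the-envelope sketch. The function name is just for illustration, and the 1470-byte datagram size is an assumption on my part (adjust it to whatever length your test actually uses); header overhead is ignored.

```python
def lost_packets_per_second(rate_mbps, loss_fraction, payload_bytes=1470):
    """Estimate datagrams lost per second at a given send rate.

    payload_bytes=1470 is an assumed datagram size, not a measured value.
    Protocol header overhead is ignored, so this is only an approximation.
    """
    packets_per_second = rate_mbps * 1_000_000 / (payload_bytes * 8)
    return packets_per_second * loss_fraction


# 0.25% loss at a few example rates (the rates are arbitrary illustrations)
for rate in (10, 100, 1000):
    lost = lost_packets_per_second(rate, 0.0025)
    print(f"{rate} Mbit/s: roughly {lost:.1f} datagrams lost per second")
```

The same percentage that costs a couple of datagrams per second at a low rate costs a couple of hundred per second at gigabit rates, which is why the acceptable figure depends so much on the application.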
[EDIT 1] Some other thoughts that I've had on the topic:
- Try stepping up the iPerf rate in increments. If there is a systemic problem somewhere, you'll likely see the same percentage of loss no matter what the rate. If you are at the limits of your hardware, or your provider does some sort of RED (random early detection), then there will likely be no loss up to a certain rate, and then progressively worse loss the further above that you go.
- Run your tcpdump measurement against the iPerf session itself, just to verify that your tests are accurate.
- Try iPerf with TCP. This won't report loss, but if you are getting loss then the throughput won't be able to scale up very high. Since latency also limits TCP throughput, make sure to test to an endpoint with as little latency as possible.
- Depending on what gear you have on the inside of your connection, make sure you are as close to it as possible. E.g. if you have multiple switches between your test system and the edge router, move to a directly connected switch.
- If you have a managed switch, check the stats on it to make sure the loss isn't occurring there. I've encountered some cheaper switches that start dropping packets when you get close to 100 Mbps of UDP traffic on them (mostly old and cheap unmanaged switches though).
- Try simultaneous iPerf runs from two different clients to two different hosts, so that you can be sure the limit isn't a result of the CPU or a cheap local NIC.
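The rate-stepping idea from the first bullet can be sketched as a small helper that looks at (rate, loss) pairs you've collected by hand and says which pattern they resemble. This is only a sketch: `classify_loss`, the 0.1% tolerance, and the sample numbers below are all made up for illustration, not real measurements or any iPerf feature.

```python
def classify_loss(samples, tolerance=0.1):
    """Classify (rate_mbps, loss_percent) pairs from a stepped-rate test.

    tolerance is an arbitrary, assumed cutoff (in percentage points) for
    treating loss as negligible or two loss values as "the same".
    """
    ordered = sorted(samples)
    losses = [loss for _, loss in ordered]

    if max(losses) <= tolerance:
        return "no significant loss"

    # Roughly the same loss at every rate points to a systemic problem.
    if max(losses) - min(losses) <= tolerance:
        return "constant loss"

    # Clean up to some rate and lossy above it points to a capacity limit
    # (hardware, or the provider dropping above a threshold).
    clean = [rate for rate, loss in ordered if loss <= tolerance]
    lossy = [rate for rate, loss in ordered if loss > tolerance]
    if clean and max(clean) < min(lossy):
        return f"threshold near {min(lossy)} Mbit/s"

    return "inconclusive"


# Hypothetical numbers, purely to show the two patterns.
print(classify_loss([(10, 0.25), (50, 0.30), (100, 0.27)]))
print(classify_loss([(10, 0.0), (50, 0.0), (80, 1.2), (100, 5.0)]))
```

If the result is a threshold, repeating the sweep from a second client (last bullet) helps tell whether that threshold belongs to the test machine or to the link.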