packet queue performance discrepancies with BIND nameserver

I'm wondering whether the box is a Dell. There's a well-known issue with the bnx2 driver and the chipsets Dell ships with it: the result is randomly dropped packets under heavy network load. If that's what is happening here, it seems plausible that the tuned-up ring buffers could be what triggers it.

I believe Dell offers its own version of the driver as a fix. The other workaround is to disable MSI for the driver with something like this in modprobe.conf:

options bnx2 disable_msi=1
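An options line in modprobe.conf only takes effect when the module is (re)loaded, so it's worth confirming the running driver actually picked it up. A minimal sketch, assuming the driver exposes the parameter read-only under /sys/module/<module>/parameters/ (typical for module parameters declared with non-zero permissions); the helper name module_param is hypothetical:

```python
from pathlib import Path
from typing import Optional


def module_param(module: str, param: str, sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the current value of a loaded module's parameter as a string,
    or None if the module isn't loaded or doesn't expose that parameter."""
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        # sysfs values end with a newline; strip it for easy comparison.
        return path.read_text().strip()
    except OSError:
        # Missing file (module not loaded / no such param) or unreadable.
        return None
```

For example, module_param("<driver module from the options line>", "disable_msi") should return "1" once the module has been reloaded with the new option; None means the module or parameter wasn't found.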

It can't hurt to try, anyhow. And I second what kce said: this is one of the best-written questions I've ever seen here.


Even if you're sure that you have a full list of the load balancer VIPs for your servers, run a packet capture anyway. Just because your machine won't respond to ARP for an IP address doesn't mean that bogus packets can't be sent to it. Make sure the traffic being sent to your MAC addresses matches up with your configured IP addresses.
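Once you have a capture, the comparison itself is simple set arithmetic. A minimal sketch, assuming you've already extracted the destination IPs of frames addressed to your MAC (for example from tcpdump -n output) into a list; the function name is hypothetical:

```python
import ipaddress
from collections import Counter
from typing import Iterable


def unexpected_destinations(observed: Iterable[str], configured: Iterable[str]) -> Counter:
    """Count captured destination IPs that aren't among the configured addresses."""
    # Normalise through ipaddress so textual variants compare equal.
    allowed = {ipaddress.ip_address(a) for a in configured}
    counts: Counter = Counter()
    for dst in observed:
        addr = ipaddress.ip_address(dst)
        if addr not in allowed:
            counts[str(addr)] += 1
    return counts
```

Any non-empty result points at an address someone is sending traffic to that the box isn't configured for, such as a forgotten load balancer VIP.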

I appreciate the time that people put into this question, but my own due diligence was lacking here. In hindsight, I should have built a PCAP filter like this:

tcpdump -i eth0 -n 'ether dst aa:bb:cc:dd:ee:ff and not (dst host 1.2.3.4 or dst host 5.6.7.8 or...)'

Where:

aa:bb:cc:dd:ee:ff = HW addr of eth0
1.2.3.4, 5.6.7.8  = list of destination addresses that traffic is expected on
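With more than a handful of expected addresses, typing that filter by hand gets error-prone. A minimal sketch that generates the same expression from the interface's MAC and a list of expected destination addresses; the function name is hypothetical, and it assumes a colon-separated MAC as shown by ifconfig or ip link:

```python
import ipaddress
import re
from typing import Iterable

# Six colon-separated hex octets, e.g. aa:bb:cc:dd:ee:ff
_MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def build_unexpected_traffic_filter(mac: str, expected: Iterable[str]) -> str:
    """Return a pcap filter matching frames sent to `mac` whose destination
    IP is NOT one of the expected addresses."""
    if not _MAC_RE.match(mac):
        raise ValueError(f"not a colon-separated MAC address: {mac!r}")
    # Validate and normalise each address before putting it in the filter.
    addrs = [str(ipaddress.ip_address(a)) for a in expected]
    if not addrs:
        # Nothing to exclude: every frame to this MAC is of interest.
        return f"ether dst {mac}"
    hosts = " or ".join(f"dst host {a}" for a in addrs)
    return f"ether dst {mac} and not ({hosts})"
```

The returned string goes into the single-quoted filter argument of the tcpdump command above.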

There were a number of load balancer VIPs that hadn't been given to me (I don't control the LB), and they were passing traffic on TCP port 53 in ways that would result in RX discards. The volume of traffic on these legacy IPs was low enough that an admin eyeballing traffic on the wire was unlikely to notice it.