How to find out the reason(s) why the network interface is dropping packets?
Is there a way on Linux to get statistics about the various reasons packets were dropped?
On all network interfaces (openSUSE 12.3) on several servers, ifconfig and netstat -i report dropped packets on receive. When I run tcpdump, the number of dropped packets stops increasing, which suggests that the interface queues are not filling up and dropping data. So there must be other reasons for the drops (e.g. multicast packets received although the interface is not a member of that multicast group).
Where can I find such information? (/proc? /sys? some logs?)
Example of statistics (merge of the /sys/class/net/<dev>/statistics and ethtool output):
alloc_rx_buff_failed: 0
collisions: 0
dropped_smbus: 0
multicast: 1644
rx_align_errors: 0
rx_broadcast: 23626
rx_bytes: 1897203
rx_compressed: 0
rx_crc_errors: 0
rx_csum_offload_errors: 0
rx_csum_offload_good: 0
rx_dropped: 4738
rx_errors: 0
rx_fifo_errors: 0
rx_flow_control_xoff: 0
rx_flow_control_xon: 0
rx_frame_errors: 0
rx_length_errors: 0
rx_long_byte_count: 1998731
rx_long_length_errors: 0
rx_missed_errors: 0
rx_multicast: 1644
rx_no_buffer_count: 0
rx_over_errors: 0
rx_packets: 25382
rx_short_length_errors: 0
rx_smbus: 0
tx_aborted_errors: 0
tx_abort_late_coll: 0
tx_broadcast: 7
tx_bytes: 11300
tx_carrier_errors: 0
tx_compressed: 0
tx_deferred_ok: 0
tx_dropped: 0
tx_errors: 0
tx_fifo_errors: 0
tx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_heartbeat_errors: 0
tx_multicast: 43
tx_multi_coll_ok: 0
tx_packets: 63
tx_restart_queue: 0
tx_single_coll_ok: 0
tx_smbus: 0
tx_tcp_seg_failed: 0
tx_tcp_seg_good: 0
tx_timeout_count: 0
tx_window_errors: 0
Try /sys/class/net/eth0/statistics/ (i.e. for eth0). It's not perfect, but it breaks errors down by transmit/receive and by type: carrier, window, fifo, crc, frame, length, and a few more.
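As a rough illustration of reading that directory, the sketch below collects every counter file into a dictionary and keeps the non-zero receive-side ones. The function names and the root parameter are made up for this example; only the /sys/class/net/<dev>/statistics layout comes from the text above.

```python
import os


def read_interface_stats(dev, root="/sys/class/net"):
    """Return {counter_name: int} for each file in <root>/<dev>/statistics."""
    stats_dir = os.path.join(root, dev, "statistics")
    stats = {}
    for name in sorted(os.listdir(stats_dir)):
        try:
            with open(os.path.join(stats_dir, name)) as fh:
                stats[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # Skip entries that are unreadable or not a plain integer.
            continue
    return stats


def nonzero_rx(stats):
    """Keep only counters whose name starts with rx_ and whose value is non-zero."""
    return {k: v for k, v in stats.items() if k.startswith("rx_") and v != 0}
```

Calling nonzero_rx(read_interface_stats("eth0")) on the affected host would list the receive counters worth looking at; note that it also includes rx_packets and rx_bytes, which are not errors.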
Drops are not the same as "ignored". netstat shows interface-level statistics; a multicast packet ignored at a higher level (layer 3, the IP stack) won't show up as a drop (though it might show up as "filtered" in some NIC stats). Various offload features may complicate the statistics somewhat.
You can get more stats if you have ethtool:
# ethtool -S eth0
rx_packets: 60666755
tx_packets: 2206194
rx_bytes: 6630349870
tx_bytes: 815877983
rx_broadcast: 58230114
tx_broadcast: 9307
rx_multicast: 8406
tx_multicast: 17
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 8406
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
[...]
Some statistics depend on the NIC driver, as does their exact meaning. The above is from an Intel e1000. Having looked at a handful of drivers, some collect many more statistics than others (the stats available to ethtool tend to be kept in a separate source file, e.g. drivers/net/ethernet/intel/e1000/e1000_ethtool.c, if you need to rummage).
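Since the set of counters varies by driver, one way to narrow things down is to take two ethtool -S snapshots and see which counters moved in between. The following is a minimal sketch, assuming the "name: value" output format shown above; the function names are hypothetical, and snapshot() simply shells out to ethtool -S.

```python
import subprocess


def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into {name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # no colon on this line, e.g. "[...]"
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # header such as "NIC statistics:" or a non-numeric value
    return stats


def changed_counters(before, after):
    """Return {name: delta} for counters present in both snapshots that changed."""
    return {k: after[k] - before[k]
            for k in after
            if k in before and after[k] != before[k]}


def snapshot(dev):
    """Run `ethtool -S <dev>` and parse its output (needs ethtool installed)."""
    out = subprocess.run(["ethtool", "-S", dev], check=True,
                         capture_output=True, text=True).stdout
    return parse_ethtool_stats(out)
```

Taking one snapshot(dev), waiting a few seconds, taking another and passing both to changed_counters() shows which driver counters increase alongside the drop count, if any do.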
ethtool -i eth0 will show the driver details; the output of lspci -v should be more detailed, though with a bit of clutter too.
Update
In tg3.c, the function tg3_rx() has only one place that looks likely, with a tp->rx_dropped++, but the code is littered with gotos, so there are several causes other than the obvious one, i.e. anything with goto drop_it or goto drop_it_no_recycle.
(Note that the drop counter is one of the few maintained by the driver, the rest are maintained by the device itself.)
The driver source I have to hand is 3.123. My best guess is this code:
if (len > (tp->dev->mtu + ETH_HLEN) &&
    skb->protocol != htons(ETH_P_8021Q)) {
        dev_kfree_skb(skb);
        goto drop_it_no_recycle;
}
Check the MTU: possible causes are jumbo frames, or slightly oversized Ethernet frames to allow for encapsulation. I cannot explain why tcpdump might change the behaviour; it's not known to change the interface MTU. Note also that you may "see" packets larger than the MTU with tcpdump if TSO/LRO is enabled (explanation).
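To make the MTU check concrete, here is a small sketch that mirrors the quoted condition in Python. It is only an illustration of that one test, not of the driver as a whole; the function names are invented for the example, ETH_HLEN (14) and ETH_P_8021Q (0x8100) are the usual kernel constants, and the MTU is read from /sys/class/net/<dev>/mtu.

```python
ETH_HLEN = 14         # Ethernet header length in bytes
ETH_P_8021Q = 0x8100  # EtherType of 802.1Q VLAN-tagged frames


def read_mtu(dev, root="/sys/class/net"):
    """Read the interface MTU from sysfs."""
    with open(f"{root}/{dev}/mtu") as fh:
        return int(fh.read().strip())


def takes_drop_path(frame_len, mtu, ethertype):
    """True if a frame would satisfy the quoted length check.

    Mirrors: len > mtu + ETH_HLEN && protocol != ETH_P_8021Q.
    The C code compares in network byte order via htons(); comparing
    host-order values here is equivalent.
    """
    return frame_len > mtu + ETH_HLEN and ethertype != ETH_P_8021Q
```

With an MTU of 1500, a 1514-byte IPv4 frame passes the check, a 1515-byte one would take the drop path, and a VLAN-tagged frame is exempt from this particular test.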