ARP table emptied when TAP interface is added to bridge
I have a problem with the ARP table on CentOS. Sometimes, when I create a TAP interface and add it to a bridge, the ARP table is cleared.
For example, when I run these commands:
sudo ip tuntap add dev tap-device-u98 mode tap; sudo ip link set dev tap-device-u98 master br0
the ARP table drops to just a few entries.
It also happens when I run:
sudo ip link set dev tap-device-u98 nomaster
Most of the entries in the ARP table are permanent and are managed by a dedicated in-house application (for our specific needs). The table can hold up to 12k entries, but the size doesn't matter: it happens with large tables (12k entries) as well as with small ones (10 entries).
This happens on most of our servers, from CentOS 6.2 to CentOS 7.8.
In production, the issue isn't triggered by the commands above but by openvpn (which creates the TAP interface) and brctl (which adds the interface to the bridge). That seems irrelevant, though, since the issue also happens with 'ip'. The PoC with 'ip' commands was run on CentOS 7.4.1708 with kernel 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017.
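To make the PoC easier to repeat, it can be scripted so that the bridge MAC address and the number of ARP entries are recorded before and after each step. This is a minimal sketch, not the script actually used: the function names are hypothetical, it must run as root, the bridge is assumed to already exist, the TAP name you pass should be an unused one, and link_mac assumes the one-line `ip -o link` output contains a `link/ether` field.

```shell
#!/bin/sh
# Reproduction sketch (hypothetical helper names; run as root).
# Assumes the bridge exists and the TAP name passed in is currently unused.

# Print the MAC address of a link, parsed from one-line `ip -o link` output.
link_mac() {
    ip -o link show dev "$1" | sed -n 's/.*link\/ether \([0-9a-fA-F:]*\).*/\1/p'
}

# Count the IPv4 neighbour (ARP) entries attached to a device.
neigh_count() {
    ip -4 neigh show dev "$1" | wc -l | tr -d ' '
}

# Print one labelled line with the bridge MAC and its ARP entry count.
snapshot() {
    printf '%-12s mac=%s arp_entries=%s\n' "$1" "$(link_mac "$2")" "$(neigh_count "$2")"
}

# Attach a fresh TAP to the bridge, detach it again, snapshotting each step.
repro_arp_flush() {
    bridge=$1
    tap=$2
    snapshot before "$bridge"
    ip tuntap add dev "$tap" mode tap || return 1
    ip link set dev "$tap" master "$bridge"
    snapshot attached "$bridge"
    ip link set dev "$tap" nomaster
    snapshot detached "$bridge"
    ip tuntap del dev "$tap" mode tap
}
```

Usage, as root: `repro_arp_flush br0 tap-repro0`. If the `mac=` value differs between lines at the same time as `arp_entries` drops, the two events are likely related.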
/var/log/messages doesn't help much:
--- (Create the interface and attach it to br0)
Nov 26 15:52:51 localhost NetworkManager[733]: <info> [1606402371.7604] manager: (tap-device-u98): new Tun device (/org/freedesktop/NetworkManager/Devices/2723)
Nov 26 15:52:51 localhost kernel: br0: port 7(tap-device-u98) entered blocking state
Nov 26 15:52:51 localhost kernel: br0: port 7(tap-device-u98) entered disabled state
Nov 26 15:52:51 localhost kernel: device tap-device-u98 entered promiscuous mode
--- (Remove the interface from br0 and delete it)
Nov 26 15:52:51 localhost kernel: device tap-device-u98 left promiscuous mode
Nov 26 15:52:51 localhost kernel: br0: port 7(tap-device-u98) entered disabled state
Nov 26 15:52:51 localhost NetworkManager[733]: <info> [1606402371.8909] device (tap-device-u98): released from master device br0
Any ideas of what could go wrong?
Solution 1:
For the record, I'm posting the answer here.
Because of the way the bridge's MAC address is calculated, adding interfaces to it or removing them from it can trigger a recalculation of that address (function br_stp_recalculate_bridge_id in br_stp_if.c): unless the address was set explicitly, the bridge takes the lowest MAC address among its ports.
So, because the bridge didn't have a fixed MAC address, the address could change, which in turn flushed the ARP entries.
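To illustrate the mechanism, here is a simplified userspace model of the choice br_stp_recalculate_bridge_id makes. It is a sketch, not the kernel code, and the function names are hypothetical. Since a new TAP device normally gets a random MAC address, whether it sorts below the bridge's current address is a matter of chance, which would be consistent with the issue only happening sometimes.

```shell
#!/bin/sh
# Simplified model (not kernel code) of how a Linux bridge picks its MAC:
# if the administrator set the address, keep it; otherwise use the lowest
# port MAC. Assumes MACs in the usual six-octet, two-hex-digit form, so a
# byte-wise string sort matches the kernel's memcmp() ordering.

# Print the lowest MAC among the arguments (case-insensitive).
lowest_port_mac() {
    printf '%s\n' "$@" | tr 'A-F' 'a-f' | LC_ALL=C sort | head -n 1
}

# $1 = "set" if the bridge MAC was set explicitly, anything else otherwise
# $2 = current bridge MAC; remaining arguments = MACs of the bridge ports
bridge_mac_after_change() {
    assign=$1
    current=$2
    shift 2
    if [ "$assign" = set ] || [ "$#" -eq 0 ]; then
        # Explicit address: kept. The empty-bridge case is not modelled
        # here; the current address is simply returned.
        echo "$current"
    else
        lowest_port_mac "$@"
    fi
}
```

For example, `bridge_mac_after_change auto 52:54:00:aa:bb:cc 52:54:00:aa:bb:cc 1a:2b:3c:4d:5e:6f` prints `1a:2b:3c:4d:5e:6f`: the new TAP's address is lower, so the bridge MAC would change. With `set` as the first argument, the current address is printed unchanged.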
To fix the issue, give the bridge a fixed MAC address, then restart it:
sudo vim /etc/sysconfig/network-scripts/ifcfg-br0
[...]
MACADDR=xx:xx:xx:xx:xx:xx
sudo ifdown br0
sudo ifup br0
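After `ifup br0`, one way to check that the fix took effect is to read the bridge's addr_assign_type attribute from sysfs: the value 3 corresponds to NET_ADDR_SET, an explicitly assigned address. This is a sketch under assumptions: the function name and the SYSFS_NET override (there only to make it testable) are hypothetical, it assumes ifup applies MACADDR as an ordinary address change and that the kernel reports it this way, and older kernels such as those on CentOS 6 may track a user-set bridge MAC differently.

```shell
#!/bin/sh
# Check whether a bridge's MAC address is administratively fixed.
# addr_assign_type 3 (NET_ADDR_SET) means the address was set explicitly.
# SYSFS_NET is a hypothetical override; it defaults to /sys/class/net.
# Returns 0 if fixed, 1 if not, 2 if the attribute cannot be read.
mac_is_fixed() {
    f="${SYSFS_NET:-/sys/class/net}/$1/addr_assign_type"
    [ -r "$f" ] || { echo "cannot read $f" >&2; return 2; }
    [ "$(cat "$f")" -eq 3 ]
}
```

Usage: `mac_is_fixed br0 && echo "br0 MAC is fixed"`.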