Is bonding mode=5 a solution against MAC flapping?

In mode 5, or balance-tlb mode, outgoing traffic uses the MAC address of the slave interface that it's leaving, instead of using the address of the bond interface.

Typically, the bond's MAC is used for all traffic, which can cause a MAC flapping condition between two ports on a given switch - each of your switches will see ingressing traffic with the bond's MAC as the source, both from the direct connection to the device, and from the cross-connect to the other switch.

The transmit load-balancing mode skirts this issue by balancing traffic outbound between interfaces, but by using the interface's MAC address as the source for outbound traffic. If your other nodes in the subnet (particularly the router) don't mind this behavior, then it works just fine - typically there will be no issue, but I can imagine some restrictive router security settings taking offense.

The bond interface will take the MAC address of one of its slave interfaces:

root@test1:~# ifconfig
bond1     Link encap:Ethernet  HWaddr 00:0c:29:3d:f7:35
          inet addr:192.168.100.25  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe3d:f735/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 00:0c:29:3d:f7:35
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:0c:29:3d:f7:3f
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

eth1's MAC matches the bond interface, it's the "primary", so it's getting the inbound traffic.

And, just to confirm:

root@test1:~# cat /sys/class/net/bond1/bonding/mode
balance-tlb 5

root@test1:~# cat /sys/class/net/bond1/bonding/active_slave
eth1

Ok, so.. is it load balancing? Here's how it looks from another node, sending constant pings:

root@test2:~# tcpdump -e -n -i eth0 proto 1
20:33:08.094078 00:0c:29:46:4f:c6 > 00:0c:29:3d:f7:35, ethertype IPv4 (0x0800), length 98: 192.168.100.40 > 192.168.100.25: ICMP echo request, id 5810, seq 38, length 64
20:33:08.094549 00:0c:29:3d:f7:35 > 00:0c:29:46:4f:c6, ethertype IPv4 (0x0800), length 98: 192.168.100.25 > 192.168.100.40: ICMP echo reply, id 5810, seq 38, length 64
20:33:09.094052 00:0c:29:46:4f:c6 > 00:0c:29:3d:f7:35, ethertype IPv4 (0x0800), length 98: 192.168.100.40 > 192.168.100.25: ICMP echo request, id 5810, seq 39, length 64
20:33:09.094520 00:0c:29:3d:f7:35 > 00:0c:29:46:4f:c6, ethertype IPv4 (0x0800), length 98: 192.168.100.25 > 192.168.100.40: ICMP echo reply, id 5810, seq 39, length 64
20:33:10.094078 00:0c:29:46:4f:c6 > 00:0c:29:3d:f7:35, ethertype IPv4 (0x0800), length 98: 192.168.100.40 > 192.168.100.25: ICMP echo request, id 5810, seq 40, length 64
20:33:10.094540 00:0c:29:3d:f7:35 > 00:0c:29:46:4f:c6, ethertype IPv4 (0x0800), length 98: 192.168.100.25 > 192.168.100.40: ICMP echo reply, id 5810, seq 40, length 64

That all looks normal - eth1 is responding. Then, unprompted, there's a switch - notice that the request's destination MAC and the response's source MAC no longer match.

20:33:11.094084 00:0c:29:46:4f:c6 > 00:0c:29:3d:f7:35, ethertype IPv4 (0x0800), length 98: 192.168.100.40 > 192.168.100.25: ICMP echo request, id 5810, seq 41, length 64
20:33:11.094614 00:0c:29:3d:f7:3f > 00:0c:29:46:4f:c6, ethertype IPv4 (0x0800), length 98: 192.168.100.25 > 192.168.100.40: ICMP echo reply, id 5810, seq 41, length 64
20:33:12.094059 00:0c:29:46:4f:c6 > 00:0c:29:3d:f7:35, ethertype IPv4 (0x0800), length 98: 192.168.100.40 > 192.168.100.25: ICMP echo request, id 5810, seq 42, length 64
20:33:12.094531 00:0c:29:3d:f7:3f > 00:0c:29:46:4f:c6, ethertype IPv4 (0x0800), length 98: 192.168.100.25 > 192.168.100.40: ICMP echo reply, id 5810, seq 42, length 64
20:33:13.094086 00:0c:29:46:4f:c6 > 00:0c:29:3d:f7:35, ethertype IPv4 (0x0800), length 98: 192.168.100.40 > 192.168.100.25: ICMP echo request, id 5810, seq 43, length 64
20:33:13.094581 00:0c:29:3d:f7:3f > 00:0c:29:46:4f:c6, ethertype IPv4 (0x0800), length 98: 192.168.100.25 > 192.168.100.40: ICMP echo reply, id 5810, seq 43, length 64

Watching a constant ping, the switches between source continue arbitrarily based on the bond interface's evaluation of the load - it seems to re-evaluate every 10 seconds.


Failover for inbound traffic in mode 5 is much more basic; when the interface is detected as down, the bond interface's MAC is simply moved over to the live interface. This'll often fire a MAC flapping warning in your switch logs, but that's to be expected; nothing to worry about.

The interface MACs change from this:

eth1      Link encap:Ethernet  HWaddr 00:0c:29:3d:f7:35
eth2      Link encap:Ethernet  HWaddr 00:0c:29:3d:f7:3f

..to, after taking eth1 down, this:

eth1      Link encap:Ethernet  HWaddr 00:0c:29:3d:f7:3f
eth2      Link encap:Ethernet  HWaddr 00:0c:29:3d:f7:35

And, all traffic sources from eth2, with a MAC of :35.


So, yeah - assuming that you don't care about load balancing of inbound traffic, the balance-tlb mode seems to do an excellent job of switch-safe load balancing of outbound traffic and failover of inbound traffic.

If your router doesn't care about multiple source MACs sending traffic for a single IP, and doesn't get offended by gratuitous ARP failovers, then you should be good to go!