Debian LACP Bond eth0 Churning state
I have set up an LACP bond on 2 x 1 Gbps connections on an HP server running Debian 8.x; I have previously done this configuration on CentOS 7.x with no issues at all.
The issue I am facing is that eth0 goes into a churned state about a minute after the OS boots, once the "monitoring" stage has completed:
Actor Churn State: churned
Partner Churn State: churned
I have done some reading online and can't find much about what can cause this. I have had the DC check the switch configuration, and it is identical to a working CentOS setup.
I have attached the network configuration files below. The connection works, but it only uses eth1, which removes the benefit of the bond.
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 70:10:6f:51:88:8c
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 1
Actor Key: 9
Partner Key: 14
Partner Mac Address: 54:4b:8c:c9:51:c0
Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 70:10:6f:51:88:8c
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
system priority: 65535
system mac address: 70:10:6f:51:88:8c
port key: 9
port priority: 255
port number: 1
port state: 71
details partner lacp pdu:
system priority: 65535
system mac address: 00:00:00:00:00:00
oper key: 1
port priority: 255
port number: 1
port state: 1
Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 70:10:6f:51:88:8d
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 70:10:6f:51:88:8c
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 127
system mac address: 54:4b:8c:c9:51:c0
oper key: 14
port priority: 127
port number: 29
port state: 63
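As far as I can tell, the all-zero partner MAC address and oper key 1 reported for eth0 mean that port is not receiving any LACPDUs from the switch. One way to confirm that from the server side is to capture slow-protocols frames (EtherType 0x8809, which LACP uses) on the port, for example:

tcpdump -i eth0 -nn -e ether proto 0x8809

With lacp_rate fast, a healthy port should see an LACPDU from the switch roughly every second; eth1 can be captured the same way for comparison.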
/etc/network/interfaces
auto eth0
iface eth0 inet manual
bond-master bond0
auto eth1
iface eth1 inet manual
bond-master bond0
auto bond0
iface bond0 inet manual
bond_miimon 100
bond_mode 802.3ad
bond-downdelay 200
bond-updelay 200
bond-slaves none
auto vlan520
iface vlan520 inet static
address 62.xxx.xxx.40
netmask 255.255.255.0
gateway 62.xxx.xxx.1
vlan-raw-device bond0
auto vlan4001
iface vlan4001 inet static
address 172.16.1.1
netmask 255.255.255.0
vlan-raw-device bond0
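For reference, the VLAN sub-interfaces can be checked against bond0 at runtime (assuming the 8021q module is loaded, which the vlan-raw-device lines require):

cat /proc/net/vlan/config
ip -d link show vlan520

Both vlan520 and vlan4001 should list bond0 as the raw device.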
/etc/modprobe.d/bonding.conf
alias bond0 bonding
options bonding mode=4 miimon=100 lacp_rate=1
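To rule out the module options not taking effect, the active bonding parameters can also be read back from sysfs once bond0 is up:

cat /sys/class/net/bond0/bonding/mode
cat /sys/class/net/bond0/bonding/lacp_rate
cat /sys/class/net/bond0/bonding/miimon
cat /sys/class/net/bond0/bonding/slaves

These should report 802.3ad, fast, 100, and both eth0 and eth1 respectively.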
Any help would be appreciated.
Thanks, Ash
Solution 1:
Please refer to the following article: https://access.redhat.com/solutions/4122011
The short answer is that it is related to the latest kernel update. They suspect the following commit of causing the LACP issue: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ea53abfab960909d622ca37bcfb8e1c5378d21cc
Until a fix becomes available, it makes sense to boot into the older kernel. The issue started occurring as of the following kernel version on Red Hat based OSes:
kernel-3.10.0-957.1.3.el7
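A quick way to fall back: check what you are running and what is still installed, then boot the previous kernel from the GRUB menu (package query shown for a Red Hat based system; on Debian the equivalent is dpkg -l 'linux-image-*'):

uname -r        # kernel currently running
rpm -q kernel   # kernels still installed; boot the one prior to 3.10.0-957.1.3.el7
                # either by selecting it in the GRUB menu at boot or via grub2-set-default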
I will try to keep this post up to date, as it looks like the latest kernel update affected quite a few users.
Additional Reference:
https://patchwork.ozlabs.org/patch/437496/