Debian LACP bond: eth0 stuck in churned state

I have set up an LACP bond across 2 x 1 Gbps connections on an HP server running Debian 8.x. I have done this configuration before on CentOS 7.x with no issues at all.

The issue is that about a minute after the OS boots, once the "monitoring" stage has completed, eth0 goes into a churned state:

Actor Churn State: churned
Partner Churn State: churned

I have done some reading online and can't find much about what causes this. I have had the DC check the switch configuration, and it is identical to a working CentOS setup.

I have included the network configuration below. The connection works, but it only uses eth1, which removes the benefit of having a bond.

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 70:10:6f:51:88:8c
Active Aggregator Info:
    Aggregator ID: 2
    Number of ports: 1
    Actor Key: 9
    Partner Key: 14
    Partner Mac Address: 54:4b:8c:c9:51:c0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 70:10:6f:51:88:8c
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
system priority: 65535
system mac address: 70:10:6f:51:88:8c
port key: 9
port priority: 255
port number: 1
port state: 71
details partner lacp pdu:
system priority: 65535
system mac address: 00:00:00:00:00:00
oper key: 1
port priority: 255
port number: 1
port state: 1

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 70:10:6f:51:88:8d
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 70:10:6f:51:88:8c
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 127
system mac address: 54:4b:8c:c9:51:c0
oper key: 14
port priority: 127
port number: 29
port state: 63
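The "port state" values in the output above are the LACP actor/partner state octet defined in IEEE 802.1AX, printed in decimal by this version of the bonding driver. A minimal Python sketch for decoding them, assuming the standard bit order (bit 0 = LACP_Activity through bit 7 = Expired); the function and flag names are illustrative, not part of any tool:

```python
from typing import List

# LACP port state bits, lowest bit first (IEEE 802.1AX actor/partner state octet).
LACP_STATE_FLAGS = [
    (0x01, "LACP_Activity"),    # 1 = active LACP, 0 = passive
    (0x02, "LACP_Timeout"),     # 1 = short (fast) timeout
    (0x04, "Aggregation"),      # port is aggregatable
    (0x08, "Synchronization"),  # port is in sync with its aggregator
    (0x10, "Collecting"),       # receiving frames on the aggregate
    (0x20, "Distributing"),     # transmitting frames on the aggregate
    (0x40, "Defaulted"),        # using default partner info (no LACPDUs received)
    (0x80, "Expired"),          # receive machine is in the EXPIRED state
]


def decode_port_state(state: int) -> List[str]:
    """Return the names of the flags set in a decimal LACP port state value."""
    if not 0 <= state <= 0xFF:
        raise ValueError(f"port state must fit in one octet, got {state}")
    return [name for bit, name in LACP_STATE_FLAGS if state & bit]


if __name__ == "__main__":
    # Values copied from the /proc/net/bonding/bond0 output above.
    samples = {
        "eth0 actor": 71,
        "eth0 partner": 1,
        "eth1 actor": 63,
        "eth1 partner": 63,
    }
    for label, value in samples.items():
        print(f"{label:13} {value:3} -> {', '.join(decode_port_state(value))}")
```

Decoded this way, both eth1 states (63) have Synchronization, Collecting and Distributing set. eth0's actor state (71) has Defaulted set and no Synchronization, and its partner state (1) only has LACP_Activity. Together with the all-zero partner MAC address, this suggests eth0 is not receiving LACPDUs from the switch, which fits it sitting in its own aggregator (ID 1) instead of the active one (ID 2).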

Network Interfaces

auto eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
iface eth1 inet manual
    bond-master bond0

auto bond0
iface bond0 inet manual
    bond_miimon 100
    bond_mode 802.3ad
    bond-downdelay 200
    bond-updelay 200
    bond-slaves none

auto vlan520
iface vlan520 inet static
    address  62.xxx.xxx.40
    netmask  255.255.255.0
    gateway  62.xxxx.xxxx.1
    vlan-raw-device bond0

auto vlan4001
iface vlan4001 inet static
    address  172.16.1.1
    netmask  255.255.255.0
    vlan-raw-device bond0

/etc/modprobe.d/bonding.conf

alias bond0 bonding
options bonding mode=4 miimon=100 lacp_rate=1

Any help will be appreciated.

Thanks, Ash


Solution 1:

Please refer to the following article: https://access.redhat.com/solutions/4122011

The short answer is that it's related to the latest kernel update. Red Hat suspects the following commit is behind the LACP issue: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ea53abfab960909d622ca37bcfb8e1c5378d21cc

Until a fix is available, it makes sense to boot into the older kernel. On Red Hat-based OSs, the issue started with the following version:

kernel-3.10.0-957.1.3.el7

I will try to keep this post up to date, as the latest kernel update appears to have affected quite a few users.

Additional Reference:

https://patchwork.ozlabs.org/patch/437496/