How does one diagnose Linux LACP issues at the kernel level?

Solution 1:

The bonding driver doesn't expose any LACP state machine debugging to userspace. To get at it, you'd need to know the code and use kernel instrumentation such as SystemTap, or add your own debugging to the bonding module and compile it for your kernel.

However, the problem is that the bonding driver thinks the slave is down:

MII Status: down
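To see at a glance which slaves the driver considers down, the per-slave status can be pulled out of the bond's file under /proc/net/bonding. A minimal Python sketch, assuming the bond is named bond0 (the name is my assumption, adjust it to yours); the helper name slave_mii_status is made up for illustration:

```python
from pathlib import Path


def slave_mii_status(text):
    """Map each slave interface name to its MII status string."""
    status = {}
    current = None  # slave section currently being read, if any
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current is not None:
            # The bond-level "MII Status" line comes before any slave
            # section, so it is skipped while current is still None.
            status[current] = line.split(":", 1)[1].strip()
    return status


if __name__ == "__main__":
    bond_file = Path("/proc/net/bonding/bond0")  # assumed bond name
    if bond_file.exists():
        for name, state in slave_mii_status(bond_file.read_text()).items():
            print(f"{name}: {state}")
```

Any slave reported as down here is one the bonding driver will not aggregate, whatever the switch side says.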

You say you're confident the slave has link, so we'll ignore a physical problem.

Either the bond/slave isn't configured properly and the slave is administratively down, or the NIC driver in use doesn't support netif_carrier_ok() style link detection inside the kernel, in which case you need to set use_carrier=0 in the options of your bonding module.
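Both possibilities can be checked from sysfs before touching the module options. A rough Python sketch, assuming a slave named eth0 and a bond named bond0 (both names are placeholders, as are the helper names read_attr and link_report), and assuming your kernel's bonding driver exposes the use_carrier attribute:

```python
from pathlib import Path


def read_attr(path):
    """Return a sysfs attribute as stripped text, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # A missing attribute lands here, and so does reading "carrier"
        # on an interface that is administratively down.
        return None


def link_report(slave, bond, sysfs="/sys/class/net"):
    """Collect the link-related attributes of one slave and its bond."""
    base = Path(sysfs)
    return {
        "operstate": read_attr(base / slave / "operstate"),
        "carrier": read_attr(base / slave / "carrier"),
        "use_carrier": read_attr(base / bond / "bonding" / "use_carrier"),
    }


if __name__ == "__main__":
    # eth0 and bond0 are assumed names, not taken from the question.
    for key, value in link_report("eth0", "bond0").items():
        print(f"{key}: {value}")
```

If carrier cannot be read at all, suspect the administratively-down case first. If it reads 0 while the link is physically up and use_carrier is 1, the use_carrier=0 route is the one to try.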

Solution 2:

Try setting the following bonding options on the Linux side:

bond_downdelay 0
bond_updelay 0
bond_xmit_hash_policy layer3+4
bond_lacp_rate fast

On the Cisco side, recreate the port-channel and enable the fast LACP rate:

port-channel load-balance src-dst-ip
interface GigabitEthernet1/0/15
    lacp rate fast
exit

If the Cisco switch doesn't accept lacp rate fast, you need to upgrade its IOS.

Cisco handles LACP worse than Linux does. Set port-channel load-balance src-dst-port if your Cisco switch supports it.