Link aggregation (LACP/802.3ad) max throughput

I'm seeing some confusing behaviour regarding bonded interfaces under Linux and I'd like to throw the situation out there in hopes that someone can clear it up for me.

I have two servers: Server 1 (S1) has 4x 1Gbit ethernet connections; Server 2 (S2) has 2x 1Gbit ethernet connections. Both servers are running Ubuntu 12.04, albeit with kernel 3.11.0-15 (from the lts-saucy linux-generic package).

Both servers have all their respective network interfaces bundled into a single bond0 interface with the following configuration (in /etc/network/interfaces):

bond-mode 802.3ad
bond-miimon 100
bond-lacp-rate fast
bond-slaves eth0 eth1 [eth2 eth3]

Between the servers are a couple of HP switches which are (I think) correctly configured for LACP on the ports in question.

Now, the link is working - network traffic flows happily to and from both machines. And all respective interfaces are being used, so it's not like the aggregation is completely failing. However, I need as much bandwidth as possible between these two servers, and I'm not getting the ~2Gbit/s that I would expect.

In my testing, I can observe that each server seems to allocate each TCP connection (e.g. iperf, scp, nfs, whatever) to a single slave interface. Essentially everything seems capped at a max of 1 gigabit.

By setting bond-xmit-hash-policy layer3+4, I can use iperf -c S1 -P2 to send on two slave interfaces. On the server side, however, reception still occurs on only one slave interface, so the total throughput is capped at 1Gbit/s: the client shows ~40-50MB/s on two slave interfaces, while the server shows ~100MB/s on one slave interface. Without setting bond-xmit-hash-policy, sending is also limited to one slave interface.

I was under the impression that LACP should allow this kind of connection bundling, allowing, for example, a single scp transfer to make use of all available interfaces between the two hosts.

Is my understanding of LACP wrong? Or have I missed some configuration options somewhere? Any suggestions or clues for investigation would be much appreciated!


A quick and dirty explanation is that a single line of communication using LACP will not split packets over multiple interfaces. For example, a single TCP connection streaming packets from HostA to HostB will not span interfaces to send those packets. I've been looking at LACP a lot lately for a solution we are working on, and it is a common misconception that 'bonding' or 'trunking' multiple network interfaces with LACP gives a single connection the combined "throughput" of those interfaces. The aggregate bandwidth is only available when the traffic consists of several separate connections that happen to be assigned to different links. Some vendors have made proprietary drivers that will route a single stream over multiple interfaces, but from what I've read the LACP standard does not. Here's a link to a decent diagram and explanation I found from HP while searching on similar issues: http://www.hp.com/rnd/library/pdf/59692372.pdf
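To make the per-connection behaviour concrete, here is a minimal Python sketch. The hash is a simplification loosely modelled on the layer3+4 formula in older Linux bonding documentation, not the exact code in any particular kernel, and the IP addresses and ports are made up for illustration.

```python
import ipaddress

def layer34_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Pick a slave index from the connection's addresses and ports.

    Simplified stand-in for a layer3+4 transmit hash (an assumption,
    not the kernel's actual implementation).
    """
    ip_bits = (int(ipaddress.ip_address(src_ip))
               ^ int(ipaddress.ip_address(dst_ip))) & 0xFFFF
    return ((src_port ^ dst_port) ^ ip_bits) % n_links

# One scp transfer: every packet carries the same addresses and ports,
# so every packet hashes to the same slave.
one_scp = ("10.0.0.2", "10.0.0.1", 40000, 22)
links_used_by_scp = {layer34_link(*one_scp, n_links=2) for _ in range(1000)}

# Four parallel streams: only the source port differs, which is
# enough for the hash to spread them over both slaves.
parallel = [("10.0.0.2", "10.0.0.1", 50000 + i, 5001) for i in range(4)]
links_used_by_parallel = {layer34_link(*flow, n_links=2) for flow in parallel}

print("single connection uses slaves:", sorted(links_used_by_scp))
print("parallel connections use slaves:", sorted(links_used_by_parallel))
```

However many packets the single transfer sends, the set of slaves it uses has exactly one member, which is why one scp or one iperf stream tops out at the speed of one link.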


bond-xmit-hash-policy layer3+4 only sets the load balancing from your source server to the switch. It doesn't set the load-balancing algorithm from your switch to the second server. That is almost certainly still layer-2 or layer-3 balanced, which for traffic between a single pair of hosts always selects the same link, i.e. it is not balanced at all.
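A rough Python model of the two hops shows why the receiving side stays at 1Gbit/s. It assumes the switch hashes on MAC addresses only (the real switch policy is not known from the question), uses simplified hash formulas, and invents the addresses for S1 and S2.

```python
import ipaddress

LINK_GBIT = 1.0  # each physical link is 1 Gbit/s, as in the question

def layer2_link(src_mac, dst_mac, n_links):
    # Simplified layer-2 policy: XOR of the last byte of each MAC address.
    return (src_mac[-1] ^ dst_mac[-1]) % n_links

def layer34_link(src_ip, dst_ip, src_port, dst_port, n_links):
    # Simplified layer3+4 policy: mixes IP addresses and TCP ports.
    ip_bits = (int(ipaddress.ip_address(src_ip))
               ^ int(ipaddress.ip_address(dst_ip))) & 0xFFFF
    return ((src_port ^ dst_port) ^ ip_bits) % n_links

# Hypothetical addresses: S2 is the iperf client (2 links), S1 the server (4 links).
S2_IP, S1_IP = "10.0.0.2", "10.0.0.1"
S2_MAC = bytes.fromhex("020000000002")
S1_MAC = bytes.fromhex("020000000001")

# iperf -c S1 -P2: two streams that differ only in source port.
streams = [(50000, 5001), (50001, 5001)]

# Hop 1: S2 -> switch, chosen by the server's bond (layer3+4).
client_links = {layer34_link(S2_IP, S1_IP, sp, dp, 2) for sp, dp in streams}

# Hop 2: switch -> S1, chosen by the switch (assumed layer 2 here).
# The MAC pair is identical for both streams, so the result never changes.
switch_links = {layer2_link(S2_MAC, S1_MAC, 4) for _ in streams}

# Crude bottleneck estimate: the hop using the fewest links sets the limit.
total_gbit = min(len(client_links), len(switch_links)) * LINK_GBIT

print("links used S2 -> switch:", sorted(client_links))
print("links used switch -> S1:", sorted(switch_links))
print("estimated total throughput (Gbit/s):", total_gbit)
```

In this model the client spreads its two streams over both of its links, but the switch delivers both on a single port to S1, which matches the ~100MB/s on one interface seen on the server. Changing the switch's own hashing to include layer-4 information, if the switch supports it, is what would spread the second hop.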