Do you trust LACP?
Are there any reasons why I should not rely on LACP when designing a network topology? Specifically, I mean the connection between an L2 switch and a hypervisor, which is where the aggregated traffic of the VMs accumulates. We are talking about 5 x 1 GbE LACP bonding.
I disagree with my colleague on this. He says: "Why should we add another layer of overhead to the entire setup? It is just another potential point of failure." He is also sceptical about link aggregation in general. My opinion is that the Linux bonding driver in 802.3ad mode is a reliable and good choice.
He also thinks that we don't need it, because there will never be so much traffic in our environment that a single 1 GbE link is not enough. We are a high school with about 100 PC clients and about 10 servers in our LAN.
So we are in a situation where we don't really know whether we need LACP or not. Some additional data about network traffic would help, but I believe it is challenging to retrieve meaningful numbers. So in the end it is easier to rely on intuition and just say: "Yes, we want LACP, to be sure, because of traffic." or "No, because it is not reliable and we don't need it."
Any suggestions?
Solution 1:
To tell the truth, LACP was created precisely to solve a dangerous problem caused by static LAG (Link Aggregation Group) itself.
When used between directly attached interfaces, LAG is not dangerous. In such a setup, basically any network problem can be traced back to a port with no link, which automatically instructs the switch to stop sending traffic to the disconnected port.
However, if another device sits between the LAG-enabled switch and the aggregated Gbit ports, logical failures can occur that do not bring the link down. These cause real problems because the forwarding switch has no information about them: it will continue to blindly send traffic to the disconnected or problematic ports.
LACP was defined to solve this problem: it uses a heartbeat-based system to constantly monitor the aggregated ports, and automatically removes a port from the aggregate when too many heartbeats are lost.
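To make the heartbeat idea concrete, here is a minimal Python sketch of the timeout logic. It is not how any real switch or the Linux bonding driver implements it (the actual 802.3ad state machines are more involved); `MemberPort` and `active_members` are hypothetical names made up for illustration. The timer values are the standard ones: LACPDUs are sent every 1 s (fast rate) or every 30 s (slow rate), and the partner is considered gone after three missed intervals.

```python
from dataclasses import dataclass

# Standard LACP timers: one LACPDU per second (fast) or per 30 s (slow);
# the partner expires after three intervals without an LACPDU.
FAST_PERIODIC_S = 1.0
SLOW_PERIODIC_S = 30.0
MISSED_BEFORE_EXPIRY = 3


@dataclass
class MemberPort:
    """Hypothetical model of one physical link in the aggregate."""
    name: str
    last_lacpdu_s: float  # time the last LACPDU was received, in seconds


def active_members(ports, now_s, periodic_s=FAST_PERIODIC_S):
    """Return the names of ports whose partner is still sending LACPDUs."""
    timeout_s = MISSED_BEFORE_EXPIRY * periodic_s
    return [p.name for p in ports if now_s - p.last_lacpdu_s < timeout_s]


ports = [
    MemberPort("eth0", last_lacpdu_s=9.5),
    MemberPort("eth1", last_lacpdu_s=4.0),  # link is up, but the partner went silent
]

print(active_members(ports, now_s=10.0))                              # fast rate
print(active_members(ports, now_s=10.0, periodic_s=SLOW_PERIODIC_S))  # slow rate
```

With the fast rate, `eth1` is dropped after 3 seconds of silence even though its link is still physically up, which is exactly the case a static LAG cannot detect. With the slow rate the same port would stay in the aggregate for up to 90 seconds.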
In short: if correctly configured, I see no problems in using LACP. The only thing to consider is that you inevitably have a slightly more complex configuration to track and manage.
Solution 2:
Yes, I trust LACP. I prefer it over all other link aggregation methods because it is reliable and flexible, and because it is an IEEE standard (802.3ad), so it works across vendors.
If you think your virtual machines will push more than 1 gigabit per second of traffic (and that's very easy to do), then you want to load balance. The only load balancing modes on Linux that will work for you are Mode 2 (balance-xor) and Mode 4 (LACP). Mode 2 uses the same balancing as Mode 4, just without the constant heartbeat to the switch.
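To illustrate what "balancing" means in both modes, here is a simplified Python sketch of the idea behind the default layer2 transmit hash in the Linux bonding driver: XOR the last byte of the source and destination MAC addresses, then take the result modulo the number of links. The real driver also mixes in the packet type and offers other policies (layer2+3, layer3+4), so treat this as an approximation. `pick_link` and the MAC addresses are made up for the example.

```python
def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def pick_link(src_mac, dst_mac, link_count):
    """Simplified layer2-style hash: XOR of the last MAC bytes, modulo link count."""
    src = mac_to_bytes(src_mac)
    dst = mac_to_bytes(dst_mac)
    return (src[5] ^ dst[5]) % link_count


# Hypothetical addresses: five VMs sending to one gateway over a 5-link bond.
gateway = "00:11:22:33:44:01"
vms = [f"52:54:00:00:00:{last:02x}" for last in range(0x10, 0x15)]

for vm in vms:
    print(vm, "-> link", pick_link(vm, gateway, link_count=5))
```

A given pair of MAC addresses always lands on the same link, so a single conversation never exceeds 1 Gbit/s. The aggregate only pays off when many different peers talk at once, and the spread is not guaranteed to be even: in this example two of the five VMs end up on the same link.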