Network Manager: failover between two default gateways
Is Network Manager able to check whether a default gateway can actually route packets to the internet?
I have two interfaces, both provide a route to the internet.
When I unplug the cable from either of them, the default gateway is updated and my internet connection keeps working. But if the currently preferred gateway fails without the physical link going down, the system does not fail over to the second one.
I tried manually setting a higher metric on the failing GW and it worked, but that is a manual step I want to avoid.
Can this problem be solved using Network Manager?
My setup: Ubuntu 16.04, NM 1.2.2
UPD
A community member on NM's IRC channel answered that NM does not check whether a gateway actually works and does not perform any GW switching.
VRRP/ucarp/heartbeat/keepalived do not check it either. They only check network availability and switch the upstream GWs behind a virtual interface. This does not help in my case.
iproute2's nexthop (multipath) routing kind of works, but with enormous latency.
The kernel caches routes, and even after running ip route flush cache it took about 10 minutes for the system to fail over to the second GW.
The multipath route was set up with:
ip route replace default scope global \
nexthop via 11.22.33.1 dev eth0 weight 1 \
nexthop via 55.66.77.1 dev eth1 weight 1
My current solution: a shell script that checks whether the current default GW provides internet access; if not, it increases the metric of the current GW and the system fails over to the second one, which then has the lower metric.
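A minimal sketch of what such a script might look like, not the exact script in use. The probe host (8.8.8.8), the two metric values, and the function names probe, metric_for and check_gateway are all assumptions made for illustration; the gateway addresses and interface names reuse the example above.

```shell
#!/usr/bin/env bash
# Sketch only: PROBE_HOST and both metric values are assumed, not taken from the original setup.
PROBE_HOST="${PROBE_HOST:-8.8.8.8}"   # assumption: a host that should always answer pings
GOOD_METRIC="${GOOD_METRIC:-100}"     # metric for a gateway that reaches the internet
BAD_METRIC="${BAD_METRIC:-1000}"      # metric for a gateway that does not

# Succeeds if PROBE_HOST answers at least one ping sent out through interface $1.
probe() {
    ping -c 3 -W 2 -I "$1" "$PROBE_HOST" >/dev/null 2>&1
}

# Prints the metric that the default route through interface $1 should get.
metric_for() {
    if probe "$1"; then
        echo "$GOOD_METRIC"
    else
        echo "$BAD_METRIC"
    fi
}

# Reinstalls the default route via gateway $1 on interface $2 with the chosen metric.
# Delete-then-add is used because "ip route replace" treats a route with a
# different metric as a separate route and would leave the old one in place.
check_gateway() {
    local gw=$1 dev=$2 metric
    metric=$(metric_for "$dev")
    ip route del default via "$gw" dev "$dev" 2>/dev/null
    ip route add default via "$gw" dev "$dev" metric "$metric"
}
```

Run as root from cron or a loop, this would be called once per gateway, for example check_gateway 11.22.33.1 eth0 followed by check_gateway 55.66.77.1 eth1. Because the demoted route stays installed with the higher metric, the probe through that interface keeps working and the gateway gets its low metric back once it recovers.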
I'm still hoping to find a more elegant solution.
Solution 1:
This is what BGP was made for: what is commonly referred to as iBGP handles communication between internal routers and path redundancy, and/or eBGP provides full path redundancy at the Internet level. BGP is a protocol that lets routers exchange the information they need to judge which traffic paths within an autonomous system are valid and functional.
I don't see anyone using NetworkManager as a runtime configuration tool for this degree of routing. NM has historically not scaled well with many routes, and there is much better software designed to do what you want.
Most commercial routers have BGP functionality, so you could get it "canned". I normally use pfSense or VyOS if I'm going for a "software router", as they both virtualize well. VyOS even maintains LXD images, so I typically use that. You can also set up BGP by hand on most Linux distributions with the openbgpd or quagga packages.
Many SDN solutions use BGP rather than systems like MLAG to provide redundancy and network balancing, because many MLAG implementations on Ethernet switches and routers have historically been either too vendor-specific or do not operate as expected, especially with non-matching hardware. Rather than worry about control drivers for every switch out there, SDN is often geared towards operating above layer 2 for these multi-node redundancy solutions, even within an internal network.
Solution 2:
You can now add a connectivity check to NM, which will automatically increase the interface's route metric when the check host cannot be reached.
See the connectivity section of NetworkManager.conf. Digi also has a good article on the subject.
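As a rough illustration, the connectivity section could look like the fragment below. The keys uri, response and interval are the ones documented in the NetworkManager.conf man page; the URL and the 60-second interval are example values chosen here, so substitute a check endpoint you trust and confirm that your NM version supports connectivity checking.

```
[connectivity]
# Page NM fetches periodically to decide whether the internet is reachable (example URL)
uri=http://nmcheck.gnome.org/check_network_status.txt
# Text the response is expected to start with
response=NetworkManager is online
# Seconds between checks (example value)
interval=60
```

The fragment goes into /etc/NetworkManager/NetworkManager.conf or a file under /etc/NetworkManager/conf.d/, and NetworkManager has to be restarted to pick it up.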