Redundant OpenVPN connections with advanced Linux routing over an unreliable network

Use the bonding driver on both the 'home' and the 'vpn1' side, specifically with the mode=3 (broadcast) setting, which transmits every packet on all interfaces that belong to the bond.

For more information on how to configure bonding, see the excellent manual at http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=blob;f=Documentation/networking/bonding.txt;h=5dc638791d975116bf1a1e590fdfc44a6ae5c33c;hb=HEAD


I used the answer provided by @user48116 and it works like a charm. The setup is actually quite easy!

NOTE: I implemented this with two connections to a single server, as that already solved the problem for me. If you want to try a setup with two servers, the easiest way is probably to forward the UDP port from the second server to the first and follow the same recipe. I have not tested this myself, though.

First, make sure you have a 2.6 (or later) kernel with bonding support (the default in all modern distributions) and that ifenslave is installed.
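A quick pre-flight check for both prerequisites might look like the sketch below. It assumes modinfo (from module-init-tools or kmod) is installed, and it also accepts a bonding driver that is already loaded or built in, which shows up as /sys/module/bonding. check_prereqs is a hypothetical name, not part of any package.

```shell
#!/bin/sh
# Sketch: check the prerequisites for this recipe before editing rc.local.
# check_prereqs is a hypothetical helper; it returns 0 only if both pieces are present.
check_prereqs() {
  status=0
  # modinfo finds the module on disk; /sys/module/bonding exists once it is loaded or built in
  if modinfo bonding >/dev/null 2>&1 || [ -d /sys/module/bonding ]; then
    echo "bonding: available"
  else
    echo "bonding: no module found for this kernel" >&2
    status=1
  fi
  # command -v prints the full path of ifenslave if it is in PATH
  if command -v ifenslave >/dev/null 2>&1; then
    echo "ifenslave: $(command -v ifenslave)"
  else
    echo "ifenslave: not installed" >&2
    status=1
  fi
  return "$status"
}
```

Running check_prereqs as root before touching rc.local tells you which piece, if any, is still missing.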

Next, put the following into /etc/rc.local (or any other place you prefer), but make sure it runs before OpenVPN is started, because the OpenVPN up script will try to enslave the tap devices to bond0:

Client:

modprobe bonding mode=broadcast
ifconfig bond0 10.10.0.2 netmask 255.255.255.0 up

You can add any routes you need here; just make sure the other side has the matching routes too:

route add -net 10.7.0.0/24 gw 10.10.0.1
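On systems where ifconfig and route are no longer installed, the client-side lines above could be expressed with iproute2 instead. This is a sketch, not a drop-in replacement: it assumes an iproute2 and kernel recent enough for "ip link add ... type bond", and run, DRY_RUN, setup_bond and client_setup are hypothetical names. With DRY_RUN set, the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: iproute2 equivalent of the client's modprobe/ifconfig/route lines.
# run and DRY_RUN are hypothetical helpers: with DRY_RUN set, commands are only printed.
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_bond ADDRESS/PREFIX (hypothetical): create bond0 in broadcast mode and address it
setup_bond() {
  # max_bonds=0 keeps the module from creating bond0 itself, so "ip link add" can;
  # it has no effect if the bonding module is already loaded
  run modprobe bonding max_bonds=0
  run ip link add bond0 type bond mode broadcast
  run ip addr add "$1" dev bond0
  run ip link set bond0 up
}

# client_setup (hypothetical): the client side, plus the route to 10.7.0.0/24 behind the server
client_setup() {
  setup_bond 10.10.0.2/24
  run ip route add 10.7.0.0/24 via 10.10.0.1
}
```

On the client, calling client_setup from rc.local would replace the three lines above; the server only needs setup_bond 10.10.0.1/24. Setting DRY_RUN=1 for a first run shows exactly what would be executed.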

Server:

modprobe bonding mode=broadcast
ifconfig bond0 10.10.0.1 netmask 255.255.255.0 up
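On either side, it's worth confirming that bond0 really is in broadcast mode: if the bonding module was already loaded with other options, the modprobe line does nothing and the bond keeps its old mode. The sketch below reads the bond's sysfs attribute, which holds the mode name and number (for example "broadcast 3"). check_bond_mode and the BOND_MODE_FILE override are hypothetical names; the override exists only to make testing easy. "cat /proc/net/bonding/bond0" shows the same information, plus the enslaved interfaces, in readable form.

```shell
#!/bin/sh
# Sketch: verify that bond0 ended up in broadcast mode (mode 3).
# The sysfs attribute contains the mode name and number, e.g. "broadcast 3".
# BOND_MODE_FILE is a hypothetical override, only there so the check can be tested.
BOND_MODE_FILE=${BOND_MODE_FILE:-/sys/class/net/bond0/bonding/mode}

check_bond_mode() {
  if [ ! -r "$BOND_MODE_FILE" ]; then
    echo "cannot read $BOND_MODE_FILE (is bond0 configured?)" >&2
    return 2
  fi
  # split the attribute into the mode name and its number
  read -r name number < "$BOND_MODE_FILE"
  if [ "$name" = "broadcast" ] && [ "$number" = "3" ]; then
    echo "bond0 is in broadcast mode"
  else
    echo "bond0 is in mode '$name $number', expected 'broadcast 3'" >&2
    return 1
  fi
}
```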

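Create a /etc/openvpn/tap-up.sh script (and don't forget to make it executable with chmod a+x /etc/openvpn/tap-up.sh). OpenVPN runs it after creating the tap device and passes the device name as the first argument: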

#!/bin/sh
# called as: cmd tap_dev tap_mtu link_mtu ifconfig_local_ip ifconfig_netmask [ init | restart ]
# enslave the freshly created tap device ($1) to the broadcast bond
ifenslave bond0 "$1"
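If you'd rather not depend on ifenslave, the same up script could use iproute2. This sketch assumes a kernel that supports enslaving with "ip link set ... master". The bonding driver may refuse to enslave an interface that is up, so the tap is taken down first as a precaution (OpenVPN normally leaves it down at this point anyway). run, enslave_tap and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch: /etc/openvpn/tap-up.sh using iproute2 instead of ifenslave.
# OpenVPN calls it as: cmd tap_dev tap_mtu link_mtu ifconfig_local_ip ifconfig_netmask [ init | restart ]
# run and DRY_RUN are hypothetical: with DRY_RUN set, commands are printed instead of run.
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# enslave_tap DEV (hypothetical): attach the tap device to bond0
enslave_tap() {
  if [ -z "$1" ]; then
    echo "tap-up.sh: no tap device given" >&2
    return 1
  fi
  # precaution: bonding may refuse to enslave an interface that is already up
  run ip link set dev "$1" down
  run ip link set dev "$1" master bond0
}

# only act when OpenVPN actually passed arguments
if [ "$#" -gt 0 ]; then
  enslave_tap "$1" || exit 1
fi
```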

Next, add bridge0a.conf and bridge0b.conf to /etc/openvpn/ on both machines, together with a shared static key (bridge.key). The a and b files are identical except for the port numbers (for example, use 3002 for b). Replace 11.22.33.44 with your server's public IP.

Client:

remote 11.22.33.44
dev tap
port 3001
rport 3001
secret bridge.key
comp-lzo
verb 4
nobind
persist-tun
persist-key
script-security 2
up /etc/openvpn/tap-up.sh

Server:

local 11.22.33.44
dev tap
port 3001
lport 3001
secret bridge.key
comp-lzo
verb 4
script-security 2
up /etc/openvpn/tap-up.sh
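With bridge0a.conf in place on both machines, the shared key and the b variant could be produced as sketched below. It assumes OpenVPN 2.x, where "openvpn --genkey --secret FILE" writes a static key (2.5 and later prefer "openvpn --genkey secret FILE" and warn about the old form). gen_key and make_b_conf are hypothetical helpers; make_b_conf only touches the port, lport and rport lines, so nothing else that happens to contain 3001 is changed.

```shell
#!/bin/sh
# Sketch: generate the shared static key and derive bridge0b.conf from bridge0a.conf.
# Assumes OpenVPN 2.x; gen_key and make_b_conf are hypothetical helper names.

# gen_key FILE: create a static key that only root can read
gen_key() {
  # umask 077 in a subshell so the key is never world-readable, even briefly
  (umask 077 && openvpn --genkey --secret "$1")
}

# make_b_conf SRC DST: copy SRC, moving port/lport/rport from 3001 to 3002
make_b_conf() {
  sed -E 's/^(port|lport|rport)[[:space:]]+3001$/\1 3002/' "$1" > "$2"
}
```

Run gen_key /etc/openvpn/bridge.key on one machine only and copy the key to /etc/openvpn/ on the other over a secure channel such as scp. Then run make_b_conf /etc/openvpn/bridge0a.conf /etc/openvpn/bridge0b.conf on each side.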

Don't forget to edit /etc/default/openvpn to make sure your new VPN configurations are started. Reboot your machines, or run rc.local and restart OpenVPN manually.
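On Debian and Ubuntu with the init-script based openvpn package, AUTOSTART in /etc/default/openvpn lists the configurations to start by name (the .conf file name without its extension), and "all" starts every file in /etc/openvpn. A minimal fragment for this setup, assuming that packaging convention:

```shell
# /etc/default/openvpn (sourced by the Debian init script)
# start only the two tunnels, named after bridge0a.conf and bridge0b.conf
AUTOSTART="bridge0a bridge0b"
```

To apply it without rebooting, something like sh /etc/rc.local followed by /etc/init.d/openvpn restart should do; on systemd-based releases the per-configuration units openvpn@bridge0a and openvpn@bridge0b play the same role.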

Now you're ready to test your setup:

# ping 10.10.0.1
PING 10.10.0.1 (10.10.0.1) 56(84) bytes of data.
64 bytes from 10.10.0.1: icmp_req=1 ttl=64 time=50.4 ms
64 bytes from 10.10.0.1: icmp_req=1 ttl=64 time=51.1 ms (DUP!)
64 bytes from 10.10.0.1: icmp_req=1 ttl=64 time=51.1 ms (DUP!)
64 bytes from 10.10.0.1: icmp_req=1 ttl=64 time=51.1 ms (DUP!)
64 bytes from 10.10.0.1: icmp_req=2 ttl=64 time=52.0 ms
64 bytes from 10.10.0.1: icmp_req=2 ttl=64 time=52.2 ms (DUP!)
64 bytes from 10.10.0.1: icmp_req=2 ttl=64 time=53.0 ms (DUP!)
64 bytes from 10.10.0.1: icmp_req=2 ttl=64 time=53.1 ms (DUP!)
--- 10.10.0.1 ping statistics ---
2 packets transmitted, 2 received, +6 duplicates, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 50.428/51.786/53.160/0.955 ms

If everything goes well and the line is good, you will see four replies for every ICMP request: the request is duplicated on the local side, the remote side answers both copies, and each of those two replies is duplicated again on the way back. This is not an issue for TCP connections, because the TCP stack discards duplicate segments (the extra copies still cost bandwidth, though).
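The reply count doubles as a quick health check for the two tunnels. With both up you should see four replies per request, and the count drops when one of them loses packets: with one tunnel completely dead, only one copy of the request and one copy of the single reply get through, so you get one reply. The awk sketch below turns ping's summary line into that ratio. It assumes the iputils summary format shown above, and replies_per_request is a hypothetical name.

```shell
#!/bin/sh
# Sketch: average number of replies per echo request, from ping's summary line.
# Expects iputils output such as:
#   2 packets transmitted, 2 received, +6 duplicates, 0% packet loss, time 1001ms
replies_per_request() {
  awk '/packets transmitted/ {
    sent = $1; received = $4; dups = 0
    # the duplicate count is the "+N" field right before the word "duplicates"
    for (i = 1; i < NF; i++) {
      if ($(i + 1) ~ /^duplicates/) dups = substr($i, 2) + 0
    }
    if (sent > 0) printf "%.2f\n", (received + dups) / sent
  }'
}
```

Usage: ping -c 20 -q 10.10.0.1 | replies_per_request should print 4.00 on a healthy line.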

UDP is a different story: it has no duplicate detection, so it's up to the application to handle duplicates. For example, a single DNS query yields four identical replies instead of one, so the response uses four times the normal bandwidth (the query itself is sent twice):

# tcpdump -i bond0 -n port 53
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes
13:30:39.870740 IP 10.10.0.2.59330 > 10.7.0.1.53: 59577+ A? serverfault.com. (33)
13:30:40.174281 IP 10.7.0.1.53 > 10.10.0.2.59330: 59577 1/0/0 A 64.34.119.12 (49)
13:30:40.174471 IP 10.7.0.1.53 > 10.10.0.2.59330: 59577 1/0/0 A 64.34.119.12 (49)
13:30:40.186664 IP 10.7.0.1.53 > 10.10.0.2.59330: 59577 1/0/0 A 64.34.119.12 (49)
13:30:40.187030 IP 10.7.0.1.53 > 10.10.0.2.59330: 59577 1/0/0 A 64.34.119.12 (49)
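To see how many copies each UDP reply arrives in, a capture like the one above can be summarised per DNS transaction. The awk sketch below assumes tcpdump's default one-line format for DNS as shown (source in field 3, destination in field 5, query ID in field 6) and treats any IPv4 packet from port 53 as a reply; count_dns_replies is a hypothetical name.

```shell
#!/bin/sh
# Sketch: count DNS replies per client address/port and query ID in "tcpdump -n" output.
# Assumes the default line format, e.g.
#   13:30:40.174281 IP 10.7.0.1.53 > 10.10.0.2.59330: 59577 1/0/0 A 64.34.119.12 (49)
count_dns_replies() {
  awk '$2 == "IP" && $3 ~ /\.53$/ {
    dst = $5
    sub(/:$/, "", dst)          # strip the colon tcpdump prints after the destination
    count[dst " id=" $6]++
  }
  END { for (k in count) print k, "replies=" count[k] }'
}
```

For a live check, tcpdump -n -i bond0 -c 20 port 53 | count_dns_replies stops after 20 packets and prints one line per query; four replies per query is what you'd expect with both tunnels up.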

Good luck!