Load balancing network traffic using iptables
I am trying to load balance traffic from an internal LAN on a Linux router that has two gateways. I initially tried the iproute2 implementation, which didn't balance the load as expected because routes are cached.
Now I am using iptables to mark every new connection with CONNMARK, and adding ip rules that route the marked connections over different gateways.
eth0 = LAN, eth1 = ISP1, eth2 = ISP2
This is the script I am using:
#!/bin/bash
echo 1 >| /proc/sys/net/ipv4/ip_forward
echo 0 >| /proc/sys/net/ipv4/conf/all/rp_filter
# flush all iptables entries
iptables -t filter -F
iptables -t filter -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -t filter -P INPUT ACCEPT
iptables -t filter -P OUTPUT ACCEPT
iptables -t filter -P FORWARD ACCEPT
# initialise chains that will do the work and log the packets
iptables -t mangle -N CONNMARK1
iptables -t mangle -A CONNMARK1 -j MARK --set-mark 1
iptables -t mangle -A CONNMARK1 -j CONNMARK --save-mark
iptables -t mangle -A CONNMARK1 -j LOG --log-prefix 'iptables-mark1: ' --log-level info
iptables -t mangle -N CONNMARK2
iptables -t mangle -A CONNMARK2 -j MARK --set-mark 2
iptables -t mangle -A CONNMARK2 -j CONNMARK --save-mark
iptables -t mangle -A CONNMARK2 -j LOG --log-prefix 'iptables-mark2: ' --log-level info
iptables -t mangle -N RESTOREMARK
iptables -t mangle -A RESTOREMARK -j CONNMARK --restore-mark
iptables -t mangle -A RESTOREMARK -j LOG --log-prefix 'restore-mark: ' --log-level info
iptables -t nat -N SNAT1
iptables -t nat -A SNAT1 -j LOG --log-prefix 'snat-to-192.168.254.74: ' --log-level info
iptables -t nat -A SNAT1 -j SNAT --to-source 192.168.254.74
iptables -t nat -N SNAT2
iptables -t nat -A SNAT2 -j LOG --log-prefix 'snat-to-192.168.253.132: ' --log-level info
iptables -t nat -A SNAT2 -j SNAT --to-source 192.168.253.132
# restore the fwmark on packets that belong to an existing connection
iptables -t mangle -A PREROUTING -i eth0 \
-m state --state ESTABLISHED,RELATED -j RESTOREMARK
# new connections carry no mark yet: alternate them between the two marking chains
iptables -t mangle -A PREROUTING -m state --state NEW \
-m statistic --mode nth --every 2 --packet 0 -j CONNMARK1
iptables -t mangle -A PREROUTING -m state --state NEW \
-m statistic --mode nth --every 2 --packet 1 -j CONNMARK2
iptables -t nat -A POSTROUTING -o eth1 -j SNAT1
iptables -t nat -A POSTROUTING -o eth2 -j SNAT2
# register the two routing tables once (grep reads the file directly, no cat needed)
if ! grep -q '^51' /etc/iproute2/rt_tables
then
echo '51 rt_link1' >> /etc/iproute2/rt_tables
fi
if ! grep -q '^52' /etc/iproute2/rt_tables
then
echo '52 rt_link2' >> /etc/iproute2/rt_tables
fi
ip route flush table rt_link1 2>/dev/null
ip route add 192.168.254.0/24 dev eth1 src 192.168.254.74 table rt_link1
ip route add default via 192.168.254.5 table rt_link1
ip route flush table rt_link2 2>/dev/null
ip route add 192.168.253.0/24 dev eth2 src 192.168.253.132 table rt_link2
ip route add default via 192.168.253.5 table rt_link2
ip rule del from all fwmark 0x1 lookup rt_link1 2>/dev/null
ip rule del from all fwmark 0x2 lookup rt_link2 2>/dev/null
ip rule del from all fwmark 0x2 2>/dev/null
ip rule del from all fwmark 0x1 2>/dev/null
ip rule add fwmark 1 table rt_link1
ip rule add fwmark 2 table rt_link2
ip route flush cache
With this setup, connections do get routed over both links. However, some of them are dropped, i.e. they never get through, and in some cases an established connection is disrupted midway.
Am I missing something?
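To see how new connections are actually being split, one option is to count the log prefixes that the CONNMARK1 and CONNMARK2 chains emit. The following is only a rough sketch: `count_marks` is a hypothetical helper name, and where the kernel log ends up (`dmesg`, `journalctl -k`, a syslog file) depends on the system.

```shell
#!/bin/bash
# Hypothetical helper: reads kernel log text on stdin and counts how many
# lines carry each of the two log prefixes set by the script above.
count_marks() {
    awk '
        /iptables-mark1: / { a++ }
        /iptables-mark2: / { b++ }
        END { printf "mark1=%d mark2=%d\n", a, b }
    '
}

# Example usage (assumes the LOG target output is visible in the kernel log):
#   dmesg | count_marks
```

A heavily lopsided count would suggest the marking rules are not alternating as intended; roughly equal counts point the investigation elsewhere, for example at routing or NAT.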
Solution 1:
Here's another approach. Instead of marking connections based upon packet count and hoping they don't get reinitialized, duplicated, or otherwise altered, just divide the packets up by source or destination IP. For any sufficiently large set of connections, you should have about a 50-50 spread.
I'm posting the following as a drop-in replacement, but you can probably do away with the CONNMARK logic altogether with a bit more tinkering.
iptables -t mangle -A PREROUTING -m state --state NEW \
-d 0.0.0.0/0.0.0.1 -j CONNMARK1
iptables -t mangle -A PREROUTING -m state --state NEW \
-d 0.0.0.1/0.0.0.1 -j CONNMARK2
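To make the mask trick concrete: `0.0.0.1` as a netmask keeps only the lowest bit of the address, so the two rules split destinations into even and odd last octets. A small sketch that predicts which chain a given destination would hit (`chain_for_dest` is a hypothetical helper for illustration, not part of iptables):

```shell
#!/bin/bash
# Hypothetical helper: print the chain that the two rules above would
# select for a dotted-quad IPv4 destination address.
chain_for_dest() {
    local ip=$1
    local last=${ip##*.}            # last octet of the address
    if (( (10#$last & 1) == 0 )); then
        echo CONNMARK1              # lowest bit 0: matches 0.0.0.0/0.0.0.1
    else
        echo CONNMARK2              # lowest bit 1: matches 0.0.0.1/0.0.0.1
    fi
}

# Example usage:
#   chain_for_dest 192.168.253.5
```

Because the chain depends only on the destination address, every connection to the same host leaves through the same gateway, which is what keeps established connections from switching links.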
You could also match on source instead of destination if there's more variance in source IPs, or even combine the two: odd/odd or even/even go to CONNMARK1, and odd/even or even/odd go to CONNMARK2.
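The combined source/destination variant might look roughly like the sketch below. It only prints the four rules so they can be reviewed before being applied; `emit_parity_rules` is a hypothetical name, and the rules assume the CONNMARK1 and CONNMARK2 chains from the question already exist.

```shell
#!/bin/bash
# Hypothetical generator: print (do not execute) one rule per
# source/destination parity combination.
emit_parity_rules() {
    local even=0.0.0.0/0.0.0.1      # lowest address bit is 0
    local odd=0.0.0.1/0.0.0.1       # lowest address bit is 1
    local s d chain
    for s in "$even" "$odd"; do
        for d in "$even" "$odd"; do
            # same parity -> CONNMARK1, mixed parity -> CONNMARK2
            if [ "$s" = "$d" ]; then chain=CONNMARK1; else chain=CONNMARK2; fi
            echo "iptables -t mangle -A PREROUTING -m state --state NEW -s $s -d $d -j $chain"
        done
    done
}

emit_parity_rules
```

After checking the output, it could be applied with something like `emit_parity_rules | sh` as root.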