HAProxy graceful reload with zero packet loss

I'm running HAProxy to balance load across multiple Apache servers, and I need to be able to reload it at any given time in order to change the load-balancing algorithm.

This all works fine, except that the reload has to happen without losing a single packet. At the moment a reload gives me 99.76% success on average, under a test load of 1,000 requests per second for 5 seconds. I have done many hours of research on this and have found the following command for "gracefully reloading" the HAProxy server:

haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

However, this has little or no effect over the plain old service haproxy reload; it still drops 0.24% of requests on average. The -sf flag lets the old process finish serving its existing connections, but connections that arrive during the handover of the listening socket can still be lost.
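For reference, a load test of this shape (5,000 requests in total) can be reproduced with a tool such as ApacheBench; the host name below is a placeholder for the load balancer's address:

ab -n 5000 -c 100 http://loadbalancer.example.com/

ab reports the number of failed requests, which gives the success percentage directly.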

Is there any way of reloading the HAProxy config file without a single dropped packet from any user?


Solution 1:

According to https://github.com/aws/opsworks-cookbooks/pull/40 and the mailing-list thread it references, http://www.mail-archive.com/[email protected]/msg06885.html, you can:

# Silently drop new connection attempts on HAProxy's port
iptables -I INPUT -p tcp --dport $PORT --syn -j DROP
sleep 1
service haproxy restart
# Remove the rule; clients retransmit the dropped SYNs
iptables -D INPUT -p tcp --dport $PORT --syn -j DROP

This has the effect of silently dropping SYN packets while HAProxy restarts. Because TCP retransmits an unanswered SYN automatically (after roughly one second on Linux), clients simply retry until the new process is listening, and no connection attempt is refused outright.
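If you script this, it is worth ensuring that the DROP rule is removed even when the restart fails, otherwise the port stays blocked. A minimal sketch, assuming HAProxy listens on port 80 and is managed through service:

#!/bin/bash
set -e
PORT=80   # assumption: adjust to the port HAProxy listens on

# Block new connection attempts; established connections are unaffected
iptables -I INPUT -p tcp --dport "$PORT" --syn -j DROP

# Guarantee the rule is removed on exit, even if the restart fails
trap 'iptables -D INPUT -p tcp --dport "$PORT" --syn -j DROP' EXIT

# Let in-flight SYNs drain, then restart
sleep 1
service haproxy restart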

Solution 2:

Yelp shared a more sophisticated approach based on meticulous testing. The blog article is a deep dive, and well worth the time investment to fully appreciate it.

True Zero Downtime HAProxy Reloads

tl;dr: use Linux tc (traffic control) and iptables to temporarily queue SYN packets while HAProxy is reloading and has two PIDs attached to the same port (SO_REUSEPORT).

I'm not comfortable re-publishing the entire article on ServerFault; nevertheless, here are a few excerpts to pique your interest:

By delaying SYN packets coming into our HAProxy load balancers that run on each machine, we are able to minimally impact traffic during HAProxy reloads, which allows us to add, remove, and change service backends within our SOA without fear of significantly impacting user traffic.

# plug_manipulation.sh
nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --buffer
service haproxy reload
nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite

# setup_iptables.sh
iptables -t mangle -I OUTPUT -p tcp -s 169.254.255.254 --syn -j MARK --set-mark 1

# setup_qdisc.sh
## Set up the queuing discipline
tc qdisc add dev lo root handle 1: prio bands 4
tc qdisc add dev lo parent 1:1 handle 10: pfifo limit 1000
tc qdisc add dev lo parent 1:2 handle 20: pfifo limit 1000
tc qdisc add dev lo parent 1:3 handle 30: pfifo limit 1000

## Create a plug qdisc with 1 meg of buffer
nl-qdisc-add --dev=lo --parent=1:4 --id=40: plug --limit 1048576
## Release the plug
nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite

## Set up the filter: any packet marked with "1" will be
## directed to the plug
tc filter add dev lo protocol ip parent 1:0 prio 1 handle 1 fw classid 1:4
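For orientation, the pieces fit together like this (the one-time versus per-reload split is my reading of the article, not a quoted excerpt): the iptables mark and the qdisc are installed once, and only the plug/release sequence runs on each reload.

# One-time setup, e.g. at boot:
./setup_iptables.sh     # mark SYN packets with "1" so tc can classify them
./setup_qdisc.sh        # build the prio bands and the (initially released) plug

# On every HAProxy reload:
./plug_manipulation.sh  # buffer marked SYNs, reload, then release them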

Gist: https://gist.github.com/jolynch/97e3505a1e92e35de2c0

Cheers to Yelp for sharing such amazing insights.