HAProxy timeouts after 1-2 minutes of load
I am switching our web load balancer from Nginx to HAProxy. Our current setup works fine with Nginx, but we want more redundancy through backend health checking. Our backend is a Golang app.
To keep things simple, we are using just one HAProxy server with a fairly simple config, but after about 1-2 minutes of load the clients start reporting timeouts when posting to the app. We have a fairly low timeout (100 ms), but it is all local traffic and the app typically responds within 2-3 ms.
However, we are doing about 2k posts per second across our platform, so I have made some small Linux tweaks to keep the number of lingering TCP connections down. Maybe I am missing something here.
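At 2k posts per second, a rough back-of-envelope check of ephemeral-port pressure may help. This sketch assumes every request opens a fresh TCP connection from HAProxy to a backend, that HAProxy's side closes first and therefore holds each socket in TIME_WAIT for Linux's 60 seconds, and that the port range is the common kernel default of 32768-60999; none of these are confirmed for this setup, and tcp_tw_reuse can let the kernel reuse some of those ports.

```python
# Back-of-envelope estimate of ephemeral-port pressure from connection churn.
# Assumptions (not measured on the setup above):
#   - every request opens a fresh TCP connection from HAProxy to a backend
#   - HAProxy's side closes first, so it holds each socket in TIME_WAIT
#   - tcp_tw_reuse is ignored, although it can let the kernel reuse some ports

TIME_WAIT_SECONDS = 60  # defined by TCP_TIMEWAIT_LEN in the Linux kernel


def port_range_size(low: int, high: int) -> int:
    """Number of ports in an inclusive ip_local_port_range."""
    return high - low + 1


def time_wait_sockets(new_conns_per_sec: float, time_wait_s: int = TIME_WAIT_SECONDS) -> float:
    """Steady-state count of sockets in TIME_WAIT for a given connection rate."""
    return new_conns_per_sec * time_wait_s


def range_usage_per_backend(total_rps: float, backends: int, low: int, high: int) -> float:
    """Fraction of the ephemeral range tied up per backend.

    Each backend is a distinct (dst IP, dst port), so the local port range
    is consumed separately for each one; a value above 1.0 means more
    sockets would be waiting than there are ports.
    """
    per_backend = time_wait_sockets(total_rps / backends)
    return per_backend / port_range_size(low, high)


if __name__ == "__main__":
    rps = 2000                 # "about 2k posts per second"
    backends = 3               # gopp1, gopp2, gopp3
    low, high = 32768, 60999   # assumed kernel default; see /proc/sys/net/ipv4/ip_local_port_range
    print(f"TIME_WAIT sockets overall: {time_wait_sockets(rps):,.0f}")
    print(f"ephemeral range used per backend: {range_usage_per_backend(rps, backends, low, high):.0%}")
```

With these inputs each backend would need about 40,000 ports against a range of 28,232, so the estimate comes out above 100%. The same arithmetic applies to the nginx -> HAProxy hop, which has a single destination. Treat it only as a prompt to compare against real counts from "ss -tan state time-wait" on the hosts.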
Here is a simple flow of these requests. All of these live within our datacenter.
requester -> local nginx server (router) -> haproxy -> app servers
/etc/sysctl.conf
# Shorten the FIN_WAIT_2 timeout (this does not shorten TIME_WAIT, which lasts 60 s on Linux)
net.ipv4.tcp_fin_timeout = 30
# Recycle and Reuse TIME_WAIT sockets faster
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
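To confirm settings like these actually took effect (for example after "sysctl -p"), a small sketch can parse sysctl.conf-style text and compare each key with the live value under /proc/sys. The helper names are hypothetical. net.ipv4.tcp_tw_recycle was removed in Linux 4.12, so on newer kernels its /proc entry is missing and the sketch reports it instead of failing.

```python
from __future__ import annotations

from pathlib import Path


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse sysctl.conf-style 'key = value' lines, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        # Normalise whitespace so "1025   65534" matches the kernel's "1025\t65534".
        settings[key.strip()] = " ".join(value.split())
    return settings


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key such as net.ipv4.tcp_fin_timeout to its /proc/sys file."""
    return Path(root, *key.split("."))


def compare_with_live(settings: dict[str, str], root: str = "/proc/sys") -> dict[str, tuple[str, str | None]]:
    """Return {key: (wanted, live)} for every key whose live value differs or is unreadable."""
    mismatches = {}
    for key, wanted in settings.items():
        try:
            live = " ".join(sysctl_path(key, root).read_text().split())
        except OSError:
            live = None  # missing or unreadable entry, e.g. tcp_tw_recycle on kernels >= 4.12
        if live != wanted:
            mismatches[key] = (wanted, live)
    return mismatches


if __name__ == "__main__":
    # The settings from the question; use Path("/etc/sysctl.conf").read_text() on a real host.
    question_settings = """
    net.ipv4.tcp_fin_timeout = 30
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1
    """
    for key, (wanted, live) in compare_with_live(parse_sysctl_conf(question_settings)).items():
        print(f"{key}: wanted {wanted!r}, live {live!r}")
```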
Here is our haproxy config. Nothing crazy here.
global
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
defaults
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
frontend http-in
    bind *:80
    default_backend data_api
backend data_api
    option httpchk GET /status/version
    server gopp1 10.10.85.3:8000 check
    server gopp2 10.10.85.4:8000 check
    server gopp3 10.10.85.5:8000 check
    stats enable
    stats hide-version
    stats scope .
    stats uri /admin?stats
    stats realm Haproxy\ Statistics
    stats auth admin:secret
    stats admin if TRUE
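In HAProxy a time value without a unit is read as milliseconds, so "timeout connect 5000" above means 5 s and "timeout client/server 50000" mean 50 s, all far longer than the 100 ms the clients wait. A small sketch of HAProxy's unit rules (us, ms, s, m, h, d; bare numbers in ms) makes such values easy to compare across configs. The helper names are hypothetical, and the parser only handles plain "timeout <name> <value>" lines.

```python
from __future__ import annotations

import re

# HAProxy time units expressed in milliseconds; a bare number means milliseconds.
_UNIT_TO_MS = {
    "us": 0.001,
    "ms": 1,
    "s": 1000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
}
_TIME_RE = re.compile(r"^(\d+)(us|ms|s|m|h|d)?$")


def haproxy_time_to_ms(value: str) -> float:
    """Convert an HAProxy time value such as '5000', '10s' or '1m' to milliseconds."""
    match = _TIME_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not an HAProxy time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_MS[unit or "ms"]


def timeouts_from_config(text: str) -> dict[str, float]:
    """Collect 'timeout <name> <value>' lines from a config, converted to milliseconds."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "timeout":
            result[parts[1]] = haproxy_time_to_ms(parts[2])
    return result


if __name__ == "__main__":
    # The "defaults" timeouts from the question's config.
    defaults = """
        timeout connect 5000
        timeout client 50000
        timeout server 50000
    """
    client_deadline_ms = 100  # the clients' own timeout, per the question
    for name, ms in timeouts_from_config(defaults).items():
        print(f"timeout {name}: {ms / 1000:g} s, {ms / client_deadline_ms:g}x the client deadline")
```

Lines such as "stats timeout 30s" are skipped because they do not start with "timeout".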
What is your maxconn value in HAProxy? Also, check maxsock in the HAProxy stats interface (typically the value of ulimit -n) to make sure you are not running out of file descriptors.
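The same figures can be read without the web UI through the stats socket already declared in the question's global section (/run/haproxy/admin.sock): its "show info" command returns "Key: value" lines that include Maxconn, Maxsock, Ulimit-n and CurrConns. A minimal Python sketch, assuming that socket path and permission to read it; the helper names are hypothetical.

```python
from __future__ import annotations

import socket

STATS_SOCKET = "/run/haproxy/admin.sock"  # from the "stats socket" line in the question's config


def query_stats_socket(path: str, command: str = "show info") -> str:
    """Send one command to an HAProxy stats socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(2.0)
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # in non-interactive mode HAProxy closes after one reply
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()


def parse_show_info(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def select_limits(info: dict[str, str]) -> dict[str, int]:
    """Pick out the connection and descriptor figures worth comparing."""
    keys = ("Maxconn", "Maxsock", "Ulimit-n", "CurrConns")
    return {k: int(info[k]) for k in keys if k in info}


if __name__ == "__main__":
    try:
        limits = select_limits(parse_show_info(query_stats_socket(STATS_SOCKET)))
    except OSError as exc:
        print(f"could not query {STATS_SOCKET}: {exc}")
    else:
        print(limits)
```

If CurrConns sits close to Maxconn under load, new connections queue inside HAProxy, which a 100 ms client deadline would see as timeouts.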
We've load-tested to 400,000 requests per minute (RPM) without SSL offload. Here is our HAProxy config; there is nothing special performance-related in the frontends/backends.
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 25000
    user haproxy
    group haproxy
    daemon
    spread-checks 4
    tune.maxrewrite 1024
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 25000
Here is what we use for kernel parameters.
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range = 1025 65534
net.ipv4.tcp_mem = 786432 1697152 1945728
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 3
net.core.somaxconn = 10000
fs.file-max = 65536
fs.nr_open = 65536
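When reading these kernel parameters, note that tcp_rmem and tcp_wmem are in bytes (min, default, max per socket), while the three tcp_mem thresholds (min, pressure, max) are counted in memory pages for all TCP sockets combined. A quick conversion, assuming 4 KiB pages unless os.sysconf reports otherwise; the helper names are hypothetical.

```python
from __future__ import annotations

import os

TCP_MEM_PAGES = (786432, 1697152, 1945728)  # net.ipv4.tcp_mem from the answer above


def pages_to_gib(pages: int, page_size: int) -> float:
    """Convert a page count to GiB for a given page size in bytes."""
    return pages * page_size / 2**30


def describe_tcp_mem(pages: tuple[int, int, int], page_size: int) -> dict[str, float]:
    """Label the three tcp_mem thresholds and express each in GiB (2 decimals)."""
    labels = ("min", "pressure", "max")
    return {label: round(pages_to_gib(p, page_size), 2) for label, p in zip(labels, pages)}


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")  # typically 4096 on x86-64
    print(describe_tcp_mem(TCP_MEM_PAGES, page_size))
```

At 4 KiB per page this works out to about 3, 6.47 and 7.42 GiB; the tcp_rmem/tcp_wmem maximum of 16777216 bytes is 16 MiB per socket.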
Also set /etc/security/limits.conf to allow 65536 file descriptors.
haproxy soft nofile 63536
haproxy hard nofile 63536
root soft nofile 63536
root hard nofile 63536
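As a sanity check on how these limits relate to maxconn 25000: each proxied connection needs roughly two descriptors (one to the client, one to the server), plus some for listeners, health checks and logging. The sketch uses a hypothetical allowance of 100 extra descriptors for that overhead; HAProxy derives its own maxsock figure at startup, which is what the stats page reports.

```python
from __future__ import annotations

import resource

MAXCONN = 25000        # global maxconn from the config above
NOFILE_LIMIT = 63536   # haproxy soft/hard nofile from limits.conf
OVERHEAD_FDS = 100     # hypothetical allowance for listeners, checks and logs


def fds_needed(maxconn: int, overhead: int = OVERHEAD_FDS) -> int:
    """Approximate descriptors for maxconn fully proxied connections."""
    return 2 * maxconn + overhead


def headroom(maxconn: int, nofile: int, overhead: int = OVERHEAD_FDS) -> int:
    """Spare descriptors left under the nofile limit (negative means too few)."""
    return nofile - fds_needed(maxconn, overhead)


if __name__ == "__main__":
    print(f"needed ~{fds_needed(MAXCONN):,}, limit {NOFILE_LIMIT:,}, "
          f"headroom {headroom(MAXCONN, NOFILE_LIMIT):,}")
    # These are the limits of this Python process, not of HAProxy itself.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"this process's nofile limits: soft={soft}, hard={hard}")
```

Because getrlimit only reports the calling process's limits, HAProxy's effective limit is better read from Ulimit-n on the stats page.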