HAProxy timeouts after 1-2 minutes of load

I am switching our web load balancer from Nginx to HAProxy. Our current Nginx setup works fine, but we want more redundancy through backend health checks. Our backend is a Golang app.

To keep things simple, we are using a single HAProxy server with a fairly simple config. After about 1-2 minutes under load, the clients start reporting timeouts when posting to the app. Our client timeout is fairly low (100 ms), but all of this is local traffic and the app typically responds within 2-3 ms.

However, we are handling about 2k POSTs per second across our platform. I have made a few small Linux tweaks to reduce the number of lingering TCP connections (TIME_WAIT sockets). Maybe I am missing something here.

Here is a simple flow of these requests. All of these live within our datacenter.

requester -> local nginx server (router) -> haproxy -> app servers

/etc/sysctl.conf
# Shorten the FIN-WAIT-2 timeout for orphaned sockets (TIME_WAIT itself stays fixed at 60 s)
net.ipv4.tcp_fin_timeout = 30

# Recycle and Reuse TIME_WAIT sockets faster
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

Here is our haproxy config. Nothing crazy here.

global
  ca-base  /etc/ssl/certs
  crt-base  /etc/ssl/private
  log  127.0.0.1   local0
  log  127.0.0.1   local1 notice
  ssl-default-bind-ciphers  ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
  ssl-default-bind-options  no-sslv3
  stats  socket /run/haproxy/admin.sock mode 660 level admin
  stats  timeout 30s

defaults
  errorfile  400 /etc/haproxy/errors/400.http
  errorfile  403 /etc/haproxy/errors/403.http
  errorfile  408 /etc/haproxy/errors/408.http
  errorfile  500 /etc/haproxy/errors/500.http
  errorfile  502 /etc/haproxy/errors/502.http
  errorfile  503 /etc/haproxy/errors/503.http
  errorfile  504 /etc/haproxy/errors/504.http
  mode  http
  option  httplog
  option  dontlognull
  timeout  connect 5000
  timeout  client 50000
  timeout  server  50000

frontend http-in
  bind *:80
  default_backend data_api

backend data_api
  option httpchk GET /status/version
  server gopp1 10.10.85.3:8000 check
  server gopp2 10.10.85.4:8000 check
  server gopp3 10.10.85.5:8000 check
  stats enable
  stats hide-version
  stats scope .
  stats uri /admin?stats
  stats realm   Haproxy\ Statistics
  stats auth    admin:secret
  stats admin   if TRUE

What is your maxconn value in HAProxy? Your config does not set one, so the built-in default applies. Also check maxsock on the HAProxy stats page (it typically matches ulimit -n) to make sure you are not running out of file descriptors.
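As a rough rule of thumb, every proxied connection uses two descriptors: one to the client and one to the server. Add a handful more for listeners, health checks and log sockets, so the descriptor limit needs to sit comfortably above twice `maxconn`. Below is a quick Go sketch of that arithmetic. The `extra` allowance of 100 is an assumption, and this approximates HAProxy's maxsock rather than reproducing its exact formula. The figures used (maxconn 25000, nofile 63536, and 1024 as a common default soft limit) are only examples.

```go
package main

import "fmt"

// fdBudget is a rough estimate of descriptors HAProxy needs: two per proxied
// connection (client side and server side) plus an allowance for listeners,
// health checks and log sockets. It approximates, not reproduces, maxsock.
func fdBudget(maxconn, extra uint64) uint64 {
	return 2*maxconn + extra
}

// fits reports whether that budget stays within a nofile limit (ulimit -n).
func fits(maxconn, extra, nofile uint64) bool {
	return fdBudget(maxconn, extra) <= nofile
}

func main() {
	const extra = 100 // assumed allowance for listeners, checks, logging
	fmt.Println(fdBudget(25000, extra))    // 50100
	fmt.Println(fits(25000, extra, 63536)) // true
	fmt.Println(fits(25000, extra, 1024))  // false
}
```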

We've load tested to 400,000 requests per minute (note: without SSL offload). Here is our HAProxy config; there is nothing special performance-related in the frontends/backends.

global
    log       127.0.0.1 local2
    chroot    /var/lib/haproxy
    pidfile   /var/run/haproxy.pid
    maxconn   25000
    user      haproxy
    group     haproxy
    daemon

    spread-checks 4
    tune.maxrewrite 1024

defaults
  mode                    http
  log                     global
  option                  httplog
  option                  dontlognull
  option                  http-server-close
  option                  redispatch
  retries                 3
  timeout http-request    10s
  timeout queue           1m
  timeout connect         10s
  timeout client          1m
  timeout server          1m
  timeout http-keep-alive 10s
  timeout check           10s
  maxconn                 25000

Here is what we use for kernel parameters.

net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range = 1025 65534
net.ipv4.tcp_mem = 786432 1697152 1945728
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 3
net.core.somaxconn = 10000
fs.file-max = 65536
fs.nr_open = 65536
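The wide `ip_local_port_range` matters when HAProxy opens a new server-side connection for each request. Below is a back-of-the-envelope Go sketch built on a few assumptions. Every request opens a fresh connection to the same backend IP:port. The HAProxy host closes first, so each local port sits in TIME_WAIT for Linux's fixed 60 s. `tcp_tw_reuse` and keep-alive are ignored, although both raise the real ceiling. For comparison, about 2k POSTs per second spread over three backends is roughly 670 new connections per second per backend if none are reused.

```go
package main

import "fmt"

// timeWaitSecs is how long Linux keeps a closed socket in TIME_WAIT
// (TCP_TIMEWAIT_LEN, a 60 s compile-time constant, not a sysctl).
const timeWaitSecs = 60

// portsInRange counts the ports allowed by net.ipv4.ip_local_port_range.
func portsInRange(low, high int) int { return high - low + 1 }

// maxNewConnsPerSec estimates the sustained rate of brand-new connections to a
// single destination IP:port before local ports run out, assuming no reuse.
func maxNewConnsPerSec(low, high int) int {
	return portsInRange(low, high) / timeWaitSecs
}

func main() {
	fmt.Println("range in this answer:", maxNewConnsPerSec(1025, 65534))   // 1075
	fmt.Println("common kernel default:", maxNewConnsPerSec(32768, 60999)) // 470
}
```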

Also set /etc/security/limits.conf to allow 63536 file descriptors:

haproxy           soft    nofile          63536
haproxy           hard    nofile          63536
root              soft    nofile          63536
root              hard    nofile          63536