Exhausting Linux machine TCP socket limit (~70k)?

I am the founder of torservers.net, a non-profit that runs Tor exit nodes. We have a number of machines on Gbit connectivity with multiple IPs, and we seem to be hitting a limit of open TCP sockets across all of those machines. We're hovering around ~70k TCP connections in total (~10-15k per IP), and Tor is logging "Error binding network socket: Address already in use" like crazy. Is there any solution for this? Does BSD suffer from the same problem?

We run four Tor processes, each listening on a different IP. Example:

# NETSTAT=`netstat -nta`
# echo "$NETSTAT" | wc -l
67741
# echo "$NETSTAT" | grep ip1 | wc -l
19886
# echo "$NETSTAT" | grep ip2 | wc -l
15014
# echo "$NETSTAT" | grep ip3 | wc -l
18686
# echo "$NETSTAT" | grep ip4 | wc -l
14109
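
For reference, the per-IP and per-state breakdowns can also be pulled out of the same dump with awk, assuming the standard netstat -nta column layout (local address in column 4, state in column 6):

# echo "$NETSTAT" | awk 'NR>2 {sub(/:[0-9]+$/, "", $4); print $4}' | sort | uniq -c | sort -rn
# echo "$NETSTAT" | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn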

I have applied the tweaks I could find on the internet:

# cat /etc/sysctl.conf
net.ipv4.ip_forward = 0
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.conf.default.forwarding = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.send_redirects = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.all.send_redirects = 0
kernel.sysrq = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.core.netdev_max_backlog = 262144
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_orphan_retries = 2
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_fin_timeout = 4
vm.min_free_kbytes = 65536
net.ipv4.netfilter.ip_conntrack_max = 196608
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
net.netfilter.nf_conntrack_checksum = 0
net.netfilter.nf_conntrack_max = 196608
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 15
net.nf_conntrack_max = 196608
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.ip_local_port_range = 1025 65535
net.core.somaxconn = 262144
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_timestamps = 0
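
These values only take effect after the file is reloaded (or after a reboot), e.g.:

# sysctl -p /etc/sysctl.conf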

# sysctl fs.file-max
fs.file-max = 806854

# ulimit -n
500000

# cat /etc/security/limits.conf
*       soft    nofile 500000
*       hard    nofile 500000
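
Note that ulimit -n in a root shell is not necessarily what the already-running Tor processes were started with; assuming the processes are named tor, the effective limit can be checked per process via /proc:

# for pid in $(pidof tor); do grep 'Max open files' /proc/$pid/limits; done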

Solution 1:

If there are processes binding to INADDR_ANY, then some systems will pick ports only from the range 49152 to 65535. That could account for your ~15k per-IP limit, as that range contains exactly 16384 ports.

Wikipedia: Ephemeral port

You may be able to expand that range by finding the instructions for your OS here:

The Ephemeral Port Range
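
On Linux the range can be inspected and widened with sysctl (the sysctl.conf in the question already sets it to 1025-65535; shown here only as the general recipe):

# sysctl net.ipv4.ip_local_port_range
# sysctl -w net.ipv4.ip_local_port_range="1025 65535"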

Solution 2:

This is a limitation of the TCP protocol: the port number is a 16-bit unsigned integer (0-65535), so a single IP address only has about 64k ports to hand out. The solution is to use more IP addresses.

If the software cannot be changed to bind to specific addresses, you can use virtualization: create VMs that are bridged (not NATed) and give each one its own public IP, so the traffic will not be NATed later.

Check with netstat that the listeners bind to the IP of a specific interface and not to all addresses (0.0.0.0):

sudo netstat -tulnp | grep '0\.0\.0\.0'
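
In Tor's case the listen and outbound addresses are configurable, so virtualization should not be needed; a minimal per-instance torrc sketch, where /var/lib/tor/instance1 and 192.0.2.1 are placeholders for your own paths and public IPs (OutboundBindAddress is what makes the outgoing exit connections originate from that IP):

DataDirectory /var/lib/tor/instance1
ORPort 192.0.2.1:9001
OutboundBindAddress 192.0.2.1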