Linux server not allowing more than 2048 concurrent connections
I ran a load test against an MQTT broker from my macOS machine and was able to open more than 12k connections before my bandwidth was exhausted.
I ran the same test from a GCP machine, and it started throwing connection timed out exceptions once the number of open ports reached 2048, i.e. 2048 connections to the MQTT broker.
When connecting, my ConnectionTimeout is 100 s (waiting for CONNACK) and KeepAlive is 300 s (once the connection is established).
The issue occurs regardless of the load-test software (mzbench, JMeter, and emqtt-bench), so I think it is related to the Linux server itself.
I am not trying to reach 1 million open connections, but I need at least 30k open connections.
I have already tried changing ulimit; these are my ulimit configurations:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63887
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 102400
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 200000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
cat on /proc also shows the max open files as 102400.
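For anyone checking the same thing, a minimal sketch of the commands involved (the limits.conf entries are an assumption about how the nofile limit would be persisted via pam_limits, not a confirmed part of my setup):

# Per-process open-file limit for the current shell
ulimit -n

# System-wide limit (fs.file-max)
cat /proc/sys/fs/file-max

# Effective limits of a running process, here the current shell
cat /proc/$$/limits

# To persist the nofile limit, /etc/security/limits.conf would contain
# something like this (assumption, requires pam_limits):
#   *    soft    nofile    102400
#   *    hard    nofile    102400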
These are the values set in my sysctl:
fs.file-max = 200000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_mem = 50576 64768 98152
net.core.netdev_max_backlog = 2500
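A minimal sketch of how such values are applied and persisted (the ip_local_port_range line is an assumption on my part — it is not among the values above, but with 30k+ outbound connections to a single broker IP the ephemeral port range is another limit worth checking):

# Apply a single setting at runtime
sudo sysctl -w net.core.netdev_max_backlog=2500

# Persist settings in /etc/sysctl.conf (or a file under /etc/sysctl.d/), e.g.:
#   net.ipv4.ip_local_port_range = 1024 65535   # assumption: widen the ephemeral port range
# then reload:
sudo sysctl -p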
Edit: Added machine details and test pattern
Machine type: n2-highcpu-16 (16 vCPUs, 16 GB memory)
Result of lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) CPU
Stepping: 7
CPU MHz: 2800.200
BogoMIPS: 5600.40
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
Test pattern: opening 200 connections per second at a constant rate and waiting for a CONNACK on each.
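For reproducibility, this is roughly the emqtt-bench invocation that matches the pattern (a sketch: the host is a placeholder, and the flags are my reading of the emqtt_bench conn options — -i is the connect interval in milliseconds, -k the keepalive in seconds):

# ~200 new connections/second = one connect every 5 ms, targeting 30k connections
./emqtt_bench conn -h 10.0.0.2 -p 1883 -c 30000 -i 5 -k 300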
I was able to solve this problem, thanks to the comments above.
I was hitting the public load balancer IP from my test VM, but GCP limits a VM to a maximum of 2048 connections to a public IP. Once I switched to the private IP, I was able to reach close to 65k connections.
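To verify the count while the test runs, a quick check on the client VM (assuming the broker listens on 1883; tail drops the ss header line):

# Count established TCP connections to the broker port
ss -tan state established '( dport = :1883 )' | tail -n +2 | wc -l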