"Too many open files" Apache error even with increased ulimit and sysctl
I'm using Locust to load test an Amazon Linux EC2 instance running Apache (event MPM) and PHP-FPM. With 200 users (~28 requests per second), everything is fine. When I boost it to 300 users (~43 requests per second), I start seeing these errors in the Locust logs:
ConnectionError(MaxRetryError("HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x....>: Failed to establish a new connection: [Errno 24] Too many open files'))"))
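For context, the runs were kicked off from the command line along these lines; the locustfile name, spawn rate, and flag spellings are assumptions (newer Locust releases use --headless/--users, older ones used --no-web/-c):
# Hypothetical invocation; adjust the locustfile, spawn rate, and host to your setup
locust -f locustfile.py --headless --users 300 --spawn-rate 10 --host http://xxx.xxx.xxx.xxx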
Researching online, I decided to bump up the number of available open file descriptors to see if that would get around the issue. I edited /etc/security/limits.conf and set the following values (possibly exaggerated, but I'm just trying to see if something sticks; a quick sanity check follows the list):
* soft nofile 65000
* hard nofile 65000
* soft nproc 10240
* hard nproc 10240
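Since limits.conf is applied by PAM at login, it's worth verifying from a fresh SSH session that the new values actually show up (the numbers are just the ones set above):
ulimit -Sn   # soft nofile, expected 65000
ulimit -Hn   # hard nofile, expected 65000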
Afterwards, I restarted both Apache and PHP-FPM:
sudo service httpd restart
sudo service php-fpm restart
I also looked at the processes to verify the new limits and make sure they were sticking. One of Apache's child processes:
$ cat /proc/22725/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             14745                14745                processes
Max open files            170666               170666               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       14745                14745                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
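The same check can be run across every Apache worker at once rather than eyeballing a single PID; a small sketch, with the pgrep pattern being an assumption for this setup:
# Print the "Max open files" row for each httpd child
for pid in $(pgrep httpd); do
  echo "PID $pid: $(grep 'Max open files' /proc/$pid/limits)"
done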
And one of PHP-FPM's child processes:
$ cat /proc/22963/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             10240                10240                processes
Max open files            10240                10240                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       14745                14745                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
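Worth noting: the PHP-FPM children report 10240 rather than the 65000 from limits.conf, which suggests the pool is pinning its own limit. PHP-FPM pools have an rlimit_files directive that sets the descriptor limit for their children; a quick way to check (paths assumed for Amazon Linux):
# Look for an explicit per-pool descriptor limit
grep -R "rlimit_files" /etc/php-fpm.conf /etc/php-fpm.d/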
I've also upped the max open files at the kernel level in /etc/sysctl.conf:
fs.file-max = 512000
Then I applied the new value with sysctl -p. Again, this is probably egregious, but I saw the same results with a value of 65000.
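To see how close the box actually gets to that kernel-wide ceiling, /proc/sys/fs/file-nr reports the allocated handle count alongside fs.file-max:
cat /proc/sys/fs/file-nr   # allocated handles, allocated-but-unused, ceiling
sysctl fs.file-max         # confirm the new value took effect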
Under load, I'm only seeing ~4,200 open files, which is puzzling given the overall limits I've provided:
$ lsof | wc -l
4178
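Keep in mind that lsof also lists entries like memory-mapped libraries, working directories, and executables, so the total above is not a clean descriptor count. Counting /proc/<pid>/fd is closer to what the rlimits actually constrain; a sketch (run as root, pgrep pattern assumed):
# Descriptors held per worker, highest counts last
for pid in $(pgrep -f 'httpd|php-fpm'); do
  echo "$pid $(ls /proc/$pid/fd | wc -l)"
done | sort -k2 -n | tail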
During all of this, my CPU usage never goes above 20%, and my server still has around 3GB of free memory.
Any ideas?
After sleeping on this, I realized the problem might not be on the server side at all, but on the client side (i.e., my laptop running Locust). Indeed, checking ulimit -a there gave these results (macOS 10.14.6):
➜ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       1418
-n: file descriptors                256
Bumping the file descriptors up to 2048 (ulimit -n 2048) and re-running Locust in the same shell made the errors go away.
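For anyone hitting the same thing: ulimit -n only affects the current shell and whatever it launches, and macOS additionally enforces a per-process kernel ceiling, so a quick check before picking a number looks something like this (the 10240 default is an assumption for this macOS version):
sysctl kern.maxfilesperproc   # per-process ceiling, commonly 10240
ulimit -n 2048                # raise the soft limit for this shell only
ulimit -n                     # verify, then launch Locust from the same shell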
Sorry for the quick question-and-answer, but I thought I'd keep this up rather than remove the question in case someone else runs into this issue.