Inexplicably high disk IO caused by nginx worker processes
I have just set up an Ubuntu 12.04.2 LTS server that serves a large number of fairly big static files. The configuration is the same as on another machine, which works great. The other machine runs Ubuntu 11.10 with nginx 1.0.5. The machine with the problem runs nginx 1.1.19 and can hardly push around 20MB/s (despite being on a dedicated 1Gbit line), with iotop showing high disk IO by the nginx workers. This is from iotop:
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
4569 be/4 www-data 754.61 K/s 0.00 B/s 0.00 % 99.99 % nginx: worker process
4571 be/4 www-data 1257.69 K/s 0.00 B/s 0.00 % 99.99 % nginx: worker process
4574 be/4 www-data 2.46 M/s 0.00 B/s 0.00 % 99.99 % nginx: worker process
3951 be/4 www-data 1760.77 K/s 0.00 B/s 0.00 % 99.99 % nginx: worker process is shutting down
3950 be/4 www-data 503.08 K/s 0.00 B/s 0.00 % 99.99 % nginx: worker process is shutting down
4573 be/4 www-data 2012.31 K/s 0.00 B/s 0.00 % 99.99 % nginx: worker process
3952 be/4 www-data 1006.15 K/s 0.00 B/s 0.00 % 99.99 % nginx: worker process is shutting down
3954 be/4 www-data 1760.77 K/s 0.00 B/s 0.00 % 99.99 % nginx: worker process is shutting down
4572 be/4 www-data 4.05 M/s 0.00 B/s 0.00 % 99.99 % nginx: worker process
3956 be/4 www-data 2.70 M/s 0.00 B/s 0.00 % 99.99 % nginx: worker process is shutting down
3953 be/4 www-data 251.54 K/s 0.00 B/s 0.00 % 99.99 % nginx: worker process is shutting down
4567 be/4 www-data 2.21 M/s 0.00 B/s 0.00 % 98.30 % nginx: worker process
4570 be/4 www-data 754.61 K/s 0.00 B/s 0.00 % 97.91 % nginx: worker process
3949 be/4 www-data 1006.15 K/s 0.00 B/s 0.00 % 88.21 % nginx: worker process is shutting down
3955 be/4 www-data 1509.23 K/s 0.00 B/s 0.00 % 84.60 % nginx: worker process is shutting down
So for some reason the processes that are trying to shut down cause the IO, and the server goes into an almost non-responsive state, with the load growing as high as 5-6 (this is a dual-core machine). CPU utilisation, meanwhile, is around 0.5%.
After restarting nginx everything is fine for some time and then this happens again.
This is the latest from the error log of nginx:
2013/03/18 13:09:28 [alert] 3676#0: open socket #297 left in connection 145
and then this happens:
2013/03/18 13:10:11 [alert] 3749#0: 100 worker_connections are not enough
and this is the nginx.conf:
user www-data;
worker_processes 8;
worker_rlimit_nofile 20480;
pid /var/run/nginx.pid;

events {
    worker_connections 100;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##

    sendfile off;
    output_buffers 1 512k;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 5;
    types_hash_max_size 2048;
Any help will be highly appreciated!
EDIT:
sendfile on and off makes no difference.
worker_rlimit_nofile == worker_connections makes no difference.
Changing worker_processes changes nothing either.
smartctl shows no problems with the disk; I also tried with the second disk in this machine and there is still no difference.
Solution 1:
Relatively recent HDDs can do about 150MB/s (1.2Gbps) on sequential reads (and writes), but with several parallel reads/writes (even if each read is itself sequential), throughput can easily drop 10×.
So, 20MB/s (160Mbps) sounds like a limitation of your HDD.
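If you want to sanity-check that, you can measure the raw read throughput of the data disk directly, first with a single sequential read and then with two concurrent ones (the device name /dev/sda below is an assumption, adjust it to your setup, and run this as root):

    # single sequential read, bypassing the page cache
    dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct

    # two concurrent sequential reads from different offsets; on a
    # platter-based disk the per-stream rate should drop sharply
    dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct skip=0 &
    dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct skip=8192 &
    wait

If the single-stream number is already in the 20MB/s range, or the concurrent runs collapse to that range, the disk itself is the bottleneck.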
Perhaps the other server has an SSD, or has more memory and therefore has these files cached, while this one has a problem on the caching side (probably too little memory, but possibly badly tuned kernel settings).
In any case, this sounds like something outside of nginx's control.
You might try increasing nginx's in-memory buffers severalfold to make the reads a little more sequential, but if you only have a single platter-based HDD (at most ~150MB/s on a single sequential read, dropping severalfold with multiple concurrent reads) and little of the data is cached because memory is low, you won't be able to push anywhere close to 1Gbps (128MB/s).
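For example, since you run with sendfile off, output_buffers controls how much nginx reads from disk at a time for each response, so something along these lines might help a bit (the sizes are only an illustration, tune them to your file sizes and available memory):

    # illustrative values: fewer, larger reads per request
    output_buffers 2 1m;

The trade-off is memory: each active request can use that much buffer space, so with many concurrent downloads the total usage adds up.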
If you really need 1Gbps throughput: if the most commonly requested files fit in memory, get more memory; otherwise, get a fast SSD.