Unexplainably high disk IO caused by nginx worker processes

I have just set up an Ubuntu 12.04.2 LTS server that serves a large number of fairly big static files. The configuration is the same as on another machine, which works great. The other machine runs Ubuntu 11.10 with nginx 1.0.5. The machine with the problem runs nginx 1.1.19 and can hardly push 20 MB/s (despite being on a dedicated 1 Gbit line), with iotop showing high disk IO by the nginx workers:

  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 4569 be/4 www-data  754.61 K/s    0.00 B/s  0.00 % 99.99 % nginx: worker process
 4571 be/4 www-data 1257.69 K/s    0.00 B/s  0.00 % 99.99 % nginx: worker process
 4574 be/4 www-data    2.46 M/s    0.00 B/s  0.00 % 99.99 % nginx: worker process
 3951 be/4 www-data 1760.77 K/s    0.00 B/s  0.00 % 99.99 % nginx: worker process is shutting down
 3950 be/4 www-data  503.08 K/s    0.00 B/s  0.00 % 99.99 % nginx: worker process is shutting down
 4573 be/4 www-data 2012.31 K/s    0.00 B/s  0.00 % 99.99 % nginx: worker process
 3952 be/4 www-data 1006.15 K/s    0.00 B/s  0.00 % 99.99 % nginx: worker process is shutting down
 3954 be/4 www-data 1760.77 K/s    0.00 B/s  0.00 % 99.99 % nginx: worker process is shutting down
 4572 be/4 www-data    4.05 M/s    0.00 B/s  0.00 % 99.99 % nginx: worker process
 3956 be/4 www-data    2.70 M/s    0.00 B/s  0.00 % 99.99 % nginx: worker process is shutting down
 3953 be/4 www-data  251.54 K/s    0.00 B/s  0.00 % 99.99 % nginx: worker process is shutting down
 4567 be/4 www-data    2.21 M/s    0.00 B/s  0.00 % 98.30 % nginx: worker process
 4570 be/4 www-data  754.61 K/s    0.00 B/s  0.00 % 97.91 % nginx: worker process
 3949 be/4 www-data 1006.15 K/s    0.00 B/s  0.00 % 88.21 % nginx: worker process is shutting down
 3955 be/4 www-data 1509.23 K/s    0.00 B/s  0.00 % 84.60 % nginx: worker process is shutting down

So for some reason the processes that are trying to shut down cause the IO, and the server goes into an almost non-responsive state, with the load growing as high as 5-6 (this is a dual-core machine). CPU utilisation meanwhile is around 0.5%.

After restarting nginx everything is fine for some time and then this happens again.

This is the latest from the error log of nginx:

2013/03/18 13:09:28 [alert] 3676#0: open socket #297 left in connection 145

and then this happens:

2013/03/18 13:10:11 [alert] 3749#0: 100 worker_connections are not enough

and this is the nginx.conf:

user www-data;
worker_processes 8;
worker_rlimit_nofile 20480;
pid /var/run/nginx.pid;

events {
    worker_connections 100;
    # multi_accept on;
}

http {

    ##
    # Basic Settings
    ##

    sendfile off;
    output_buffers 1 512k;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 5;
    types_hash_max_size 2048;

Any help will be highly appreciated!

EDIT:

Setting sendfile on or off makes no difference.

Setting worker_rlimit_nofile equal to worker_connections makes no difference.

Changing worker_processes makes no difference either.

smartctl shows no problems with the disk. I also tried the second disk in this machine, with no difference.


Solution 1:

Relatively recent HDDs can do about 150 MB/s (1.2 Gbit/s) on sequential reads (and writes), but with several parallel reads/writes in flight (even if each individual read is itself sequential), throughput can easily drop 10×, because the heads spend most of their time seeking between the readers' positions.

So, 20 MB/s (160 Mbit/s) sounds like a limitation of your HDD.
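One way to sanity-check that without iotop is to sample the raw read counter for the data disk from /proc/diskstats over a one-second window. This is a rough sketch; the device-name pattern is a guess, so pass your actual disk (e.g. sda) as the first argument if it picks the wrong one.

```shell
#!/bin/sh
# Print read throughput for one block device over a one-second window,
# using only /proc/diskstats (field 3 = device name, field 6 = sectors read).
DEV=${1:-$(awk '$3 ~ /^([svx]v?d[a-z]|nvme[0-9]+n[0-9]+)$/ {print $3; exit}' /proc/diskstats)}
s1=$(awk -v d="$DEV" '$3 == d {print $6}' /proc/diskstats)
sleep 1
s2=$(awk -v d="$DEV" '$3 == d {print $6}' /proc/diskstats)
# diskstats always counts 512-byte sectors, regardless of physical sector size
echo "${DEV:-no-disk-found}: $(( (${s2:-0} - ${s1:-0}) * 512 / 1024 )) KB/s read"
```

If this hovers around 20 MB/s while the workers sit at 99% IO wait, the drive really is the ceiling.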

Perhaps the other server has an SSD, or has more memory and serves these files from the page cache, while this one has something wrong on the caching side (probably too little memory, but possibly badly tuned kernel settings).

In any case, this is likely outside of nginx's control.

You might try increasing nginx's in-memory buffers severalfold in an attempt to make the reads a little more sequential, but if you only have a single platter-based HDD (at most ~150 MB/s on a single sequential read, dropping severalfold under multiple concurrent reads) and little gets served from cache due to low memory, then you won't be able to push anywhere close to 1 Gbit/s (128 MB/s).
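For example, against the http block shown in the question, the buffer increase might look like this; the exact sizes are guesses to experiment with, not recommendations:

```nginx
http {
    # Larger in-memory buffers per connection -> fewer, bigger disk reads.
    # 2 x 1m means up to 2 MB buffered per download; multiply by the expected
    # number of concurrent connections when estimating the RAM this needs.
    output_buffers 2 1m;

    # With sendfile off, nginx reads the files itself, so output_buffers
    # applies; with sendfile on, kernel read-ahead governs read sizes instead.
    sendfile off;
}
```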

If you really need 1 Gbit/s of throughput: if the most commonly requested files fit in memory, get more memory; otherwise, get a fast SSD.