5.5GB written daily to 1.2GB root volume - 4 times previous levels

Solution 1:

Since the leading cause seemed to be journaling, removing it would have been my next step. To remove journaling, however, I would need to attach the EBS volume to another instance. I decided to test the procedure on a (day-old) snapshot, but before removing journaling I re-ran the 10 minute iotop test on the test instance. To my surprise, I saw normal (i.e. non-elevated) values, and this was the first time that flush-202 didn't even show up on the list. This was a fully functional instance (I restored snapshots of my data as well), and there had been no changes to the root volume in the 12 hours or so since the snapshot was taken. All tests showed the same processes running on both servers. This led me to believe that the cause must come down to some requests that the 'live' server was processing.
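For context, the 10 minute test was just iotop run in batch mode and logged; the flags below are an approximation of what I used rather than the exact original command:

    # Roughly: sample for ~10 minutes (120 samples, 5 seconds apart), showing only
    # processes that actually perform I/O, with timestamps, and log the output.
    sudo iotop -b -o -t -d 5 -n 120 > iotop-live.log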

Comparing the iotop output from the server displaying the problem with that from the seemingly identical server that had none, the only differences were flush-202 and php-fpm. This got me thinking that, while a long shot, perhaps the problem was related to the PHP configuration.
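A quick way to make that comparison is to count how often each suspect process appears in the two batch logs; a rough sketch (the log names and process list here are just examples):

    # Count appearances of each suspect process in the live vs. test logs.
    for p in flush-202 php-fpm jbd2 mysqld nginx; do
        printf '%-10s live=%-6s test=%s\n' "$p" \
            "$(grep -c "$p" iotop-live.log)" \
            "$(grep -c "$p" iotop-test.log)"
    done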

Now, this part wasn't ideal, but since none of the services running on the live server would suffer from a few minutes of downtime it didn't really matter. To narrow down the problem, all the major services (postfix, dovecot, imapproxy, nginx, php-fpm, varnish, mysqld, varnishncsa) on the live server were stopped, and the iotop test rerun: there was no elevated disk I/O. The services were then restarted in 3 batches, leaving php-fpm until the end. After each batch of restarts, the iotop test confirmed there was no issue. Once php-fpm was started, the issue returned. (It would have been easy enough to simulate a few PHP requests on the test server, but at this point I wasn't sure it was actually PHP.)
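The stop/start cycle itself was nothing special; on a sysvinit-style setup it would look roughly like this (the batch grouping shown is illustrative, not the exact order used):

    # Stop all the major services, confirm I/O is quiet, then restart in batches,
    # re-running the iotop spot check after each batch and leaving php-fpm for last.
    for s in postfix dovecot imapproxy nginx php-fpm varnish mysqld varnishncsa; do
        sudo service "$s" stop
    done
    sudo iotop -b -o -d 5 -n 24 > iotop-stopped.log    # ~2 minute spot check

    for s in mysqld postfix dovecot; do sudo service "$s" start; done    # batch 1
    # ... repeat the spot check, start the remaining batches, php-fpm last.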

Unfortunately, the server would be rather pointless without PHP, so this wasn't an ideal conclusion. However, since flush-202 seemed to point to something memory related (despite there being ample free memory), I decided to disable APC. Rerunning the iotop test showed that disk I/O levels were normal. A closer look showed that mmap was enabled and that apc.mmap_file_mask was set to /tmp/apc.XXXXXX (the default for this install); setting that path makes APC use file-backed mmap. Commenting the line out (so that APC falls back to anonymous, memory-backed mmap) and rerunning the iotop test showed the problem was resolved.
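The fix amounts to one commented line in the APC ini and a php-fpm restart. The path below is an assumption (it varies by distribution), so treat this as a sketch:

    # Comment out the file-backed mmap mask so APC falls back to anonymous mmap.
    sudo sed -i 's|^apc.mmap_file_mask|;apc.mmap_file_mask|' /etc/php.d/apc.ini
    sudo service php-fpm restart

    # Sanity check (php -i reflects the CLI config; use phpinfo() for the FPM pool).
    php -i | grep -i mmap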

I still do not know why none of the diagnostics I ran identified the writes as coming from PHP and going to the APC files in the /tmp directory. The only test that even mentioned the /tmp directory was lsof, and the files it listed did not exist.
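For completeness, the /tmp checks were along these lines (a sketch; the +L1 form lists files that are open but have no remaining directory entry, i.e. paths that no longer exist on disk):

    # Open files under /tmp, and open-but-unlinked files anywhere, filtered to /tmp.
    sudo lsof +D /tmp
    sudo lsof +L1 | grep /tmp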