Why does heavy I/O slow my server to a crawl?

I've been trying to troubleshoot a problem with I/O on my disks. The setup is as follows:

  • OS: CentOS 5.6
  • Disk layout:
    • Disks (/dev/sda, /dev/sdb)
    • Partition (/dev/sda1, /dev/sdb1)
    • MD Array (RAID-1) (/dev/md0)
    • LVM Stack (/dev/VolGrp00/RootLV)

Initially, I noticed that when performing heavy I/O (e.g. mkfs), the system would slow to a crawl, to the point that I couldn't move the mouse pointer in my X session. I started logging some metrics and saw that the load average would slowly climb above 5.0 on my dual-core server. At the same time, the memory picture went from nearly 2GB free to about 10MB free and nearly 2GB of buffers. Based on that, I suspect that some kind of caching is to blame, but I'm not familiar enough with the nuts and bolts of LVM, MD and the Linux I/O subsystem to know where to start looking.
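
In case it helps anyone reproduce this, the buffer growth and the climbing load are easy to watch with something like the following (the one-second interval is arbitrary):

    # column "b" shows processes blocked on I/O; "free" and "buff" show the memory shift
    vmstat 1

    # quick snapshot of the memory picture plus the load average
    free -m; uptime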

One oddity I found: it doesn't seem to matter if I strip off the LVM layer and write directly to the array, and even removing the array doesn't help much (though writing directly to the partition seems to cause shorter bursts of delays than writing to the array).

Most of my testing has been done with the command mkfs.ext4 -i 4096 -m 0 <device>, though I have tested this behavior with dd if=/dev/urandom bs=4K of=<device> and received similar results, so I'm fairly sure it's not the fault of mkfs. Also, I've tried this on another system (from another hardware vendor, but still CentOS 5.6) and again saw similar results.

I'm okay with any solution that causes my I/O operations to take a little longer to complete, though answers like "use the direct I/O flag" are unacceptable since they cause mkfs to go from 10 minutes to 16 hours (been there, tried that). I'm searching for tuning parameters and am also looking into changing I/O schedulers, but I figured it might be helpful to ask the community for some guidance in the right direction.

EDIT:

As it turns out, the problem is more related to memory pressure and the virtual memory manager causing I/O requests to block. Here's my current understanding of the issue: as mkfs or dd runs, it generates more I/O than the disks can keep up with, so the buffer cache begins to fill up. Once the vm.dirty_ratio threshold is reached, I/O requests from all processes begin to block until the cache clears out some space (source). At the same time, the low-memory condition triggers the kernel to start swapping processes out of physical memory and onto disk... this generates even more I/O, and those I/O requests can themselves block while waiting for the cache to clear.
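
For concreteness, these are the knobs I mean; the values below are purely illustrative, not a recommendation:

    # current thresholds (percent of memory that may be dirty before writeback/blocking kicks in)
    sysctl vm.dirty_ratio vm.dirty_background_ratio

    # example: start background writeback earlier and block writers sooner
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=10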

I have tried tuning vm.dirty_ratio and other related parameters, but they only change the point at which the system starts to slow down (a lower ratio just means it locks up sooner). I've also tried swapping out I/O schedulers and tweaking their parameters to get the cache flushed faster, but have had no success. As a last resort, I tried running the mkfs with ionice -c3, but since the disks are mostly idle apart from the mkfs itself, the same problem manifests. I think the slowdowns could be avoided if there were a way to throttle the I/O request rate of a specific process, but I'm not aware of anything that will do that.
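
For the record, the ionice attempt looked roughly like this; as I understand it, the idle class is only honored by the CFQ scheduler, which may be part of why it made no difference here:

    # run mkfs in the idle I/O scheduling class (-c3)
    ionice -c3 mkfs.ext4 -i 4096 -m 0 /dev/md0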

I'm definitely still open to suggestions as far as what to try -- whoever can push me in the right direction gets the green checkmark.

ANOTHER EDIT:

I've stumbled across control groups, but they are unfortunately only available starting in RHEL 6. Cgroups could be used to start the mkfs in a group with throttled block I/O, but since these systems must remain on 5.6 for the time being, I will have to either keep looking for another solution or live with the slowness until upgrade time.
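
For anyone landing here on RHEL 6 or later, a rough sketch of what I have in mind is below. I haven't been able to test it on these boxes, the major:minor numbers are just what /dev/md0 happens to be here (check with ls -l /dev/md0), and the throttle files may not exist on every kernel build:

    # mount the blkio controller
    mkdir -p /cgroup/blkio
    mount -t cgroup -o blkio none /cgroup/blkio

    # create a group whose writes to /dev/md0 (major:minor 9:0) are capped at ~10 MB/s
    mkdir /cgroup/blkio/slowio
    echo "9:0 10485760" > /cgroup/blkio/slowio/blkio.throttle.write_bps_device

    # move the current shell into the group, then run the mkfs from it
    echo $$ > /cgroup/blkio/slowio/tasks
    mkfs.ext4 -i 4096 -m 0 /dev/md0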


Solution 1:

From the scant details you've given us, it sounds to me like you need to tune your I/O scheduler. That can have a significant impact on the lockup effect you're experiencing. I believe CentOS 5.6 defaults to CFQ; you may get less locking with deadline.
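
You can try that without a reboot; something along these lines should work (the bracketed entry is the scheduler currently in use), and adding elevator=deadline to the kernel line in /boot/grub/grub.conf makes it the default after a reboot:

    # see what each member disk is using
    cat /sys/block/sda/queue/scheduler
    cat /sys/block/sdb/queue/scheduler

    # switch both underlying disks to deadline for a test
    echo deadline > /sys/block/sda/queue/scheduler
    echo deadline > /sys/block/sdb/queue/scheduler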