Can anyone explain precisely what IOWait is?

Solution 1:

I know it's the time spent by the CPU waiting for IO operations to complete, but what kind of IO operations precisely? What I am also not sure about is why it is so important. Can't the CPU just do something else while the IO operation completes, and then get back to processing data?

Yes, the operating system will schedule other processes to run while one is blocked on IO. However, inside that process, unless it's using asynchronous IO, execution will not progress until the IO operation it is waiting on completes.
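To make the "unless it's using asynchronous IO" point concrete, here is a minimal Python 3 sketch using only the standard library. It does not use kernel asynchronous IO; as a stand-in, it issues an ordinary blocking read on a worker thread so that the calling thread can keep computing in the meantime. The helpers `read_file`, `busy_work` and `overlapped` are hypothetical names chosen for illustration.

```python
import concurrent.futures
import os
import tempfile
from typing import Tuple


def read_file(path: str) -> bytes:
    # A plain blocking read: the thread that calls this sleeps in the
    # kernel until the data is available.
    with open(path, "rb") as f:
        return f.read()


def busy_work(n: int) -> int:
    # Stand-in for CPU-bound processing that does not depend on the file.
    return sum(i * i for i in range(n))


def overlapped(path: str, n: int) -> Tuple[bytes, int]:
    # Hand the blocking read to a worker thread so this thread keeps
    # computing instead of stalling until the IO completes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_file, path)
        result = busy_work(n)
        # Only here do we wait for the IO, and only if it is still running.
        data = pending.result()
    return data, result


if __name__ == "__main__":
    # Usage: write a small temporary file, then read it while computing.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    try:
        print(overlapped(tmp.name, 10))
    finally:
        os.remove(tmp.name)
```

A thread is used here only because it is the simplest portable way to overlap a blocking call with other work; real asynchronous IO interfaces avoid tying up a thread per request.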

Also, what are the right tools to diagnose exactly which process(es) waited for IO?

Some tools you might find useful:

  • iostat, to monitor the service times of your disks
  • iotop (if your kernel supports it), to monitor the breakdown of IO requests per process
  • strace, to look at the actual operations issued by a process
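A quick way to see which processes are waiting on IO right now, assuming Linux and its /proc filesystem, is to look for tasks in state D (uninterruptible sleep, which ps describes as "usually IO"). The Python sketch below does that; the function names are made up for illustration. It only takes a snapshot, so you would run it several times while %wa is high. It does not attempt the per-process byte accounting that iotop shows.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the *last* ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def processes_in_state(state: str = "D") -> List[Tuple[int, str]]:
    """List (pid, comm) of processes currently in the given state.

    'D' is uninterruptible sleep, which on Linux usually means blocked on IO.
    """
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, st = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if st == state:
            found.append((pid, comm))
    return found
```

Usage: call `processes_in_state()` in a loop during a slowdown; a process that keeps showing up is a good candidate to inspect further with strace.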

And what are the ways to minimize IO wait time?

  • ensure you have free physical memory so the OS can cache disk blocks in memory
  • keep your filesystem disk usage below 80% to avoid excessive fragmentation
  • tune your filesystem
  • use a battery-backed array controller
  • choose good buffer sizes when performing IO operations
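As an illustration of the buffer-size point, this Python sketch copies a file with a caller-chosen chunk size and counts the read() system calls it issues. Larger chunks mean fewer system calls for the same amount of data; the best size depends on the device and the workload, so the sizes in the usage block are only examples to experiment with, not recommendations. `copy_in_chunks` is a hypothetical helper.

```python
import os
import tempfile


def copy_in_chunks(src: str, dst: str, chunk_size: int) -> int:
    """Copy src to dst in chunk_size pieces; return how many read() calls were made."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    reads = 0
    # buffering=0 opens src as a raw, unbuffered file, so each fin.read()
    # below issues a single read() system call of at most chunk_size bytes.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            reads += 1  # this also counts the final empty read that signals EOF
            if not chunk:
                break
            fout.write(chunk)
    return reads


if __name__ == "__main__":
    # Usage: copy a 1 MiB temporary file with a few different chunk sizes.
    with tempfile.NamedTemporaryFile(delete=False) as src:
        src.write(os.urandom(1 << 20))
    dst = src.name + ".copy"
    try:
        for size in (512, 4096, 65536):
            print(size, copy_in_chunks(src.name, dst, size))
    finally:
        os.remove(src.name)
        if os.path.exists(dst):
            os.remove(dst)
```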

Solution 2:

Old question, recently bumped, but I felt the existing answers were insufficient.

IOWait definition & properties

IOWait (usually labeled %wa in top) is a sub-category of idle (%idle is usually expressed as all idle time except the defined subcategories), meaning the CPU is not doing anything. Therefore, as long as there is another process that the CPU could be running, it will run it, and that time is not counted as iowait. Additionally, idle, user, system, iowait, etc. are measurements with respect to the CPU. In other words, you can think of iowait as idle time caused by waiting for IO.

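Precisely, iowait is the time a CPU spends idle while at least one task is blocked waiting for IO (typically disk IO) to complete, expressed as a percentage of processor ticks. It is not time spent handling interrupts: time servicing hardware interrupts is usually labeled separately as %hi, and software interrupts as %si.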
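On Linux, top derives these percentages from the tick counters on the first ("cpu") line of /proc/stat, by taking the difference between two samples. The Python sketch below assumes the column order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal) and reports iowait separately from irq (%hi) and softirq (%si). proc(5) also warns that the iowait counter is unreliable and can even decrease, so negative deltas are clamped to zero. The function names are illustrative, not taken from any tool.

```python
import time
from typing import Dict, List

# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> List[int]:
    """Return the first eight tick counters from a 'cpu ...' line."""
    parts = line.split()
    # guest/guest_nice (columns 9-10) are already included in user/nice,
    # so they are ignored to avoid double counting.
    return [int(x) for x in parts[1:1 + len(FIELDS)]]


def percentages(before: List[int], after: List[int]) -> Dict[str, float]:
    """Share of CPU ticks spent in each state between two samples, in percent."""
    # Clamp at zero because the iowait counter is allowed to go backwards.
    deltas = [max(a - b, 0) for a, b in zip(after, before)]
    total = sum(deltas) or 1  # avoid division by zero if no ticks elapsed
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


def sample_cpu(interval: float = 1.0) -> Dict[str, float]:
    """Sample /proc/stat twice (Linux only) and return per-state percentages."""

    def read() -> List[int]:
        with open("/proc/stat") as f:
            return parse_cpu_line(f.readline())  # first line: aggregate "cpu"

    first = read()
    time.sleep(interval)
    return percentages(first, read())
```

Usage: `sample_cpu()["iowait"]` should roughly match the %wa value top shows over the same one-second interval.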

Importance & Potential misconception

IOWait is important because it is often a key metric for telling whether you're bottlenecked on IO. But the absence of iowait does not necessarily mean your application is not bottlenecked on IO. Consider two applications running on a system. If program 1 is heavily IO-bottlenecked and program 2 is a heavy CPU user, %user + %system may still be around 100%, and correspondingly iowait would show 0. But that is only because program 2 keeps the CPU busy; since all of these numbers are from the CPU's point of view, they say nothing about program 1.
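The two-program scenario can be captured in a deliberately simplified toy model: one CPU, one CPU-bound program that is always runnable, and one program that is always blocked on IO. Each tick is charged to user if anything can run, to iowait only if the CPU is idle while some task waits on IO, and to idle otherwise. It ignores system time, interrupts and real scheduler behavior; `classify_tick` and `simulate` are hypothetical names.

```python
from collections import Counter


def classify_tick(cpu_bound_runnable: bool, task_waiting_on_io: bool) -> str:
    """Toy accounting for one tick on a single CPU."""
    if cpu_bound_runnable:
        # The CPU runs program 2; program 1's IO wait is invisible here.
        return "user"
    if task_waiting_on_io:
        # Nothing can run, but some task is blocked on IO.
        return "iowait"
    return "idle"


def simulate(ticks: int, cpu_hog_present: bool) -> Counter:
    """Program 1 is blocked on IO in every tick of this toy scenario."""
    return Counter(classify_tick(cpu_hog_present, True) for _ in range(ticks))
```

With the CPU hog present, every tick is charged to user and iowait reads 0, even though program 1 made no progress at all; remove the hog and the same waiting shows up entirely as iowait.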

Tools to Detect IOWait

See the posts by Dave Cheney and Xerxes.

But a simple top will also show it, in the %wa column.

Reducing IOWait

Also, as we are now almost entering 2013, in addition to what others have said, there is the option of simply much faster IO storage devices, which have become affordable: SSDs. SSDs are awesome!