(A Windows guy asks) Measuring Disk Latency on Linux: Do I bother?
On Windows, whenever I want to confirm that there might be I/O-related issues on a volume that a database or other low-latency app lives on, I check disk latency.
If I see the Windows Average Disk sec / Transfer counter > 18-20ms consistently, then my canary in a coal mine just died and I need to investigate further. Drop-dead simple.
I'm looking at Linux now and don't see a similar latency-based metric. The quick research I've done suggests I might not even WANT one... I see lots of references to I/O wait being the way most people track this.
Is there a ballpark rule of thumb you use here? For example, is ANY I/O wait on a database's volume bad? Is there a simple iostat command that gives me a better look at overall disk health than just eyeballing top?
Thanks much!
Personally I use the command `iostat -xk 10` and look at the `await` column.
- `-x` Display extended statistics.
- `-k` Display statistics in kilobytes per second (or use `-m` for megabytes per second).
- `10` The reporting interval in seconds.
This is virtually identical to the Windows Average Disk sec / Transfer counter, just reported in milliseconds instead of seconds, so similar rules of thumb can be applied, though it will depend on all sorts of things. I typically find that users start grumbling at 15ms, and 20ms is very bad.
Press Ctrl+C to quit, or specify the number of iterations with the count parameter. Note that the first report is skewed: it shows averages since the system booted rather than for the current interval, so ignore it.
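For example, a run like the following takes a fixed number of reports for a single device (`sda` and the counts here are just placeholders for your own setup):

```
# Seven reports at 10-second intervals for sda; the first report is the
# since-boot average, so read the remaining six.
iostat -xk sda 10 7

# Recent sysstat versions also accept -y to drop that first report entirely.
iostat -xky sda 10 7
```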
From the `man iostat` page:
await The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
Edit: `await` is the main metric I use to watch a disk under production loads, to see if its throughput and IOPS are able to keep up with demand.
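As a rough sketch of one way to watch just that column for a single device (`sda` is a placeholder, and this assumes a sysstat layout that still has a single `await` column; newer versions split it into `r_await` and `w_await`, so check the header line first):

```
# Seven 10-second samples for sda; awk finds the await column from the
# header line, skips the first since-boot report, and prints the rest.
iostat -xk sda 10 7 | awk '
    /^Device/ { for (i = 1; i <= NF; i++) if ($i == "await") col = i; next }
    $1 == "sda" && col { if (++n > 1) print "await:", $col, "ms" }
'
```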
The %iowait stat is more about the balance between CPU and disk usage: %iowait will stay lower than expected if both CPU and disk activity are high, while at fairly low disk usage levels it can be relatively high if the CPU is otherwise idle.

That said, `await` needs to be taken with a grain of salt as well. If there is a lot of sequential read/write happening, it will skew the figure to a lower value, and your 18-20ms rule of thumb will not be useful under those conditions: most of the requests will be the sequential data and will be serviced by the disk very quickly, while the random I/O waits, because the Native Command Queuing (NCQ) system built into the disk optimises throughput by letting the disk choose the order in which requests are serviced.
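Two things that can help put that caveat in context, both depending on the versions of sysstat and procps you have installed (device names and intervals below are placeholders):

```
# Newer sysstat versions report r_await / w_await per device, which makes it
# easier to spot a fast sequential stream masking slow random I/O in the
# blended average.
iostat -xk sda 10 7

# The "wa" column from vmstat is the %iowait view: the share of CPU time
# spent idle while at least one I/O request was outstanding.
vmstat 10 7
```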