iostat reports significantly different '%util' and 'await' for two identical disks in mdadm RAID1

I have a server running CentOS 6 with two Crucial M500 SSDs configured in mdadm RAID1. This server is also virtualized with Xen.

Recently, I started seeing iowait percentages creep up in the top -c stats of our production VM. I decided to investigate and ran iostat on the dom0 so I could inspect activity on the physical disks (e.g., /dev/sda and /dev/sdb). This is the command I used: iostat -d -x 3 3

Here's an example of the output I received (scroll to the right for %util numbers):

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.33    0.00   38.67     0.00   337.33     8.72     0.09    2.22    0.00    2.22   1.90   7.33
sdb               0.00     0.33    0.00   38.67     0.00   338.00     8.74     1.08   27.27    0.00   27.27  23.96  92.63
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md127             0.00     0.00    0.00   29.33     0.00   312.00    10.64     0.00    0.00    0.00    0.00   0.00   0.00
drbd5             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
drbd3             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
drbd4             0.00     0.00    0.00    8.67     0.00    77.33     8.92     2.03  230.96    0.00  230.96  26.12  22.63
dm-0              0.00     0.00    0.00   29.67     0.00   317.33    10.70     5.11  171.56    0.00  171.56  23.91  70.93
dm-1              0.00     0.00    0.00    8.67     0.00    77.33     8.92     2.03  230.96    0.00  230.96  26.12  22.63
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00   20.00     0.00   240.00    12.00     3.03  151.55    0.00  151.55  31.33  62.67
dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-11             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

To my alarm, I noticed a significant difference between /dev/sda and /dev/sdb in await (2 ms vs 27 ms) and %util (7% vs 92%), even though both disks are handling the same write rate (38.67 w/s). The drives mirror each other and are the same model (Crucial M500), so I don't understand how this could be. No activity occurs on /dev/sda that should not also occur on /dev/sdb.
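To make the gap concrete, here is a minimal Python sketch that parses the sda and sdb rows copied from the output above and compares them. The helper names (`parse_rows`, `util_from_svctm`) are my own, and the relationship %util ≈ IOPS × svctm / 10 reflects my understanding of how sysstat derives svctm, so treat it as a sanity check rather than a definitive statement about iostat internals.

```python
# Rows copied from the iostat sample above (sda and sdb only).
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")
ROWS = [
    "sda 0.00 0.33 0.00 38.67 0.00 337.33 8.72 0.09 2.22 0.00 2.22 1.90 7.33",
    "sdb 0.00 0.33 0.00 38.67 0.00 338.00 8.74 1.08 27.27 0.00 27.27 23.96 92.63",
]


def parse_rows(header, rows):
    """Map each device name to a dict of column name -> float value."""
    columns = header.split()[1:]  # drop the leading "Device:" label
    stats = {}
    for row in rows:
        name, *values = row.split()
        stats[name] = dict(zip(columns, map(float, values)))
    return stats


def util_from_svctm(dev):
    """Estimate %util from IOPS and svctm (ms): IOPS * svctm / 1000 * 100."""
    iops = dev["r/s"] + dev["w/s"]
    return iops * dev["svctm"] / 10.0


stats = parse_rows(HEADER, ROWS)
sda, sdb = stats["sda"], stats["sdb"]

# Same write rate on both mirrors, very different latency and utilisation.
await_ratio = sdb["await"] / sda["await"]
util_ratio = sdb["%util"] / sda["%util"]
print(f"w/s: sda={sda['w/s']} sdb={sdb['w/s']}")
print(f"await ratio sdb/sda: {await_ratio:.1f}")
print(f"%util ratio sdb/sda: {util_ratio:.1f}")
for name, dev in stats.items():
    print(f"{name}: reported %util={dev['%util']}, "
          f"estimated={util_from_svctm(dev):.2f}")
```

The estimated %util lands close to the reported figure for both disks, which suggests the numbers are internally consistent and that sdb really is taking roughly twelve times longer to complete each write.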

I've been regularly checking the SMART values for both of these disks, and I've noticed that Percent_Lifetime_Used for /dev/sda indicates 66% used while /dev/sdb reports a nonsensical value (454% used). I hadn't been too concerned up until this point because the Reallocated_Event_Count has remained relatively low for both drives and hasn't changed quickly.

SMART values for /dev/sda

SMART values for /dev/sdb

Could there be a hardware issue with our /dev/sdb disk? Any other possible explanations?

I eventually discovered that this system was not being TRIMmed properly and had been partitioned with insufficient overprovisioning (even though the Crucial M500 has 7% level 2 overprovisioning built in). The combination of the two led to a severe case of write amplification.

Furthermore, this system houses a database with very high write activity, which produces a very large number of small random writes. That I/O pattern is about the worst case for write amplification.
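To illustrate why missing TRIM and thin overprovisioning hurt so much, here is a rough Python sketch of a textbook-style garbage collection model. It is an assumption on my part, not something measured on this server: if the blocks the drive reclaims still hold a fraction `valid_fraction` of live pages, those pages must be copied elsewhere before the erase, so each block erased only frees `1 - valid_fraction` of its space for new host data. Real controllers are more sophisticated, and the fractions below are hypothetical.

```python
def write_amplification(valid_fraction):
    """Simplified model: NAND writes per host write when reclaimed
    blocks still contain `valid_fraction` of live (valid) pages.

    Erasing one block costs `valid_fraction` of a block in copies and
    yields `1 - valid_fraction` of a block for new host data, so
    WAF = 1 / (1 - valid_fraction).
    """
    if not 0.0 <= valid_fraction < 1.0:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1.0 / (1.0 - valid_fraction)


# Hypothetical fractions, for illustration only.
# Without TRIM the drive treats deleted data as live, so reclaimed
# blocks look nearly full; with TRIM and spare area they are emptier.
for label, fraction in [("plenty of free space", 0.5),
                        ("nearly full, no TRIM", 0.9)]:
    print(f"{label}: WAF ~ {write_amplification(fraction):.1f}")
```

Under this model, a drive that believes it is 90% full of live data writes about ten times as much to NAND as the host asked for, which fits the symptoms: identical host writes, but far more internal work on the disk that ran out of clean blocks first.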

I'm still not 100% certain why /dev/sda was performing better than /dev/sdb in iostat -- perhaps it was something akin to the silicon lottery where /dev/sda was marginally superior to /dev/sdb so /dev/sdb bottlenecked first.

For us, the two major takeaways are:

Overprovision your SSDs at 20% (taking into account your SSD may already have 0%, 7% or 28% level 2 overprovisioning).
Run TRIM on a weekly basis.
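As a rough guide to the first takeaway, this Python sketch works out how large a partition to create so that total overprovisioning reaches a target. It assumes overprovisioning is defined as (raw capacity - usable capacity) / usable capacity, which is a common convention but not something I have confirmed for every vendor. The 240 GB capacity is a hypothetical example, and `partition_size_gb` is a name I made up for illustration.

```python
def partition_size_gb(advertised_gb, builtin_op, target_op):
    """Return the largest partition size (GB) that still leaves
    `target_op` total overprovisioning.

    Assumes OP = (raw - usable) / usable, so
    raw = advertised * (1 + builtin_op) and usable = raw / (1 + target_op).
    """
    raw_gb = advertised_gb * (1 + builtin_op)
    usable_gb = raw_gb / (1 + target_op)
    # If the drive already exceeds the target, use the whole drive.
    return min(advertised_gb, usable_gb)


ADVERTISED_GB = 240  # hypothetical drive size, not taken from the post
TARGET_OP = 0.20     # the 20% target from the takeaway above

for builtin in (0.0, 0.07, 0.28):
    size = partition_size_gb(ADVERTISED_GB, builtin, TARGET_OP)
    spare = ADVERTISED_GB - size
    print(f"built-in OP {builtin:.0%}: partition {size:.1f} GB, "
          f"leave {spare:.1f} GB unpartitioned")
```

The unpartitioned space only helps if the drive knows it is empty, so it should be left untouched on a fresh or secure-erased drive, or TRIMmed before being set aside.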
