high load average, high wait, dmesg raid error messages (debian nfs server)

Debian 6 on HP proliant (2 CPU) with raid (2*1.5T RAID1 + 2*2T RAID1 joined RAID0 to make 3.5T) running mainly nfs & imapd (plus samba for windows share & local www for previewing web pages); with local ubuntu desktop client mounting $HOME, laptops accessing imap & odd files (e.g. videos) via nfs/smb; boxes connected 100baseT or wifi via home router/switch

uname -a

Linux prole 2.6.32-5-686 #1 SMP Wed Jan 11 12:29:30 UTC 2012 i686 GNU/Linux

Setup has been working for months but prone to intermittently going very slow (user experience on desktop mounting $HOME from server, or laptop playing videos) and now consistently so bad I've had to delve into it to try to find what's wrong(!)

Server seems OK at low load e.g. (laptop) client (with $HOME on local disk) connecting to server's imapd and nfs mounting RAID to access 1 file: top shows load ~ 0.1 or less, 0 wait

but when (desktop) client mounts $HOME and starts user KDE session (all accessing server) then top shows e.g.

top - 13:41:17 up  3:43,  3 users,  load average: 9.29, 9.55, 8.27
Tasks: 158 total,   1 running, 157 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.4%us,  0.4%sy,  0.0%ni, 49.0%id, 49.7%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:    903856k total,   851784k used,    52072k free,   171152k buffers
Swap:        0k total,        0k used,        0k free,   476896k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                     
 3935 root      20   0  2456 1088  784 R    2  0.1   0:00.02 top                                                                                                         
    1 root      20   0  2028  680  584 S    0  0.1   0:01.14 init                                                                                                        
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd                                                                                                    
    3 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0                                                                                                 
    4 root      20   0     0    0    0 S    0  0.0   0:00.12 ksoftirqd/0                                                                                                 
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0                                                                                                  
    6 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1                                                                                                 
    7 root      20   0     0    0    0 S    0  0.0   0:00.16 ksoftirqd/1                                                                                                 
    8 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1                                                                                                  
    9 root      20   0     0    0    0 S    0  0.0   0:00.42 events/0                                                                                                    
   10 root      20   0     0    0    0 S    0  0.0   0:02.26 events/1                                                                                                    
   11 root      20   0     0    0    0 S    0  0.0   0:00.00 cpuset                                                                                                      
   12 root      20   0     0    0    0 S    0  0.0   0:00.00 khelper                                                                                                     
   13 root      20   0     0    0    0 S    0  0.0   0:00.00 netns                                                                                                       
   14 root      20   0     0    0    0 S    0  0.0   0:00.00 async/mgr                                                                                                   
   15 root      20   0     0    0    0 S    0  0.0   0:00.00 pm                                                                                                          
   16 root      20   0     0    0    0 S    0  0.0   0:00.02 sync_supers                                                                                                 
   17 root      20   0     0    0    0 S    0  0.0   0:00.02 bdi-default                                                                                                 
   18 root      20   0     0    0    0 S    0  0.0   0:00.00 kintegrityd/0                                                                                               
   19 root      20   0     0    0    0 S    0  0.0   0:00.00 kintegrityd/1                                                                                               
   20 root      20   0     0    0    0 S    0  0.0   0:00.02 kblockd/0                                                                                                   
   21 root      20   0     0    0    0 S    0  0.0   0:00.08 kblockd/1                                                                                                   
   22 root      20   0     0    0    0 S    0  0.0   0:00.00 kacpid                                                                                                      
   23 root      20   0     0    0    0 S    0  0.0   0:00.00 kacpi_notify                                                                                                
   24 root      20   0     0    0    0 S    0  0.0   0:00.00 kacpi_hotplug                                                                                               
   25 root      20   0     0    0    0 S    0  0.0   0:00.00 kseriod                                                                                                     
   28 root      20   0     0    0    0 S    0  0.0   0:04.19 kondemand/0                                                                                                 
   29 root      20   0     0    0    0 S    0  0.0   0:02.93 kondemand/1                                                                                                 
   30 root      20   0     0    0    0 S    0  0.0   0:00.00 khungtaskd                                                                                                  
   31 root      20   0     0    0    0 S    0  0.0   0:00.18 kswapd0                                                                                                     
   32 root      25   5     0    0    0 S    0  0.0   0:00.00 ksmd                                                                                                        
   33 root      20   0     0    0    0 S    0  0.0   0:00.00 aio/0                                                                                                       
   34 root      20   0     0    0    0 S    0  0.0   0:00.00 aio/1                                                                                                       
   35 root      20   0     0    0    0 S    0  0.0   0:00.00 crypto/0                                                                                                    
   36 root      20   0     0    0    0 S    0  0.0   0:00.00 crypto/1                                                                                                    
  203 root      20   0     0    0    0 S    0  0.0   0:00.00 ksuspend_usbd                                                                                               
  204 root      20   0     0    0    0 S    0  0.0   0:00.00 khubd                                                                                                       
  205 root      20   0     0    0    0 S    0  0.0   0:00.00 ata/0                                                                                                       
  206 root      20   0     0    0    0 S    0  0.0   0:00.00 ata/1                                                                                                       
  207 root      20   0     0    0    0 S    0  0.0   0:00.14 ata_aux                                                                                                     
  208 root      20   0     0    0    0 S    0  0.0   0:00.01 scsi_eh_0  

dmesg suggests there's a disk problem:

.............. (previous episode)
[13276.966004] raid1:md0: read error corrected (8 sectors at 489900360 on sdc7)
[13276.966043] raid1: sdb7: redirecting sector 489898312 to another mirror

[13279.569186] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[13279.569211] ata4.00: irq_stat 0x40000008
[13279.569230] ata4.00: failed command: READ FPDMA QUEUED
[13279.569257] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in
[13279.569262]          res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F>
[13279.569306] ata4.00: status: { DRDY ERR }
[13279.569321] ata4.00: error: { UNC }
[13279.575362] ata4.00: configured for UDMA/133
[13279.575388] ata4: EH complete
[13283.169224] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[13283.169246] ata4.00: irq_stat 0x40000008
[13283.169263] ata4.00: failed command: READ FPDMA QUEUED
[13283.169289] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in
[13283.169294]          res 41/40:00:07:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F>
[13283.169331] ata4.00: status: { DRDY ERR }
[13283.169345] ata4.00: error: { UNC }
[13283.176071] ata4.00: configured for UDMA/133
[13283.176104] ata4: EH complete
[13286.224814] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[13286.224837] ata4.00: irq_stat 0x40000008
[13286.224853] ata4.00: failed command: READ FPDMA QUEUED
[13286.224879] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in
[13286.224884]          res 41/40:00:06:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F>
[13286.224922] ata4.00: status: { DRDY ERR }
[13286.224935] ata4.00: error: { UNC }
[13286.231277] ata4.00: configured for UDMA/133
[13286.231303] ata4: EH complete
[13288.802623] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[13288.802646] ata4.00: irq_stat 0x40000008
[13288.802662] ata4.00: failed command: READ FPDMA QUEUED
[13288.802688] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in
[13288.802693]          res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F>
[13288.802731] ata4.00: status: { DRDY ERR }
[13288.802745] ata4.00: error: { UNC }
[13288.808901] ata4.00: configured for UDMA/133
[13288.808927] ata4: EH complete
[13291.380430] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[13291.380453] ata4.00: irq_stat 0x40000008
[13291.380470] ata4.00: failed command: READ FPDMA QUEUED
[13291.380496] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in
[13291.380501]          res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F>
[13291.380577] ata4.00: status: { DRDY ERR }
[13291.380594] ata4.00: error: { UNC }
[13291.386517] ata4.00: configured for UDMA/133
[13291.386543] ata4: EH complete
[13294.347147] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0

[13294.347169] ata4.00: irq_stat 0x40000008
[13294.347186] ata4.00: failed command: READ FPDMA QUEUED
[13294.347211] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in
[13294.347217]          res 41/40:00:06:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F>
[13294.347254] ata4.00: status: { DRDY ERR }
[13294.347268] ata4.00: error: { UNC }
[13294.353556] ata4.00: configured for UDMA/133
[13294.353583] sd 3:0:0:0: [sdc] Unhandled sense code
[13294.353590] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[13294.353599] sd 3:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor]
[13294.353610] Descriptor sense data with sense descriptors (in hex):
[13294.353616]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[13294.353635]         23 05 6a 06 
[13294.353644] sd 3:0:0:0: [sdc] Add. Sense: Unrecovered read error - auto reallocate failed
[13294.353657] sd 3:0:0:0: [sdc] CDB: Read(10): 28 00 23 05 6a 00 00 00 08 00
[13294.353675] end_request: I/O error, dev sdc, sector 587557382
[13294.353726] ata4: EH complete
[13294.366953] raid1:md0: read error corrected (8 sectors at 489900544 on sdc7)
[13294.366992] raid1: sdc7: redirecting sector 489898496 to another mirror

and they're happening quite frequently, which I guess is liable to account for the performance problem(?)

# dmesg | grep mirror

[12433.561822] raid1: sdc7: redirecting sector 489900464 to another mirror
[12449.428933] raid1: sdb7: redirecting sector 489900504 to another mirror
[12464.807016] raid1: sdb7: redirecting sector 489900512 to another mirror
[12480.196222] raid1: sdb7: redirecting sector 489900520 to another mirror
[12495.585413] raid1: sdb7: redirecting sector 489900528 to another mirror
[12510.974424] raid1: sdb7: redirecting sector 489900536 to another mirror
[12526.374933] raid1: sdb7: redirecting sector 489900544 to another mirror
[12542.619938] raid1: sdc7: redirecting sector 489900608 to another mirror
[12559.431328] raid1: sdc7: redirecting sector 489900616 to another mirror
[12576.553866] raid1: sdc7: redirecting sector 489900624 to another mirror
[12592.065265] raid1: sdc7: redirecting sector 489900632 to another mirror
[12607.621121] raid1: sdc7: redirecting sector 489900640 to another mirror
[12623.165856] raid1: sdc7: redirecting sector 489900648 to another mirror
[12638.699474] raid1: sdc7: redirecting sector 489900656 to another mirror
[12655.610881] raid1: sdc7: redirecting sector 489900664 to another mirror
[12672.255617] raid1: sdc7: redirecting sector 489900672 to another mirror
[12672.288746] raid1: sdc7: redirecting sector 489900680 to another mirror
[12672.332376] raid1: sdc7: redirecting sector 489900688 to another mirror
[12672.362935] raid1: sdc7: redirecting sector 489900696 to another mirror
[12674.201177] raid1: sdc7: redirecting sector 489900704 to another mirror
[12698.045050] raid1: sdc7: redirecting sector 489900712 to another mirror
[12698.089309] raid1: sdc7: redirecting sector 489900720 to another mirror
[12698.111999] raid1: sdc7: redirecting sector 489900728 to another mirror
[12698.134006] raid1: sdc7: redirecting sector 489900736 to another mirror
[12719.034376] raid1: sdc7: redirecting sector 489900744 to another mirror
[12734.545775] raid1: sdc7: redirecting sector 489900752 to another mirror
[12734.590014] raid1: sdc7: redirecting sector 489900760 to another mirror
[12734.624050] raid1: sdc7: redirecting sector 489900768 to another mirror
[12734.647308] raid1: sdc7: redirecting sector 489900776 to another mirror
[12734.664657] raid1: sdc7: redirecting sector 489900784 to another mirror
[12734.710642] raid1: sdc7: redirecting sector 489900792 to another mirror
[12734.721919] raid1: sdc7: redirecting sector 489900800 to another mirror
[12734.744732] raid1: sdc7: redirecting sector 489900808 to another mirror
[12734.779330] raid1: sdc7: redirecting sector 489900816 to another mirror
[12782.604564] raid1: sdb7: redirecting sector 1242934216 to another mirror
[12798.264153] raid1: sdc7: redirecting sector 1242935080 to another mirror
[13245.832193] raid1: sdb7: redirecting sector 489898296 to another mirror
[13261.376929] raid1: sdb7: redirecting sector 489898304 to another mirror
[13276.966043] raid1: sdb7: redirecting sector 489898312 to another mirror
[13294.366992] raid1: sdc7: redirecting sector 489898496 to another mirror

although the arrays are still running on all disks - they haven't given up on any yet:

# cat /proc/mdstat

Personalities : [raid1] [raid0] 
md10 : active raid0 md0[0] md1[1]
      3368770048 blocks super 1.2 512k chunks

md1 : active raid1 sde2[2] sdd2[1]
      1464087824 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb7[0] sdc7[2]
      1904684920 blocks super 1.2 [2/2] [UU]

unused devices: <none>

So I think I have some idea what the problem is but I am not a linux sysadmin expert by the remotest stretch of the imagination and would really appreciate some clue checking here with my diagnosis and what do I need to do:

  1. obviously I need to source another drive for sdc. (I'm guessing I could buy a larger drive if the price is right: I'm thinking that one day I'll need to grow the size of the array and that would be one less drive to replace with a larger one)
  2. then use mdadm to fail out the existing sdc, remove it and fit the new drive
  3. fdisk the new drive with the same size partition for the array as the old one had
  4. use mdadm to add the new drive into the array

that sound OK?


Solution 1:

Usually when there are disk error like you are seeing, the disk tends to pause for a moment to try and correct the error itself, and the Linux RAID will tolerate some disk waiting until it marks is as bad. This disk pausing could be what is causing the slow downs (especially at the error rate you are seeing).

Your plan to replace the drive is correct. However, I wouldn't recommend getting a bigger driver with the notion of partitioning part for the replacement RAID and then part for use of something else later on. It would be more wise to just get closer to the original disk (size and speed) to keep the array consistent. That said, you COULD, in theory, go larger and partition the exact replacement size for the array replacement and then another partition for something else (even another member of an array).

A side note that may help with debugging: A tool I like to use as a replacement to top is called atop (http://www.atoptool.nl/) and it may give you more visualization into each disk and processes using disk I/O and where the bottleneck is (you may notice that the wait I/O is for the specific disk having the problem).