Cannot recover from failed RAID

My situation is different from this one.

I have a CentOS system with 3 hard drives, and the following software RAID arrays:

/boot on RAID 1 over 2 disks
/ on RAID 5 over 3 disks
swap on RAID 0 over 2 disks (I believe)

My 3rd drive failed. At first this was no big deal: the array was still working. But a day later, when I went to swap the bad disk, the system would no longer boot with the new disk in:

md: md2: raid array is not clean -- starting background reconstruction
raid5: cannot start dirty degraded array for md2
raid5: failed to run raid set md2
[...]
Kernel panic

It stops there, and I have no shell. I've tried booting from the rescue disk, but I don't know where to go from there: my arrays are not seen, so I cannot rebuild them. I get the exact same issue if I boot with 2 disks, or with the bad disk as my 3rd drive.

How can I fix the array now that I have a new drive?


Somehow you've managed to stop the array in a dirty state (which means that the RAID system can't be sure that the parity on all the disks is OK). This can happen if the machine was abruptly powered off, or after some other event that opens the RAID-5 write hole.

I suspect that reassembling the array by hand, from a rescue CD, using the --force option might work (with --run so that it starts even though the third member is absent), like so:

mdadm --assemble --force --run /dev/md2 /dev/sda2 /dev/sdb2

(replacing /dev/sd... with the existing devices that make up your RAID-5 array). Assuming that that works and /proc/mdstat shows the array assembled (in a degraded state), then you can add the new partition, like so:

mdadm /dev/md2 --add /dev/sdc2
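
To confirm the degraded state before adding the disk, and to follow the rebuild afterwards, something like the following might help. It's only a sketch: md_status is a name I made up, and it assumes the usual /proc/mdstat layout, where the line after the array name ends in a member map such as [UU_] (an underscore marks a missing member).

```shell
# md_status NAME: print the member map (e.g. [UU_]) for array NAME,
# reading /proc/mdstat-style text from stdin.
# Hypothetical helper, not part of mdadm.
md_status() {
    awk -v name="$1" '
        # Header line of the array we want, e.g. "md2 : active raid5 ..."
        $1 == name && $2 == ":" { found = 1; next }
        # First bracketed run of U/_ after the header is the member map
        found && match($0, /\[[U_]+\]/) {
            print substr($0, RSTART, RLENGTH)
            exit
        }
        # A blank line ends this array entry
        found && /^$/ { exit }
    '
}

# Usage (on the rescue system):
#   md_status md2 < /proc/mdstat    # [UU_] while degraded, [UUU] once rebuilt
#   watch cat /proc/mdstat          # follow the recovery progress
```

A map with no underscores left means the new partition has been fully synced into the array.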

If the initial forced assemble doesn't do the trick, then you're deeply up the creek. A couple of minutes with Google turned up http://www.linuxforums.org/forum/servers/77867-eeek-cant-assemble-degraded-dirty-raid6-array.html which seems to deal with a similar problem, so it might be worth trying what is described as working in there (echo "clean" > /sys/block/md0/md/array_state, with md0 replaced by your own array, md2 here), but that's a slightly uglier way of doing things.
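
If you do go the sysfs route, it may be worth recording what the kernel reports before and after the write. A rough sketch, assuming the array_state file sits at the path quoted above; mark_md_clean and the SYSFS_ROOT override are hypothetical names of mine (the override only exists so the function can be tried against a scratch directory), and the state the kernel reports afterwards may differ from what was written.

```shell
# mark_md_clean NAME: show the current array_state of NAME (e.g. md2),
# write "clean" to it, then show the state again.
# Hypothetical helper; SYSFS_ROOT defaults to /sys.
mark_md_clean() {
    state_file="${SYSFS_ROOT:-/sys}/block/$1/md/array_state"
    if [ ! -w "$state_file" ]; then
        echo "cannot write $state_file" >&2
        return 1
    fi
    echo "before: $(cat "$state_file")"
    echo clean > "$state_file"
    echo "after: $(cat "$state_file")"
}

# Usage (as root, from the rescue system):
#   mark_md_clean md2
```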

Regardless of how you manage to get the RAID set back together, the fact that it is dirty and degraded means that the contents really can't be trusted any more. The filesystem could have metadata corruption (which an fsck should fix), or the contents of one or more files could be corrupted (which you won't know without verifying the contents of all files on the partition).
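
One way to verify file contents, assuming you have a checksum manifest from a backup or another known-good copy (that manifest is my assumption, nothing in the question says one exists), is to let sha256sum recheck every file and list only the mismatches. list_corrupt_files is a hypothetical helper name, and this is a sketch rather than a complete integrity check.

```shell
# list_corrupt_files MANIFEST DIR: print the files under DIR whose SHA-256
# no longer matches MANIFEST. MANIFEST is assumed to be `sha256sum` output
# with paths relative to DIR. Returns 0 if everything matches, 1 otherwise.
list_corrupt_files() {
    # Make the manifest path absolute, since we change directory below
    manifest=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    # LC_ALL=C keeps the "OK"/"FAILED" markers untranslated
    bad=$(cd "$2" && LC_ALL=C sha256sum -c "$manifest" 2>/dev/null \
          | grep -v ': OK$' || true)
    [ -z "$bad" ] && return 0
    printf '%s\n' "$bad"
    return 1
}

# Usage (array mounted at /mnt/sysimage, manifest restored from backup):
#   list_corrupt_files /root/manifest.sha256 /mnt/sysimage
```

Files that are missing entirely also show up in the list, since sha256sum reports them as failed.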


The System Rescue CD has the mdadm tools, so if you know how to use them it should be useful to you.