How to get an inactive RAID device working again?

Solution 1:

For your bonus question:

mdadm --examine --scan >> /etc/mdadm/mdadm.conf
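Note that if you run this through sudo as a regular user, only mdadm gets root privileges, not the shell's >> redirection, so the append will fail with "permission denied". A sketch of one way around that, assuming a sudo-capable account, is to let tee do the writing:

sudo mdadm --examine --scan | sudo tee -a /etc/mdadm/mdadm.conf

Afterwards, check the appended ARRAY lines and remove any duplicates.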

Solution 2:

I have found that I have to add the array manually to /etc/mdadm/mdadm.conf in order to make Linux mount it on reboot. Otherwise I get exactly what you have here: inactive md_d1 devices and so on.

The conf file should look like the example below, i.e. one ARRAY line for each md device. In my case the new arrays were missing from this file, so if yours are already listed, this is probably not the fix for your problem.

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=f10f5f96:106599e0:a2f56e56:f5d3ad6d
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=aa591bbe:bbbec94d:a2f56e56:f5d3ad6d

Add one ARRAY line per md device, placing them after the comment shown above or, if no such comment exists, at the end of the file. You get the UUIDs by running sudo mdadm -E --scan:

$ sudo mdadm -E --scan
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=f10f5f96:106599e0:a2f56e56:f5d3ad6d
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=aa591bbe:bbbec94d:a2f56e56:f5d3ad6d

As you can see, you can pretty much copy the scan output straight into the file.

I run Ubuntu Desktop 10.04 LTS, and as far as I remember this behavior differs from the server version of Ubuntu; however, it was so long ago that I created my md devices on the server that I may be wrong. It may also be that I just missed some option.

Anyway, adding the arrays to the conf file seems to do the trick. I've run the above RAID 1 and RAID 5 setups for years with no problems.
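On Debian/Ubuntu the array is typically assembled at boot from the copy of mdadm.conf embedded in the initramfs, so after editing the file it may also be worth regenerating that; a sketch, assuming a Debian-style system:

sudo update-initramfs -u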

Solution 3:

Warning: First of all, let me say that the below (due to the use of --force) seems risky to me, and if you have irreplaceable data I'd recommend making copies of the partitions involved before you start trying any of it. However, this worked for me.
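If you want a safety net before forcing anything, one option (a sketch; /mnt/backup is just an example destination with enough free space) is to image each member partition with dd:

# copy one member partition to an image file before experimenting
dd if=/dev/sda4 of=/mnt/backup/sda4.img bs=1M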

I had the same problem, with an array showing up as inactive, and nothing I did, including the mdadm --examine --scan > /etc/mdadm.conf suggested by others here, helped at all.

In my case, when the system tried to start the RAID-5 array after a drive replacement, dmesg showed that the array was dirty:

md/raid:md2: not clean -- starting background reconstruction
md/raid:md2: device sda4 operational as raid disk 0
md/raid:md2: device sdd4 operational as raid disk 3
md/raid:md2: device sdc4 operational as raid disk 2
md/raid:md2: device sde4 operational as raid disk 4
md/raid:md2: allocated 5334kB
md/raid:md2: cannot start dirty degraded array.

This caused it to show up as inactive in /proc/mdstat:

md2 : inactive sda4[0] sdd4[3] sdc4[2] sde4[5]
      3888504544 blocks super 1.2

I did find that all the devices had the same event count, except for the drive I had replaced (/dev/sdb4):

[root@nfs1 sr]# mdadm -E /dev/sd*4 | grep Event
mdadm: No md superblock detected on /dev/sdb4.
         Events : 8448
         Events : 8448
         Events : 8448
         Events : 8448

However, the array details showed that it had 4 out of 5 devices available:

[root@nfs1 sr]# mdadm --detail /dev/md2
/dev/md2:
[...]
   Raid Devices : 5
  Total Devices : 4
[...]
 Active Devices : 4
Working Devices : 4
[...]
    Number   Major   Minor   RaidDevice State
       0       8        4        0      inactive dirty  /dev/sda4
       2       8       36        2      inactive dirty  /dev/sdc4
       3       8       52        3      inactive dirty  /dev/sdd4
       5       8       68        4      inactive dirty  /dev/sde4

(The "State" column above is from memory; I can't find it in my scroll-back buffer.)

I was able to resolve this by stopping the array and then re-assembling it:

mdadm --stop /dev/md2
mdadm -A --force /dev/md2 /dev/sd[acde]4

At that point the array was up and running with 4 of the 5 devices. I was able to add the replacement device, and it is now rebuilding. I can access the filesystem without any problem.
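For reference, adding the replacement partition back and watching the rebuild can look roughly like this (a sketch; /dev/sdb4 was the replaced partition in my case):

mdadm --manage /dev/md2 --add /dev/sdb4
cat /proc/mdstat

The second command shows the recovery progress as the new member is rebuilt.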

Solution 4:

I was having issues with Ubuntu 10.04 where an error in /etc/fstab prevented the server from booting.

I ran this command as mentioned in the above solutions:

mdadm --examine --scan >> /etc/mdadm/mdadm.conf

This appends the results of mdadm --examine --scan to /etc/mdadm/mdadm.conf.

In my case, this was:

ARRAY /dev/md/0 metadata=1.2 UUID=2660925e:6d2c43a7:4b95519e:b6d110e7 name=localhost:0

This is a fake RAID 0. My /etc/fstab entry for mounting it automatically is:

/dev/md0 /home/shared/BigDrive ext3 defaults,nobootwait,nofail 0 0

The important options here are nobootwait and nofail. nobootwait lets the boot continue instead of stopping to wait (and prompt) when the mount fails, and nofail keeps a missing device from being treated as an error. In my case, this was a remote server, so it was essential.
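You can check that such an entry mounts cleanly without rebooting; a sketch, assuming the line above is already in /etc/fstab:

sudo mount -a
mount | grep /home/shared/BigDrive

The first command mounts everything listed in /etc/fstab that isn't mounted yet, and the second confirms the array's filesystem is there.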

Hope this will help some people.

Solution 5:

A simple way to get the array to run, assuming there is no hardware problem and you have enough drives/partitions to start it, is the following:

md20 : inactive sdf1[2](S)
      732442488 blocks super 1.2

sudo mdadm --manage /dev/md20 --run

It could be that, for whatever reason, the array is fine but something prevented it from starting or being assembled. In my case this was because mdadm didn't know the original array name was md127, and all of that array's drives had been unplugged. After plugging them back in I had to assemble it manually (probably a bug where mdadm thought the array was already active because of the stale old array name).
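If --run isn't enough, stopping the stale device and assembling by hand is another option. A sketch, assuming /dev/md127 is the leftover inactive name; only /dev/sdf1 is from my case, and the second member partition is a placeholder you'd replace with the real one:

sudo mdadm --stop /dev/md127
sudo mdadm --assemble /dev/md20 /dev/sdf1 /dev/sdg1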