MDADM Superblock Recovery

Solution 1:

Yikes! What a pickle. Let's see if we can get you sorted. Starting with a recap of your disks and partition tables:

sda - no partition table
sdb - sdb1 [Linux] sdb2 [Linux extended] sdb5 [swap]
sdc - no partition table
sdd - no partition table
sde - no partition table
  1. None of these are marked fd (Linux raid autodetect), which is the standard partition type for MD RAID members
  2. You're not using partitions to organize your disk space [0]
  3. You appear to have formatted the entire disk as ext2/3 and are using the whole disk as a member of the RAID set

The last point is where I think you came undone. The initscripts probably thought one of those volumes was due for an fsck, sanity-checked it, and wiped out the MD superblock in the process. dumpe2fs should find no valid filesystem superblock on a device that is only a RAID member.

Take my RAID for example:

root@mark21:/tmp/etc/udev# fdisk -l /dev/sda

Disk /dev/sda: 640.1 GB, 640135028736 bytes
255 heads, 63 sectors/track, 77825 cylinders, total 1250263728 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000ffc4

Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  1240233983   620115968   fd  Linux raid autodetect

root@mark21:/tmp/etc/udev# dumpe2fs /dev/sda1
dumpe2fs 1.41.14 (22-Dec-2010)
dumpe2fs: Bad magic number in super-block while trying to open /dev/sda1
Couldn't find valid filesystem superblock.
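
To check whether the MD superblock itself survived, mdadm --examine on a member device is the more direct test (a quick check; substitute your own member device):

mdadm --examine /dev/sda1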

That you were able to recreate the RAID set at all is extremely lucky, but that doesn't change the fundamental flaws in your deployment. This will happen again.

What I would recommend is the following (a rough command sketch follows the list):

  1. Backup everything on that raid set
  2. Destroy the array and erase the md superblock from each device (man mdadm)
  3. Zero out the start of those disks: dd if=/dev/zero of=/dev/sdX bs=1M count=100
  4. Create partitions on sda, sdc, sdd, & sde that span 99% of the disk [0]
  5. Tag those partitions as type fd (Linux raid autodetect; see the linux-raid wiki)
  6. Never, ever format these partitions with any sort of filesystem
  7. Create a new RAID 5: mdadm --create /dev/md0 -v -f -l 5 -n 4 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
  8. Update /etc/mdadm.conf with the new array's UUID
  9. Live happily ever after
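
Roughly, steps 2 through 8 might look like this. It's a sketch, not a drop-in script: it assumes the old array is /dev/md0, that parted is available, and that the device letters match the list above, so double-check every device name before running anything destructive:

mdadm --stop /dev/md0                                        # step 2: take the old array down
mdadm --zero-superblock /dev/sda /dev/sdc /dev/sdd /dev/sde  # step 2: erase the md superblocks

for d in sda sdc sdd sde; do                                 # step 3: wipe the start of each disk
    dd if=/dev/zero of=/dev/$d bs=1M count=100
done

# steps 4-5: one partition spanning 99% of the disk, flagged for RAID (type fd on an msdos label)
parted --script /dev/sda mklabel msdos mkpart primary 1MiB 99% set 1 raid on
# repeat the parted line for sdc, sdd and sde

# step 7: build the new array from the partitions
mdadm --create /dev/md0 -v -f -l 5 -n 4 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1

# step 8: record the new array definition (UUID included); on Debian/Ubuntu the file is /etc/mdadm/mdadm.conf
mdadm --detail --scan >> /etc/mdadm.conf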

I presume from your description that sdb is your system disk, and that's fine. Just make sure you don't accidentally include that in your raid set creation. After this, you should be on the right track and will never encounter this problem again.

[0] I once encountered a very nasty fault on a set of SATA disks that had lots of bad blocks. After I used the vendor tool to reconstitute the disk, my once-identical set of disks was no longer identical: the bad drive now had a few blocks fewer than it did before the low-level format began, which of course ruined my partition table and prevented the drive from rejoining the MD RAID set.

Hard drives usually keep a "free list" of spare blocks reserved for just such an occasion. My theory is that the list had been exhausted, and since this wasn't an enterprise disk, instead of failing safe and giving me the chance to send it off for data recovery, it decided to truncate my data and resize the entire disk.

Therefore, I never use the entire disk anymore when creating a RAID set. Instead, I use anywhere from 95-99% of the available free space when creating a partition that would normally span the entire disk. This also gives you some additional flexibility when replacing failed members. For example, not all 250 GB disks have the same number of blocks, so if you undershoot the maximum by a comfortable margin, you can use almost any disk brand to replace a failed member.
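
If you want to see how much two nominally identical disks can differ, compare their raw sector counts (blockdev ships with util-linux; the device names here are just examples):

blockdev --getsz /dev/sdd    # size in 512-byte sectors
blockdev --getsz /dev/sde    # a "same size" replacement often reports a slightly different count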

Solution 2:

I've had the same issue before, but I didn't document it (and it was a while ago).

I recall something about using e2fsck -b <superblockSector> /dev/sdX and trying the backup superblock locations.
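
A minimal sketch of that approach (the device name is a placeholder; mke2fs -n is a dry run that only prints what it would do, including where the backup superblocks for that geometry would live):

mke2fs -n /dev/sdX1          # dry run: lists backup superblock locations, writes nothing
e2fsck -b 32768 /dev/sdX1    # try one of the reported backups; 32768 is a common location for 4K-block filesystems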

You could also take a look at TestDisk.