mdadm - Remove disk from RAID0

First of all, to those who still believe that "RAID0 has no hot spare": it can have a manual spare, operated by a human who understands RAID levels and mdadm. mdadm is software RAID, so it can do a lot of interesting things.

Credits to Zoredache for the idea!

So, the situation:

  • you have a RAID0 array of two disks
  • you would like to replace one of them without array downtime

If downtime is acceptable, you can always just make a block copy of the disk with dd and reassemble the array; mdadm will handle that fine.

Solution: use RAID4 as an intermediate step

RAID0 -> RAID4 -> RAID0

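So, if you don't remember RAID4, it is simple. It has parity blocks, but unlike RAID5 they are not distributed across the array: all parity resides on ONE dedicated disk. This is the important point. The data disks keep exactly the layout they had in RAID0, so the parity disk can be added and later dropped without touching the data. With RAID5 the parity is interleaved with the data on every disk, which is the reason RAID5 will not work here.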
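To make the parity idea concrete, here is a toy sketch with a single byte per data disk. The byte values are made up for illustration; the only fact it relies on is that RAID4 parity is the XOR of the data blocks in a stripe.

```shell
#!/bin/sh
# Toy model: one byte from each data disk (illustrative values only).
d0=$((0xA5))   # byte stored on the first data disk
d1=$((0x3C))   # byte stored on the second data disk

# The dedicated parity disk stores the XOR of the data bytes.
parity=$(( d0 ^ d1 ))

# If the first disk is lost, XOR of the survivors rebuilds its byte.
rebuilt_d0=$(( parity ^ d1 ))

printf 'parity=0x%02X rebuilt_d0=0x%02X\n' "$parity" "$rebuilt_d0"
```

The same identity is what lets the array rebuild a replaced data disk from the remaining data disk plus the parity disk.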

What you'll need: two more disks of the same size as the disk you would like to replace.

Environment:

  • Ubuntu 14.04 Trusty Tahr
  • mdadm - v3.2.5 - 18th May 2012
  • /dev/sdb - member of the initial array, will be replaced
  • /dev/sdc - member of the initial array, stays in place
  • /dev/sdd - will be used temporarily (as the parity disk)
  • /dev/sde - will be used instead of sdb

The ultimate RAID0 hot-spare mdadm guide ;)

sudo mdadm -C /dev/md0 -l 0 -n 2 /dev/sd[bc]

md0 : active raid0 sdc[1] sdb[0]
      2096128 blocks super 1.2 512k chunks

We've created a RAID0 array, and it looks sweet.

sudo md5sum /dev/md0

b422ba644a3c83cdf28adfa94cb658f3  /dev/md0

This is our checkpoint: if even one bit differs in the resulting /dev/md0, we've failed.
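The checkpoint can be wrapped in a small helper so the comparison at the end is not done by eye. This is only a sketch: verify_md5 is a hypothetical name, not an mdadm feature, and a whole-device checksum is only meaningful if nothing writes to the array between the two measurements (as in this demonstration, where the array is idle).

```shell
# verify_md5 DEVICE EXPECTED
# Succeeds only if the md5 checksum of DEVICE equals EXPECTED.
# (Hypothetical helper name; reading /dev/md0 requires root.)
verify_md5() {
    actual=$(md5sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}
```

For example, run as root, verify_md5 /dev/md0 b422ba644a3c83cdf28adfa94cb658f3 would succeed for the array shown here and fail if any bit had changed.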

sudo mdadm /dev/md0 --grow --level=4

md0 : active raid4 sdc[1] sdb[0]
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]

So, we've converted our array to RAID4. The conversion is instant, since there is nothing to recompute or recalculate. We haven't added the parity disk yet, so let's do it.

sudo mdadm /dev/md0 -a /dev/sdd

md0 : active raid4 sdd[3] sdc[1] sdb[0]
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]
      [===>.................]  recovery = 19.7% (207784/1048064) finish=0.2min speed=51946K/sec

We've added sdd as the parity disk. Important to remember: the order of the disks in the first row does not correspond to the slot picture [UU_] in the second row!

sdd is displayed first, but it is in fact the last member, and it holds parity, not data.

sudo mdadm /dev/md0 -f /dev/sdb

md0 : active raid4 sdd[3] sdc[1] sdb[0](F)
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [_UU]

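Once the recovery above has finished and the parity is fully built, we mark sdb as faulty so that it can be removed in the next step. Do not fail sdb before the parity is complete, or there will be nothing to rebuild its data from.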

sudo mdadm --detail /dev/md0

State : clean, degraded

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       32        1      active sync   /dev/sdc
       3       8       48        2      active sync   /dev/sdd

       0       8       16        -      faulty spare   /dev/sdb

The details show the removal of the first disk, and here we can see the true order of the disks in the array. It's important to keep track of the disk holding parity: it must not be left in the array when going back to RAID0.
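Reading the slot order off the table can also be scripted. The sketch below assumes what this walkthrough shows, namely that the parity disk sits in the highest RaidDevice slot; parity_member is a hypothetical helper name, and it only parses text in the format printed above.

```shell
# parity_member: read `mdadm --detail` output on stdin and print the device
# in the highest numbered RaidDevice slot (the parity disk in this setup).
# Rows without a numeric slot ("-") or without a device path are ignored.
parity_member() {
    awk '
        $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $NF ~ /^\/dev\// {
            if ($4 + 0 >= max) { max = $4 + 0; dev = $NF }
        }
        END { if (dev != "") print dev }
    '
}
```

Used as sudo mdadm --detail /dev/md0 | parity_member, it would print /dev/sdd for the table above.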

sudo mdadm /dev/md0 -r /dev/sdb

md0 : active raid4 sdd[3] sdc[1]
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [_UU]

sdb is now completely removed and can be physically taken away.

sudo mdadm /dev/md0 -a /dev/sde

md0 : active raid4 sde[4] sdd[3] sdc[1]
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [_UU]
      [==>..................]  recovery = 14.8% (156648/1048064) finish=0.2min speed=52216K/sec

We have added the replacement for our sdb disk. And here we go: the data of sdb is now being recovered from parity. Sweeeeet.

md0 : active raid4 sde[4] sdd[3] sdc[1]
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/3] [UUU]

Done. Right now we are completely safe: all data from sdb has been recovered onto sde, and now we have to remove sdd (remember, it holds the parity).
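Before failing the parity disk, it is worth confirming that every slot really is up. A minimal sketch that reads the status flags from mdstat-style text follows; md_all_up is a hypothetical helper name, and it is only useful during the RAID4 stage, because RAID0 arrays print no [UU] flags.

```shell
# md_all_up ARRAY [FILE]
# Succeeds if ARRAY's stanza in FILE (default /proc/mdstat) shows a status
# like [UUU] with no "_" in it. Fails if a slot is missing or no flags exist.
md_all_up() {
    awk -v name="$1" '
        $1 == name && $2 == ":" { in_array = 1; next }   # header line of our array
        in_array && NF == 0     { in_array = 0 }          # blank line ends the stanza
        in_array && match($0, /\[[U_]+\]/) {
            flags = substr($0, RSTART, RLENGTH)           # e.g. "[UUU]" or "[_UU]"
        }
        END { exit ((flags != "" && flags !~ /_/) ? 0 : 1) }
    ' "${2:-/proc/mdstat}"
}
```

For example, md_all_up md0 && sudo mdadm /dev/md0 -f /dev/sdd would refuse to fail sdd while the array is still degraded. mdadm also has a --wait option that blocks until a running recovery or reshape has finished.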

sudo mdadm /dev/md0 -f /dev/sdd

md0 : active raid4 sde[4] sdd[3](F) sdc[1]
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]

Marked sdd as faulty.

sudo mdadm /dev/md0 -r /dev/sdd

md0 : active raid4 sde[4] sdc[1]
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]

Removed sdd from our array. We are ready to become RAID0 again.

sudo mdadm /dev/md0 --grow --level=0 --backup-file=backup

md0 : active raid4 sde[4] sdc[1]
      2096128 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/2] [UU_]
      [=>...................]  reshape =  7.0% (73728/1048064) finish=1.5min speed=10532K/sec

Aaaaaaand bang!

md0 : active raid0 sde[4] sdc[1]
      2096128 blocks super 1.2 512k chunks

Done. Let's look at the md5 checksum.

sudo md5sum /dev/md0

b422ba644a3c83cdf28adfa94cb658f3  /dev/md0
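For reference, the whole sequence can be collected into a dry-run plan that only prints the commands in order. This is a sketch, not a tested automation: print_swap_plan is a hypothetical name, and the two mdadm --wait lines are an addition to the steps shown above, placed where a recovery has to finish before the next disk may be failed.

```shell
# print_swap_plan MD OLD TMP NEW
# Prints (does not run) the commands of this guide for replacing OLD with NEW
# in the two-disk RAID0 array MD, using TMP as the temporary parity disk.
print_swap_plan() {
    md=$1 old=$2 tmp=$3 new=$4
    cat <<EOF
mdadm $md --grow --level=4
mdadm $md -a $tmp
mdadm --wait $md
mdadm $md -f $old
mdadm $md -r $old
mdadm $md -a $new
mdadm --wait $md
mdadm $md -f $tmp
mdadm $md -r $tmp
mdadm $md --grow --level=0 --backup-file=backup
EOF
}
```

Calling print_swap_plan /dev/md0 /dev/sdb /dev/sdd /dev/sde reproduces the commands used in this answer. Review the output and run each step by hand, checking /proc/mdstat in between.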

Any more questions? So RAID0 can have a hot spare. It's called "user" ;)


As far as I know, once you set up a RAID0 you cannot change one of the disks. You can take a backup, switch the disks, and restore the backup. I would just build a RAID5 from those 3 disks you have. That way, in the future you can drop a disk and still rebuild it.