Removing a device in "removed" state from a Linux software RAID array
My workstation has two disks (/dev/sd[ab]), both with similar partitioning. /dev/sdb failed, and cat /proc/mdstat stopped showing the second sdb partition.
I ran mdadm --fail and mdadm --remove for every partition of the failed disk on the arrays that use them, but all of these commands failed with:
mdadm: set device faulty failed for /dev/sdb2: No such device
mdadm: hot remove failed for /dev/sdb2: No such device or address
Then I hot-swapped the failed disk, partitioned the new disk, and added the partitions to their respective arrays. All arrays rebuilt properly except one: in /dev/md2, the failed disk doesn't seem to have been removed from the array properly. Because of this, the new partition keeps getting added to the array as a spare, and the array's status remains degraded.
Here's what mdadm --detail /dev/md2 shows:
[root@ldmohanr ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 1.1
Creation Time : Tue Dec 27 22:55:14 2011
Raid Level : raid1
Array Size : 52427708 (50.00 GiB 53.69 GB)
Used Dev Size : 52427708 (50.00 GiB 53.69 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Nov 23 14:59:56 2012
State : active, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Name : ldmohanr.net:2 (local to host ldmohanr.net)
UUID : 4483f95d:e485207a:b43c9af2:c37c6df1
Events : 5912611
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 0 0 1 removed
2 8 18 - spare /dev/sdb2
To remove a disk, mdadm needs a device filename, which was /dev/sdb2 originally, but that no longer refers to device number 1. I need help with removing device number 1 with 'removed' status and making /dev/sdb2 active.
If the drive is no longer showing up in the system, do this:
mdadm /dev/md2 -r detached
or
mdadm /dev/md2 -r failed
If the removal succeeds, you should get a message like:
mdadm: hot removed 8:50 from /dev/md0
The drive should then no longer show up in /proc/mdstat. From the man page (where "the first" refers to the keyword failed and "the second" to detached):
"The first causes all failed device to be removed. The second causes any device which is no longer connected to the system (i.e an 'open' returns ENXIO) to be removed. This will only succeed for devices that are spares or have already been marked as failed."
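To check whether the stale slot is still there after running one of the commands above, the device table printed by mdadm --detail can be filtered. The following is only a rough sketch: removed_slots is a hypothetical helper name, and it assumes the table layout shown in the question (a header line starting with "Number Major", then one row per member with the state in the fifth column).

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): reads `mdadm --detail` output on
# stdin and prints the RaidDevice number of every slot whose state is "removed".
# Assumes the column layout shown in the question:
#   Number  Major  Minor  RaidDevice  State  [device]
removed_slots() {
    awk '
        # The member table begins right after its header line.
        $1 == "Number" && $2 == "Major" { in_table = 1; next }
        # In a table row, field 4 is the RaidDevice slot and field 5 the state.
        in_table && $5 == "removed" { print $4 }
    '
}

# Example use (as root, on the real array):
#   mdadm --detail /dev/md2 | removed_slots
```

For the output posted in the question this prints the slot number 1. Once the array has finished rebuilding and no slot is in the removed state, it should print nothing.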