What steps should I take to recover a failed software RAID5 setup?
My RAID has failed, and I'm not sure which steps will give me the best chance of recovering it.
I've got 4 drives in a RAID5 configuration. It seems as if one has failed (sde1), but md can't bring the array up because it says sdd1 is not fresh. Is there anything I can do to recover the array?
I've pasted below some excerpts from /var/log/messages and mdadm --examine:
/var/log/messages
$ egrep -w 'sd[bcde]|raid|md' /var/log/messages
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] sd 5:0:0:0: [sde] CDB:
nas kernel: [...] end_request: I/O error, dev sde, sector 937821218
nas kernel: [...] sd 5:0:0:0: [sde] killing request
nas kernel: [...] md/raid:md0: read error not correctable (sector 937821184 on sde1).
nas kernel: [...] md/raid:md0: Disk failure on sde1, disabling device.
nas kernel: [...] md/raid:md0: Operation continuing on 2 devices.
nas kernel: [...] md/raid:md0: read error not correctable (sector 937821256 on sde1).
nas kernel: [...] sd 5:0:0:0: [sde] Unhandled error code
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] sd 5:0:0:0: [sde] CDB:
nas kernel: [...] end_request: I/O error, dev sde, sector 937820194
nas kernel: [...] sd 5:0:0:0: [sde] Synchronizing SCSI cache
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] sd 5:0:0:0: [sde] Stopping disk
nas kernel: [...] sd 5:0:0:0: [sde] START_STOP FAILED
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] md: unbind<sde1>
nas kernel: [...] md: export_rdev(sde1)
nas kernel: [...] md: bind<sdd1>
nas kernel: [...] md: bind<sdc1>
nas kernel: [...] md: bind<sdb1>
nas kernel: [...] md: bind<sde1>
nas kernel: [...] md: kicking non-fresh sde1 from array!
nas kernel: [...] md: unbind<sde1>
nas kernel: [...] md: export_rdev(sde1)
nas kernel: [...] md: kicking non-fresh sdd1 from array!
nas kernel: [...] md: unbind<sdd1>
nas kernel: [...] md: export_rdev(sdd1)
nas kernel: [...] md: raid6 personality registered for level 6
nas kernel: [...] md: raid5 personality registered for level 5
nas kernel: [...] md: raid4 personality registered for level 4
nas kernel: [...] md/raid:md0: device sdb1 operational as raid disk 2
nas kernel: [...] md/raid:md0: device sdc1 operational as raid disk 0
nas kernel: [...] md/raid:md0: allocated 4338kB
nas kernel: [...] md/raid:md0: not enough operational devices (2/4 failed)
nas kernel: [...] md/raid:md0: failed to run raid set.
nas kernel: [...] md: pers->run() failed ...
mdadm --examine
$ mdadm --examine /dev/sd[bcdefghijklmn]1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 4dc53f9d:f0c55279:a9cb9592:a59607c9
Name : NAS:0
Creation Time : Sun Sep 11 02:37:59 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : e8369dbc:bf591efa:f0ccc359:9d164ec8
Update Time : Tue May 27 18:54:37 2014
Checksum : a17a88c0 - correct
Events : 1026050
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : A.A. ('A' == active, '.' == missing)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 4dc53f9d:f0c55279:a9cb9592:a59607c9
Name : NAS:0
Creation Time : Sun Sep 11 02:37:59 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 78221e11:02acc1c8:c4eb01bf:f0852cbe
Update Time : Tue May 27 18:54:37 2014
Checksum : 1fbb54b8 - correct
Events : 1026050
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : A.A. ('A' == active, '.' == missing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 4dc53f9d:f0c55279:a9cb9592:a59607c9
Name : NAS:0
Creation Time : Sun Sep 11 02:37:59 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : fd282483:d2647838:f6b9897e:c216616c
Update Time : Mon Oct 7 19:21:22 2013
Checksum : 6df566b8 - correct
Events : 32621
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 4dc53f9d:f0c55279:a9cb9592:a59607c9
Name : NAS:0
Creation Time : Sun Sep 11 02:37:59 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : e84657dd:0882a7c8:5918b191:2fc3da02
Update Time : Tue May 27 18:46:12 2014
Checksum : 33ab6fe - correct
Events : 1026039
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAA. ('A' == active, '.' == missing)
Solution 1:
You've had a double drive failure: sdd1 dropped out of the array back in October 2013, and sde1 has now failed with uncorrectable read errors, leaving only two of the four members current. RAID5 only tolerates a single failure, so this is irrecoverable. Replace the failed hardware and restore from backup.
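You can see this at a glance by filtering the same mdadm --examine output you already ran for the Update Time and Events fields:
$ mdadm --examine /dev/sd[bcde]1 | egrep '/dev/|Update Time|Events'
sdd1 sits at Events : 32621 with an update time of Mon Oct 7 2013, while sdb1 and sdc1 are at 1026050. sde1 is only 11 events behind them, but the kernel has already stopped that disk after the read errors, so you are down to two working members out of four.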
Going forward, consider RAID6 with large drives like this and make sure you have monitoring in place to catch device failures so you can respond to them ASAP.
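A rough sketch of both suggestions follows; the device names and the mail address are placeholders, and the mdadm.conf path varies by distribution:
# RAID6 keeps two disks' worth of parity, so it survives losing any two members
# (which is exactly what happened here):
$ mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
# Monitoring: have mdadm mail you on Fail/DegradedArray events, either by adding
# "MAILADDR you@example.com" to /etc/mdadm/mdadm.conf or by running the monitor directly:
$ mdadm --monitor --scan --daemonise --mail you@example.com
With four disks RAID6 only gives you the capacity of two, but a single mailed warning back in October 2013 would have let you swap sdd1 long before the second failure.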