Odd mdadm output: --examine shows array state failed, --detail shows everything clean
The setup: 8 disks in an mdadm-managed RAID5 array (/dev/md0, built from /dev/sdc through /dev/sdj). One disk (/dev/sdh) is showing SMART errors (an increasing pending sector count), so I'm looking to replace it. Additionally, the machine boots from a Revodrive SSD in a PCIe slot that's configured as a RAID0 stripe.
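For reference, I've been watching the suspect disk with smartmontools, along these lines (the exact attribute names can vary by drive vendor):
# check the suspect disk's SMART attributes for pending/reallocated sectors
smartctl -A /dev/sdh | grep -i -E 'Pending_Sector|Reallocated'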
The oddness: mdadm --detail shows the array as clean, and everything appears to be running fine (I can mount, read, and write the array without problems), yet mdadm --examine on every disk shows an array state of failed.
root@saturn:/backup# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdi1[6] sdj1[8] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
20511854272 blocks super 1.0 level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
unused devices: <none>
The /proc/mdstat output only shows the mdadm-managed array of SATA drives, not the Revodrive, which I'd expect, as the Revodrive RAID should be managed by its own hardware controller.
root@saturn:/backup# mdadm --detail /dev/md0
mdadm: metadata format 01.00 unknown, ignored.
/dev/md0:
Version : 01.00
Creation Time : Wed Apr 20 10:14:05 2011
Raid Level : raid5
Array Size : 20511854272 (19561.63 GiB 21004.14 GB)
Used Dev Size : 5860529792 (5589.04 GiB 6001.18 GB)
Raid Devices : 8
Total Devices : 8
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Sep 19 13:42:21 2011
State : clean
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : saturn:0 (local to host saturn)
UUID : e535a44b:b319927e:4a574c20:39fc3f08
Events : 45
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
3 8 81 3 active sync /dev/sdf1
4 8 97 4 active sync /dev/sdg1
5 8 113 5 active sync /dev/sdh1
6 8 129 6 active sync /dev/sdi1
8 8 145 7 active sync /dev/sdj1
Obviously, there's a metadata format error on the first line, caused by an auto-generated metadata flag in mdadm.conf, but this is mdadm v2.6.7.1 running on Ubuntu, and I've chalked it up to a known issue.
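For completeness, the auto-generated ARRAY line in /etc/mdadm/mdadm.conf looks roughly like this (the UUID is the real one; the rest is from memory), and the metadata=01.00 token is what trips the warning:
# /etc/mdadm/mdadm.conf (auto-generated)
ARRAY /dev/md0 level=raid5 num-devices=8 metadata=01.00 UUID=e535a44b:b319927e:4a574c20:39fc3f08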
root@saturn:/backup# mdadm --examine /dev/sdc1
mdadm: metadata format 01.00 unknown, ignored.
/dev/sdc1:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x0
Array UUID : e535a44b:b319927e:4a574c20:39fc3f08
Name : saturn:0 (local to host saturn)
Creation Time : Wed Apr 20 10:14:05 2011
Raid Level : raid5
Raid Devices : 8
Avail Dev Size : 5860529904 (2794.52 GiB 3000.59 GB)
Array Size : 41023708544 (19561.63 GiB 21004.14 GB)
Used Dev Size : 5860529792 (2794.52 GiB 3000.59 GB)
Super Offset : 5860530160 sectors
State : clean
Device UUID : 1b508410:b129e871:d92c7979:30764611
Update Time : Mon Sep 19 13:52:58 2011
Checksum : 2e68592 - correct
Events : 45
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 0 (0, 1, 2, 3, 4, 5, 6, failed, 7)
Array State : Uuuuuuuu 1 failed
But in the --examine output, the Array State is failed. Each disk seems to show itself as the failed member - /dev/sdd shows uUuuuuuu, /dev/sde shows uuUuuuuu, and so on - but all of them show the mysterious ninth "failed" slot between slots 6 and 7 in the Array Slot line above.
I'm guessing the disk superblocks are screwy, despite everything being functional. I'd like to get this fixed before proceeding with the replacement of the suspect disk, as I'm a little concerned about how the disks might behave if I failed a drive. What's the best way for me to proceed?
You need to update mdadm to at least version 3.1.1. This bug describes the problem you're having, and shows that once mdadm is updated the newer superblock format is interpreted correctly.
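Once you're on a newer version, a quick sanity check followed by the usual replacement procedure would look something like this (assuming the new disk comes up as /dev/sdk and is partitioned to match the existing members; adjust device names for your system):
# confirm the upgraded version and re-check a member's superblock
mdadm --version
mdadm --examine /dev/sdc1

# replace the suspect disk: mark it failed, remove it, then add the new one
mdadm /dev/md0 --fail /dev/sdh1
mdadm /dev/md0 --remove /dev/sdh1
mdadm /dev/md0 --add /dev/sdk1

# watch the rebuild progress
cat /proc/mdstat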