Grub rescue, will not boot from mdadm RAID, no such disk or device -- mduuid wrong?
I am running a 14 disk RAID 6 on mdadm behind 2 LSI SAS2008's in JBOD mode (no HW raid) on Debian 7 in BIOS legacy mode.
Grub2 is dropping to a rescue shell complaining that "no such device" exists for "mduuid/b1c40379914e5d18dddb893b4dc5a28f".
Output from mdadm:
# mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Nov 7 17:06:02 2012
Raid Level : raid6
Array Size : 35160446976 (33531.62 GiB 36004.30 GB)
Used Dev Size : 2930037248 (2794.30 GiB 3000.36 GB)
Raid Devices : 14
Total Devices : 14
Persistence : Superblock is persistent
Update Time : Thu Sep 18 19:44:56 2014
State : clean
Active Devices : 14
Working Devices : 14
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : media:0 (local to host media)
UUID : b1c40379:914e5d18:dddb893b:4dc5a28f
Events : 2319862
Number Major Minor RaidDevice State
13 8 82 0 active sync /dev/sdf2
15 8 130 1 active sync /dev/sdi2
14 8 98 2 active sync /dev/sdg2
21 8 194 3 active sync /dev/sdm2
16 8 226 4 active sync /dev/sdo2
12 8 162 5 active sync /dev/sdk2
18 8 50 6 active sync /dev/sdd2
17 8 146 7 active sync /dev/sdj2
20 8 210 8 active sync /dev/sdn2
19 8 66 9 active sync /dev/sde2
11 8 34 10 active sync /dev/sdc2
24 8 178 11 active sync /dev/sdl2
23 8 114 12 active sync /dev/sdh2
22 8 18 13 active sync /dev/sdb2
Output from blkid:
# blkid
/dev/md0: UUID="2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb" TYPE="xfs"
/dev/md/0: UUID="2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb" TYPE="xfs"
/dev/sdd2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="09a00673-c9c1-dc15-b792-f0226016a8a6" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdc2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="ce717500-cadf-3b12-e893-48d43c1408e7" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdf2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="071afb12-f78f-4f15-f65a-a6298eadcfa7" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdb2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="822fd02b-454d-a94c-57f6-8535964996b1" LABEL="media:0" TYPE="linux_raid_member"
/dev/sde2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="de3f41b8-3016-870c-344f-2a92c08e1085" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdg2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="e319bdaa-22bc-1153-c43b-48788a9c1832" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdi2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="3dd1df1b-203c-6453-0964-ebad245b1670" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdh2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="f5477580-9435-7948-6e97-fe82c8805bcd" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdj2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="4a013330-37c5-65f9-cb76-1d357ce4ddb4" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdm2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="b750b4e4-2b1b-ac5f-cbd3-bde5eab657e7" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdk2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="d5521994-6c4f-04f9-f7ca-0dd9dff3c6cd" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdn2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="4670b36c-07cb-e661-20e3-d314f7c3fd42" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdl2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="c1514b9f-2461-6fed-324a-50fb9469043a" LABEL="media:0" TYPE="linux_raid_member"
/dev/sdo2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="6c33c472-af1f-fd8f-22d1-0ea39edc75bb" LABEL="media:0" TYPE="linux_raid_member"
The UUID for md0 is 2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb, so I do not understand why grub insists on looking for b1c40379914e5d18dddb893b4dc5a28f.
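For comparison, the string in the grub error is the same value that mdadm -D and blkid report as the UUID of the array and its members above, only with the separators removed. The 2c61b08d value is the UUID of the XFS filesystem on md0. A minimal sketch of that comparison, where md_uuid_to_grub is a hypothetical helper name and not part of mdadm or grub:

```shell
#!/bin/sh
# Hypothetical helper: strip the ':' or '-' separators from an array UUID
# so it can be compared with the string that follows "mduuid/" in the error.
md_uuid_to_grub() {
    printf '%s\n' "$1" | tr -d ':-' | tr 'A-F' 'a-f'
}

# Array UUID as printed by "mdadm -D /dev/md0"
md_uuid_to_grub 'b1c40379:914e5d18:dddb893b:4dc5a28f'

# Member UUID as printed by blkid for the linux_raid_member partitions
md_uuid_to_grub 'b1c40379-914e-5d18-dddb-893b4dc5a28f'
```

Both calls yield the same 32-character string, which can be checked against the one in the rescue message.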
Here is the output from bootinfoscript 0.61. This contains a lot of detailed information, and I couldn't find anything wrong with any of it:
http://pastebin.com/bPgGN68L
During the grub rescue, an ls shows the member disks and also shows (md/0), but if I try an ls (md/0) I get an unknown disk error. Trying an ls on any member device results in unknown filesystem. The filesystem on md0 is XFS, and I assume the unknown filesystem error is normal if it's trying to read an individual disk instead of md0.
I have come close to losing my mind over this. I've tried uninstalling and reinstalling grub numerous times, update-initramfs -u -k all numerous times, update-grub numerous times, grub-install numerous times to all member disks without error, etc.
I even tried manually editing grub.cfg to replace all instances of mduuid/b1c40379914e5d18dddb893b4dc5a28f with (md/0) and then re-installing grub, but the exact same error of no such device mduuid/b1c40379914e5d18dddb893b4dc5a28f still happened.
EDIT TO ADD
I don't have IPMI on this box, so please forgive the embarrassing cell phone picture:
http://imgur.com/zooX12b
One thing I noticed is that it is only showing half the disks. I am not sure whether this matters, but one theory is that it happens because there are two LSI cards physically in the machine.
This last screenshot was shown after I specifically altered grub.cfg to replace all instances of mduuid/b1c40379914e5d18dddb893b4dc5a28f with mduuid/2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb and then re-ran grub-install on all member drives. Where it is getting this old b1c* address I have no clue.
I even tried installing a SATA drive on /dev/sda, outside of the array, and installing grub on it and booting from it. Still, same identical error.
EDIT TO CLARIFY
Grub installation is to each individual member disk, not to /dev/md0, and completes without error, but the machine drops to grub rescue on reboot.
EDIT TO ADD
These operations were suggested by a friend. They did not work, I still need help!
I could really use some assistance from anyone/everyone to help me get GRUB working on this box.
Anyone have other suggestions and fixes?
EDIT 5
Grub bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764798
Look at /dev/disk/by-id for the entries for the RAID device, which are prefixed with md-uuid. Those are the correct IDs to use with mduuid/ in grub. You probably need to insmod mdraid1x too if you are using current metadata.
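A rough sketch of that lookup, run from the installed system or a live environment with the array assembled. It assumes udev names the links md-uuid-<array UUID> with colon separators, and that the value after mduuid/ is the same UUID without separators, as in the error message. md_uuid_from_link is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: take a /dev/disk/by-id/md-uuid-* path and print the
# array UUID it encodes, with the ':' separators removed (assumed to be the
# form used after "mduuid/").
md_uuid_from_link() {
    name=${1##*/}                      # drop the directory part
    printf '%s\n' "${name#md-uuid-}" | tr -d ':'
}

# List every md-uuid link together with the device node it points to.
for link in /dev/disk/by-id/md-uuid-*; do
    [ -e "$link" ] || continue         # skip if the glob matched nothing
    printf 'mduuid/%s -> %s\n' "$(md_uuid_from_link "$link")" "$(readlink -f "$link")"
done
```

The array in the question uses version 1.2 metadata, which is the case mdraid1x covers. At the grub rescue prompt the module would be loaded with insmod mdraid1x, though that can only succeed if grub can already read the directory its prefix points to.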