mdadm raid 1 grub only on sda

I just finished setting up a CentOS 6.3 64-bit server with mdadm. However, a lightbulb went on and I realised grub would only be installed on the first drive, which is about as much use as an ashtray on a motorbike.

I had a look to confirm my suspicion:

grub> find /grub/stage1
find /grub/stage1
 (hd0,0)

So I updated my device map to look like:

(fd0)   /dev/fd0
(hd0)   /dev/sda
(hd1)   /dev/sdb

(Note: the (hd1) entry was added by me.)

Then I tried to install grub on /dev/sdb, and I get:

grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... no

Error 15: File not found

So I did some googling. (Sadly, Google has done a good job of picking up hundreds of grub install examples, which don't help here.)

After finding a few clues I tried:

# grub-install --recheck /dev/sdb

Probing devices to guess BIOS drives. This may take a long time.
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0)   /dev/fd0
(hd0)   /dev/sda
(hd1)   /dev/sdb

# grub-install /dev/sdb
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0)   /dev/fd0
(hd0)   /dev/sda
(hd1)   /dev/sdb

That sort of suggests grub is now installed on /dev/sdb too. However, if I take another look I still get:

grub> find /grub/stage1
find /grub/stage1
 (hd0,0)

parted outputs for the 2 drives:

SDA

Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  500MB   500MB   ext3         1     boot
 2      500MB   81.0GB  80.5GB               2     raid
 3      81.0GB  85.0GB  4000MB               3     raid
 4      85.0GB  3001GB  2916GB               4     raid

SDB

Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  500MB   500MB   ext3         1
 2      500MB   81.0GB  80.5GB               2     raid
 3      81.0GB  85.0GB  4000MB               3     raid
 4      85.0GB  3001GB  2916GB               4     raid

And mdadm mdstat:

Personalities : [raid1]
md1 : active raid1 sdb3[1] sda3[0]
      3905218 blocks super 1.1 [2/2] [UU]

md2 : active raid1 sdb4[1] sda4[0]
      2847257598 blocks super 1.1 [2/2] [UU]

md0 : active raid1 sda2[0] sdb2[1]
      78612189 blocks super 1.1 [2/2] [UU]

Is anyone able to throw some light on the situation? It feels like I am 99% there at the moment and missing something obvious.

Thanks.

edit update:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0               74G   18G   53G  25% /
tmpfs                 580M     0  580M   0% /dev/shm
/dev/sda1             462M   98M  341M  23% /boot
xenstore              580M   64K  580M   1% /var/lib/xenstored

/ is on md0, which is made up of sda2 and sdb2. Swap is md1, which is sda3 and sdb3. md2 is LVM. However, /boot is only on /dev/sda1.

I suppose that is the problem. Would the resolution be to create md4 and have it contain sda1 and sdb1?

Perhaps I have things mixed up a little in my head, but I assumed grub was not installed on a partition but in the first few blocks of the drive, i.e. sda or hd0/hd1.

Any clarification and advice is appreciated.


Solution 1:

This should be your problem:

root (hd1,0)
 Filesystem type is ext2fs, partition type 0x83

Take the following steps:

  • Create the two /boot partitions, /dev/sda1 and /dev/sdb1, with type fd (Linux raid autodetect), or fd00 for GPT. Use your favorite tool (fdisk, cfdisk, gparted, ...).
  • Remember to turn on the bootable flag on both partitions, sda1 and sdb1 (not for GPT).
  • Wipe any old RAID superblocks so the partitions start out as a brand new raid:

    mdadm --zero-superblock /dev/sda1 
    mdadm --zero-superblock /dev/sdb1
    
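  • When creating the array that will hold your /boot partition, use metadata version 0.90. Linux cannot autodetect newer versions (without a ramdisk), and 0.90 keeps the RAID superblock at the end of each member, so GRUB can read either partition as a plain filesystem.

    mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/sda1 /dev/sdb1 --metadata=0.90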
    
  • Format using ext2 or ext3

  • Install your Linux of choice, WITHOUT formatting /boot
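
The array-creation steps above can be sketched as a dry run that only prints the commands, so they can be reviewed before anything is touched. The function name plan_boot_mirror and the device names are my own placeholders; /dev/md4 is just the array name floated in the question, since md0 is already taken there.

```shell
# Dry-run sketch: print (do not execute) the commands that would mirror /boot.
# plan_boot_mirror is a hypothetical helper; all device names are assumptions.
plan_boot_mirror() {
    md=$1; part_a=$2; part_b=$3
    # Clear stale RAID superblocks so both members start clean.
    echo "mdadm --zero-superblock $part_a"
    echo "mdadm --zero-superblock $part_b"
    # 0.90 metadata keeps the superblock at the end of each member.
    echo "mdadm --create $md --level=1 --raid-devices=2 --metadata=0.90 $part_a $part_b"
    # ext3 matches the /boot filesystem shown in the question.
    echo "mkfs.ext3 $md"
}

# Example: plan_boot_mirror /dev/md4 /dev/sda1 /dev/sdb1
```

Keep in mind that the printed mkfs step would wipe whatever is on those partitions, so an existing /boot has to be copied somewhere safe first and restored afterwards.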

After your distro's first boot:

  • Fix your /etc/fstab to point /boot to /dev/md0 (this may not be necessary).
  • Install grub on the MBR of both disks:

    # grub /dev/sda
     grub> root (hd0,0)
     grub> setup (hd0)
     grub> quit
     quit
    
    # grub /dev/sdb
     grub> root (hd1,0)
     grub> setup (hd1)
     grub> quit
     quit
    
  • Edit your bootloader configuration (these instructions are for GRUB 1).

  • Find the "default" line and add the "fallback" option below it:

    vi /boot/grub/menu.lst
    default 0
    fallback 1
    
  • Add another entry to your bootloader (again, in my case I've chosen GRUB 1 since it's less complicated and good enough for my needs), so that there is one entry for each of the boot partitions that are members of the raid:

    title           Debian GNU/Linux, kernel 2.6.32-5-686  (default)
    root            (hd0,0)
    kernel          /vmlinuz-2.6.32-5-686 root=/dev/mapper/vg-root ro quiet
    initrd          /initrd.img-2.6.32-5-686
    
    title           Debian GNU/Linux, kernel 2.6.32-5-686  (fallback)
    root            (hd1,0)
    kernel          /vmlinuz-2.6.32-5-686 root=/dev/mapper/vg-root ro quiet
    initrd          /initrd.img-2.6.32-5-686 
    
  • Note that in my case, I have an LVM layer on top of my / md raid.
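
A quick sanity check of the menu file after editing might look like the sketch below. check_fallback is a made-up helper name, and the path is an assumption: Debian-style GRUB 1 setups use /boot/grub/menu.lst, while CentOS 6, as far as I know, keeps the real file at /boot/grub/grub.conf.

```shell
# Sketch: report whether a GRUB legacy menu file has "default" and "fallback"
# directives. check_fallback is a hypothetical helper, not part of GRUB.
check_fallback() {
    menu=$1
    # Directives may be written "default 0" or "default=0".
    grep -Eq '^[[:space:]]*default[[:space:]=]+[0-9]+' "$menu" \
        || { echo "missing default"; return 1; }
    grep -Eq '^[[:space:]]*fallback[[:space:]=]+[0-9]+' "$menu" \
        || { echo "missing fallback"; return 1; }
    # "fallback 1" only makes sense if there are at least two title entries.
    titles=$(grep -Ec '^[[:space:]]*title[[:space:]]' "$menu" || true)
    echo "ok: $titles title entries"
}

# Example: check_fallback /boot/grub/menu.lst
```

It only checks that the directives exist; it does not verify that each entry's root line points at a different disk.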

Done. This should be enough to give you a "redundant" bootloader.
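
To confirm that both disks actually carry boot code, one option is to look at the first sector directly. Note that find /grub/stage1 in the grub shell only lists partitions whose filesystem contains that file, so it says nothing about which MBRs were written. The sketch below assumes GRUB legacy's stage1 embeds the literal text "GRUB" in the first 512 bytes (true of the builds I have seen); has_grub_stage1 is a hypothetical helper name.

```shell
# Sketch: look for GRUB legacy's stage1 marker text in sector 0 of a disk.
# Assumption: the installed stage1 contains the literal string "GRUB".
# A match is a hint that boot code is present, not proof the disk will boot.
has_grub_stage1() {
    # Read only the first 512 bytes; works on a block device (as root)
    # or on an image file.
    dd if="$1" bs=512 count=1 2>/dev/null | LC_ALL=C grep -aq 'GRUB'
}

# Example (device names are assumptions):
#   for d in /dev/sda /dev/sdb; do
#       if has_grub_stage1 "$d"; then echo "$d: marker found"
#       else echo "$d: no marker"; fi
#   done
```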

Solution 2:

Whether you want to map those two drives onto another raid1 array is up to you. It's a legitimate choice because you would then be able to install grub to md4 directly and avoid the hassle of managing both installations manually. I've done it™.

In my experience, the device map file is largely irrelevant and its behavior (how grub reads it) is at best unpredictable, if not outright arbitrary.

The device command from the grub shell is a lot more reliable. You can read the help info on it, but the basic syntax speaks for itself:

grub> device (hd0) /dev/md4

After that, /dev/md4 will be mapped to hd0 in the currently running grub session, disregarding the device map file. From here, one would proceed as usual with root (hd0) and setup (hd0,x). The reason for installing to a partition is explained by Henry S.
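
A variation on this, not quite what is described above, is to point device at each physical disk in turn rather than at the array, so that every disk gets its boot code installed as if it were the first BIOS drive. The sketch below only generates the grub shell commands. grub_cmds_for is a made-up name, the disk path and partition index are assumptions, and it can only work once the partition on that disk really holds a copy of the /boot files (the "Error 15" in the question is what happens when it does not).

```shell
# Sketch: emit GRUB legacy shell commands that install stage1 on one disk
# while presenting that disk to GRUB as (hd0) via the "device" command.
# grub_cmds_for is a hypothetical helper; arguments are caller assumptions.
grub_cmds_for() {
    disk=$1          # e.g. /dev/sdb
    part=${2:-0}     # GRUB legacy counts partitions from 0, so sdb1 is 0
    echo "device (hd0) $disk"
    echo "root (hd0,$part)"
    echo "setup (hd0)"
    echo "quit"
}

# Review the output first, then (as root) feed it to the grub shell:
#   grub_cmds_for /dev/sdb | grub --batch
```

Printing the commands instead of running them keeps the sketch harmless until the output has been checked against the actual disk layout.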

AFAIK, the only thing apart from the "boot code" (stage1) that is written to the first 512 bytes of the drive (MBR) is the number of the partition where that stage should look for the menu.

It's possible to mess that up. One would then be greeted with a prompt instead of a menu after POST, but grub provides commands for "initializing" that menu from a different partition (file) than the one specified during installation. It generally works out without intervention though, because the "first boot drive" in the BIOS will be detected as "hd0".