Properly Boot a Software-Based RAID1 with a Missing or Improperly Failed Drive
tl;dr: Is there a way to properly boot a software-based RAID1 with a missing or failed drive (one that wasn't failed by the user first)?
To be clear, booting a software-based RAID1 without one of its hard drives is possible IF you properly fail the drive before rebooting. I know this is subjective, but that seems like neither a plausible solution nor an acceptable answer. For example, a facility takes a power hit and a hard drive fails at the same moment the power goes out. Trying to boot with a degraded array whose drive wasn't "properly" failed will drop the system into emergency mode.
I've read many posts here and on other forums, all recommending that you install GRUB on every drive, rebuild GRUB manually, add nofail to the /etc/fstab options, or try other seemingly simple fixes; but none of these recommendations have worked.
While I've come to terms with this not being possible, something about it doesn't sit right. So I'm asking whether anyone else has this problem or has a solution to it.
My environment:
I have an older motherboard that doesn't support UEFI, so I am booting in legacy (BIOS/MBR) mode.
OS:
cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 7.6 (Maipo)
Kernel:
uname -r
3.10.0-957.el7.x86_64
mdadm:
mdadm --version
mdadm - v4.1-rc1 2018-03-22
My RAID is RAID1 across three drives (sda, sdb, sdc), with four arrays:
md1 - /boot
md2 - /home
md3 - /
md4 - swap
I have installed GRUB on all three drives and ensured that all boot partitions have the boot flag set.
fdisk -l /dev/sda (and likewise /dev/sdb and /dev/sdc) shows a * in the Boot column next to the appropriate partition
-- and --
grub2-install /dev/sda, grub2-install /dev/sdb and grub2-install /dev/sdc (run as separate commands) each report a successful installation.
Replicating the problem:
- Power off the system with all drives assigned to the RAID and the RAID fully operational.
- Remove one of the hard drives
- Power system up
Results:
The system boots past GRUB. GDM attempts to display the login screen, but after about 20 seconds it fails and drops to an emergency console. Many parts of a "normal" system are missing; for instance, /boot and /etc do not exist. There don't appear to be any kernel panic messages or errors in dmesg.
Again, the key here is that the RAID must be fully assembled when you power down and remove a drive. If you properly fail a drive and remove it from the RAID first, you can boot without that drive present.
Example (md1 through md4 paired with sda1 through sda4, one command per array):
for i in 1 2 3 4; do mdadm --manage /dev/md$i --fail /dev/sda$i; done
for i in 1 2 3 4; do mdadm --manage /dev/md$i --remove /dev/sda$i; done
I know this seems trivial, but I have yet to find a viable way to boot a system with a degraded RAID1. You would think this would be a simple problem with a simple solution, but that does not appear to be the case.
Any help, input, or suggestions would be greatly appreciated.
Booting a degraded MD RAID1 array is certainly possible, at least if the BIOS skips the failed disk (if not, you can simply boot manually from the surviving disk).
For your specific issue, you are probably hitting this bug. An excerpt (reading the whole bug report is worthwhile):
RHEL 7.6 dracut-initqueue script has a default timeout of 180 seconds (as defined in the RDRETRY variable), which is higher than the systemd root mount service timeout (90 seconds). This can lead to an unbootable system when root resides on a degraded software RAID1 device (the user is dropped to an emergency shell). See https://bugzilla.redhat.com/show_bug.cgi?id=1451660 for an example of the problem. Note that this only happens when the RAID device expects itself to be healthy but unexpectedly finds the array degraded during boot.
Passing "rd.retry=30" at boot time fixes the degraded array boot problem, as the array is force-started before the systemd root mount service times out. Moreover, the long dracut rd.retry timeout is inconsistent with the dracut.cmdline(7) man page, which states that the timeout should be 30 seconds.
...
Additional info: I traced the problem to how mdadm --incremental, the dracut timeout (rd.retry) and the systemd default timeout interact:
- mdadm --incremental will not start/run an array that is unexpectedly found degraded;
- dracut should force-start the array after 2/3 of the timeout has passed. With the current RHEL default, this amounts to 180/3*2 = 120s;
- systemd expects to mount the root filesystem within 90s at most. If it does not succeed, it aborts the dracut script and drops to an emergency shell. Since 90s is lower than the 120s dracut force-start point, dracut never gets a chance to force-start the array. Lowering the rd.retry timeout (setting it as the man page suggests) lets dracut force-start the array, allowing the systemd service to succeed.
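The timing race described above can be checked with a few lines of arithmetic. This is a minimal sketch that only models the bug report's description (force-start at 2/3 of rd.retry, systemd root-mount timeout of 90 s); it is not dracut's actual code, and check_retry is a hypothetical helper name.

```shell
#!/bin/sh
# Model of the race described in the bug report (assumption: force-start
# happens at 2/3 of rd.retry, and systemd gives up on the root mount at 90 s).
SYSTEMD_ROOT_TIMEOUT=90

# Hypothetical helper: report whether the force-start beats the systemd timeout.
check_retry() {
    rd_retry=$1
    force_start=$(( rd_retry * 2 / 3 ))
    if [ "$force_start" -lt "$SYSTEMD_ROOT_TIMEOUT" ]; then
        echo "rd.retry=$rd_retry: force-start at ${force_start}s, before the ${SYSTEMD_ROOT_TIMEOUT}s timeout (array started in time)"
    else
        echo "rd.retry=$rd_retry: force-start at ${force_start}s, after the ${SYSTEMD_ROOT_TIMEOUT}s timeout (emergency shell)"
    fi
}

check_retry 180   # RHEL 7.6 default per the bug report
check_retry 30    # value suggested by dracut.cmdline(7)
```

With the default of 180, the force-start lands at 120 s, 30 s after systemd has already given up; with 30 it lands at 20 s, well inside the window.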
As the bug should be fixed in recent RHEL/CentOS 7 point releases, I strongly suggest updating your system if you can. Otherwise, try passing rd.retry=30 as a kernel boot option.
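To make the option persistent instead of typing it at the GRUB menu on every boot, one approach is to append it to GRUB_CMDLINE_LINUX in /etc/default/grub. The sketch below assumes the usual single-line, double-quoted GRUB_CMDLINE_LINUX entry and GNU sed (for sed -i); add_rd_retry is a hypothetical helper, not part of any RHEL tool.

```shell
#!/bin/sh
# Hypothetical helper: append rd.retry=30 to GRUB_CMDLINE_LINUX in a
# /etc/default/grub-style file, unless an rd.retry= setting is already there.
# Assumes GRUB_CMDLINE_LINUX="..." sits on one line with double quotes.
add_rd_retry() {
    cfg="$1"
    if grep -q '^GRUB_CMDLINE_LINUX=.*rd\.retry=' "$cfg"; then
        return 0   # already set; leave the file untouched
    fi
    # Insert " rd.retry=30" just before the closing double quote (GNU sed -i).
    sed -i 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 rd.retry=30"/' "$cfg"
}
```

After running add_rd_retry /etc/default/grub as root, regenerate the config with grub2-mkconfig -o /boot/grub2/grub.cfg (the BIOS/MBR location on RHEL 7). Alternatively, grubby --update-kernel=ALL --args="rd.retry=30" adds the argument to the existing boot entries directly.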