dmraid always starts rebuild after OS restart (and other problems)

I was assigned the task of testing and evaluating the performance of hardware RAID on Coffee Lake using Intel RST Premium with the Intel Optane System Acceleration option, in a RAID1 configuration, under Linux.

I use Ubuntu 19.04, which installs dmraid by default (I also tried several other Linux distributions that require installing dmraid separately). I used to work with enterprise disk arrays, and from a user's point of view I cannot understand the value of the DUT described above.

After installation the system boots properly, and dmraid with the -s and -r options reports good/sync status. I shut down, remove one of the disks (to simulate a failure), and try to start with only one disk. It does not boot and drops to the emergency shell.

OK, I power off and put the removed disk back. In my enterprise-like understanding, a RAID system should use a log to update only the data that differs, and the media should be back in sync after this operation.

But that does not happen. I see a rebuild running in the background, most probably a full rebuild. dmsetup status reaches its final value, and then nothing more happens: it still says XXXX/XXXX, and dmraid still says nosync. No disk activity is seen. If I shut the machine down and boot again, a full rebuild starts from scratch, reaches the same stage, and gets stuck there.
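
For reference, the XXXX/XXXX field can be read mechanically rather than by eye. A minimal sketch in Python, assuming the volume is mapped with the device-mapper mirror target, whose status line lists the member devices followed by a synced/total region count and one health character per member. The function name and the sample line in the note below are made up for illustration; they are not output from this machine.

```python
import re

def mirror_sync_status(status_line):
    """Parse one `dmsetup status` line for a mirror target.

    Returns (synced_regions, total_regions, health) or None if the line
    does not describe a mirror target. The field layout is an assumption
    based on the device-mapper mirror target's status format.
    """
    fields = status_line.split()
    if "mirror" not in fields:
        return None
    start = fields.index("mirror") + 1
    for i in range(start, len(fields)):
        # Device names look like "8:0", so only the region counter
        # has the form "<digits>/<digits>".
        m = re.fullmatch(r"(\d+)/(\d+)", fields[i])
        if m:
            # Two fields later comes the per-member health string,
            # one character per member ('A' means alive).
            health = fields[i + 2] if i + 2 < len(fields) else ""
            return int(m.group(1)), int(m.group(2)), health
    return None
```

With a hypothetical line such as "isw_example_Volume0: 0 976766976 mirror 2 8:0 8:16 7453/7453 1 AA 1 core" this returns (7453, 7453, 'AA'): every region is counted as synced, which matches the stuck state described above, where the counter is at its final value but dmraid still reports nosync.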

I am new to dmraid. I tried some of the -R options, and at some point dmraid -s said "inconsistent" or something like that; the rebuild started again, but got stuck at the final XXXX/XXXX state with the volume still inconsistent.

I went into the machine's setup, removed the RAID volume, and recreated it, and logically all the information was lost (so this is not a way to recover a failed RAID volume).

Tell me which of my assumptions are wrong here, and why the system is unable to cope properly with the absence of one disk and goes awfully crazy when the disk is returned to the set.

I am completely erasing one of the disks right now to see whether dmraid will automatically find it, attach it to the set, and perform the rebuild so that the RAID volume becomes ready and in sync.

Update: after fully erasing one of the disks and installing it in the system, BIOS/setup shows this disk as non-RAID and the original disk as "degraded", with an option to "rebuild". After selecting this option, setup says "rebuilding", but no disk activity is seen. Then I started Ubuntu, and it goes to the emergency shell. It seems the volume is not ready, and a rebuild is clearly being performed in the background (confirmed by dmsetup status), but I still cannot use the system properly.

After this rebuild completes, the volume is still stuck in the nosync state. init 5 hangs, and after a reboot the RAID volume is still unavailable and a new rebuild has started.

Thus this type of "RAID" does not withstand a disk failure.

Update 1: the configuration works perfectly under Windows 10. Removing one disk from the RAID1 set while the system is off still lets the system boot from the other disk. Windows has a GUI to check status, include disks, and initiate a rebuild. When the rebuild finishes, the driver updates the RST state properly, and the rebuild does not erroneously restart on the next triggering event. Sequential read performance in the RAID1 configuration is 1.1 GB/s with SATA3 disks (Ubuntu shows 528 MB/s).


Solution 1:

In principle, you should be able to boot from an incomplete RAID set. After all, that is what you would have to do if a disk fails to spin up after being powered down, which is far more likely than a disk failing during operation. With the default settings, though, that seems to require operator permission for some reason (i.e. force-assemble the array, then continue booting).

Also in principle, merely attempting to assemble the array but not actually doing it should not increment the event counter in the RAID superblocks, which is how the system decides whether a disk can be in sync. If the array is assembled with a missing disk, that disk will miss writes, so of course a rebuild is required afterwards, overwriting the disk.
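
To make the event-counter argument concrete, here is a deliberately simplified model in Python. It is not dmraid code and does not reflect the actual Intel metadata layout; the class, the function names, and the counter values are all hypothetical. It only illustrates why a disk that was absent during a degraded assembly is treated as stale afterwards.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    events: int  # hypothetical per-disk event counter kept in the RAID metadata

def assemble_degraded(present):
    """Model assembling the array with only the `present` members.

    Assumption: actually assembling (not merely probing) bumps the
    counter on every disk that takes part.
    """
    for m in present:
        m.events += 1

def stale_members(members):
    """Return the names of members whose counter lags behind the newest one."""
    newest = max(m.events for m in members)
    return [m.name for m in members if m.events < newest]

# Both disks start with the same counter, so neither is stale.
sda, sdb = Member("sda", 10), Member("sdb", 10)
print(stale_members([sda, sdb]))

# A boot attempt with sdb pulled assembles the array degraded on sda only.
assemble_degraded([sda])

# After sdb is returned, its counter is behind and it must be rebuilt.
print(stale_members([sda, sdb]))
```

In this model the returned disk is marked stale as soon as the degraded assembly has happened, regardless of whether the boot then continued or fell into the emergency shell.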

So my suspicion is that the array is assembled in a degraded state, but some boot code then decides that "degraded" is not good enough, and drops you to the emergency shell. At this point the decision that a rebuild is needed has already been made.

The rebuild should leave you with a consistent state at the end, but maybe device-mapper needs some command to finalize it, the same way a background pvmove in LVM does.

Completely erased disks will only be added if they are recognized as hot spares. In a conservative setup I'd require operator action to designate a disk as a hot spare. You can add spares before disks fail if you have the slots, which allows immediate switchover on failure, but I'd be wary of just taking the first disk to appear after a failure as a spare without at least asking the operator.