PERC H740P single-disk RAID 0 versus JBOD/Pass-Through/eHBA/IT mode (for ZFS on Linux)

We have a server with a PERC H740P Mini (embedded) controller, a 2-disk RAID 1 with ext4 for the OS (CentOS 7.8), and a 6-disk raidz2 ZFS on Linux pool for the data, all on the same controller.

It's generally considered bad® to run ZFS with HW RAID, but this controller doesn't seem to support a mixed RAID/non-RAID setup, so the 6 data drives (for ZFS) are all single disk RAID 0.

We see occasional ZFS panics that I suspect are due to the RAID controller interfering. Where can I read about the exact semantics of single disk RAID 0 for this controller so that I might be able to determine if it is the cause?

Are there any perccli64 incantations or other debuggery I could use to see what the controller might have been doing when ZFS pooped the proverbial bed?


Solution 1:

I think it is unlikely that the ZFS panics you are experiencing have anything to do with your hardware RAID controller. You should post the exact panic / dmesg output so we can understand what is going on.
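A minimal sketch of how that output could be gathered, assuming root access and the standard ZFS on Linux userland tools. The function names are made up for illustration, and the grep pattern is only a guess at what the interesting lines look like, so widen it if it misses your panic.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for collecting ZFS panic evidence; names are illustrative.

# Keep only log lines that mention ZFS, SPL, a panic, or a hung task.
# Reads log text on stdin, so it works on live dmesg output or a saved file.
filter_zfs_lines() {
    grep -iE 'zfs|spl:|splerror|panic|blocked for more than' || true
}

# Print the kernel messages and pool state worth attaching to a question.
collect_zfs_evidence() {
    dmesg -T | filter_zfs_lines   # kernel ring buffer with readable timestamps
    zpool status -v               # pool health plus any files with errors
    zpool events -v               # ZFS event log (I/O errors, checksum errors)
}
```

After sourcing the file, `collect_zfs_evidence > zfs-evidence.txt` run as root writes everything to one file. The kernel ring buffer is lost on reboot, so if the box went down with the panic, `/var/log/messages` can be fed through `filter_zfs_lines` instead.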

That said, a single-disk RAID 0 volume differs from a non-RAID disk because:

  • the controller writes its own RAID metadata to the disk for a single-disk RAID 0
  • the controller's write-back cache is enabled for RAID 0 disks, while it is disabled for non-RAID disks
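To check how the controller currently presents the disks and what cache policy each virtual disk uses, something like the following may help. This is a sketch under two assumptions: that perccli64 accepts the same object syntax as the storcli tool it is derived from, and that the controller index is 0. The helper name is hypothetical, and it only prints the commands so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print perccli64 inspection commands for one controller.
# Nothing is executed here; the output is plain text for review.
perc_inspect_cmds() {
    ctl="${1:-0}"   # controller index; 0 is an assumption, adjust as needed
    cat <<EOF
perccli64 /c${ctl} show all
perccli64 /c${ctl}/vall show all
perccli64 /c${ctl}/eall/sall show
perccli64 /c${ctl} show events
EOF
}
```

`perc_inspect_cmds 0` lists the commands. The `/vall show all` output should include the cache policy of each single-disk RAID 0, and the event log can be compared against the timestamps of the ZFS panics.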

Your controller also supports eHBA mode, which should pass unconfigured disks to the OS as non-RAID disks. From the docs, it seems that eHBA mode allows RAID 0/1/10 arrays and non-RAID disks to be used concurrently.

Try passing the ZFS disks as non-RAID drives and please report back.