ZFS best practices with hardware RAID

If one happens to have some server-grade hardware at one's disposal, is it ever advisable to run ZFS on top of a hardware-based RAID1 or some such? Should one turn off the hardware-based RAID and run ZFS on a mirror or a raidz zpool instead?

With the hardware RAID functionality turned off, are hardware-RAID-based SATA2 and SAS controllers more or less likely to hide read and write errors than non-hardware-RAID controllers would?

In terms of non-customisable servers, if one has a situation where a hardware RAID controller is effectively cost-neutral (or even lowers the cost of the pre-built server offering, since its presence improves the likelihood of the hosting company providing complimentary IPMI access), should it be avoided at all? Or should it even be sought after?


The idea with ZFS is to let it know as much as possible about how the disks are behaving. Then, from worst to best:

  • Hardware RAID (ZFS has absolutely no clue about the real hardware),
  • JBOD mode (the issue being more about any potential expander: less bandwidth),
  • HBA mode being the ideal (ZFS knows everything about the disks).

As ZFS is quite paranoid about hardware, the less hiding there is, the better it can cope with any hardware issues. And as pointed out by Sammitch, a ZFS pool sitting on top of a RAID controller configuration may be very difficult to restore or reconfigure when that controller fails (i.e. on hardware failure).

As for the issue of standardized hardware that ships with a hardware-RAID controller, just be careful that the controller has a real pass-through or JBOD mode.
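A quick way to sanity-check this (a sketch assuming Linux; the device names below are examples, not taken from any system in this thread): in a real pass-through or JBOD mode the OS should see each physical disk with its own vendor, model and serial number, and SMART data should come straight from the drive rather than from a "LOGICAL VOLUME".

# list block devices with their reported vendor/model/serial
lsblk -o NAME,VENDOR,MODEL,SERIAL,SIZE,TYPE
# SMART identity should show the actual disk, not the RAID controller
smartctl -i /dev/sda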


Q. If one happens to have some server-grade hardware at one's disposal, is it ever advisable to run ZFS on top of a hardware-based RAID1 or some such?

A. It is strongly preferable to run ZFS straight to disk, and not make use of any form of RAID in between. Whether or not a system that effectively requires you to make use of the RAID card precludes the use of ZFS has more to do with the OTHER benefits of ZFS than it does with data resiliency. Flat out, if there's an underlying RAID card responsible for providing a single LUN to ZFS, ZFS is not going to improve data resiliency. If your only reason for going with ZFS in the first place is data resiliency improvement, then you've just lost all reason for using it. However, ZFS also provides ARC/L2ARC, compression, snapshots, clones, and various other improvements that you might also want, and in that case, perhaps it is still your filesystem of choice.
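For illustration, a few of those features in action (a minimal sketch; the dataset name tank/data is a placeholder, not anything from this question):

# enable on-the-fly compression for a dataset
zfs set compression=lz4 tank/data
# take a point-in-time snapshot and clone it into a writable copy
zfs snapshot tank/data@pre-upgrade
zfs clone tank/data@pre-upgrade tank/data-test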

Q. Should one turn off the hardware-based RAID, and run ZFS on a mirror or a raidz zpool instead?

A. Yes, if at all possible. Some RAID cards allow a pass-through mode; if yours has it, that is the preferable thing to do.
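With the disks exposed individually, building the redundancy in ZFS itself is straightforward (a sketch only; the pool and device names are placeholders, and /dev/disk/by-id paths are generally preferable to the short names used here):

# two-disk mirror
zpool create tank mirror /dev/sdb /dev/sdc
# or, alternatively, a raidz vdev across three disks
zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd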

Q. With the hardware RAID functionality turned off, are hardware-RAID-based SATA2 and SAS controllers more or less likely to hide read and write errors than non-hardware-RAID controllers would?

A. This is entirely dependent on the RAID card in question. You'll have to pore over the manual or contact the manufacturer/vendor of the RAID card to find out. Some very much do, yes, especially if 'turning off' the RAID functionality doesn't actually completely turn it off.
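One practical hint (an assumption-laden sketch, not a universal test): whether SMART data from the physical drives is reachable, and how, says a lot about how much the controller sits in the way. The device paths and drive indexes below are examples only; check the smartctl documentation for your particular controller.

# plain HBA / true pass-through: the disk answers directly
smartctl -a /dev/sda
# HP Smart Array: address the first physical drive behind the controller
smartctl -a -d cciss,0 /dev/sg0
# LSI MegaRAID: first physical drive behind the controller
smartctl -a -d megaraid,0 /dev/sda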

Q. In terms of non-customisable servers, if one has a situation where a hardware RAID controller is effectively cost-neutral (or even lowers the cost of the pre-built server offering, since its presence improves the likelihood of the hosting company providing complimentary IPMI access), should it be avoided at all? Or should it even be sought after?

A. This is much the same question as your first one. Again - if your only desire to use ZFS is an improvement in data resiliency, and your chosen hardware platform requires that a RAID card provide a single LUN to ZFS (or multiple LUNs that you have ZFS stripe across), then you're doing nothing to improve data resiliency and thus your choice of ZFS may not be appropriate. If, however, you find any of the other ZFS features useful, it may still be.

I do want to add an additional concern - the above answers rely on the idea that the use of a hardware RAID card underneath ZFS does nothing to harm ZFS beyond removing its ability to improve data resiliency. The truth is that it's more of a gray area. There are various tunables and assumptions within ZFS that don't necessarily operate as well when handed multi-disk LUNs instead of raw disks. Most of this can be negated with proper tuning, but out of the box, ZFS on top of large RAID LUNs won't be as efficient as it would have been on top of individual spindles.
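One example of such a tunable (a sketch, not a recommendation for any specific controller): a RAID LUN may report 512-byte sectors even when the underlying disks use 4K sectors, so on OpenZFS you may want to force the alignment shift at pool creation time. The pool and device names below are placeholders.

# force 4K alignment regardless of what the LUN reports
zpool create -o ashift=12 tank /dev/sdb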

Further, there's some evidence to suggest that the very different manner in which ZFS talks to LUNs, as opposed to more traditional filesystems, often invokes code paths in the RAID controller and workloads that it's not as used to, which can lead to oddities. Most notably, you'll probably be doing yourself a favor by disabling the ZIL functionality entirely on any pool you place on top of a single LUN if you're not also providing a separate log device, though of course I'd highly recommend you DO provide the pool a separate raw log device (that isn't a LUN from the RAID card, if at all possible).
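In command form (a sketch assuming current OpenZFS; the log device path is hypothetical, and disabling synchronous write semantics is the closest present-day equivalent of "disabling the ZIL" - understand the data-loss implications before doing it):

# preferred: give the pool a separate raw log device
zpool add tank log /dev/disk/by-id/nvme-example-slog
# otherwise: bypass the ZIL by disabling synchronous write semantics
zfs set sync=disabled tank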


I run ZFS on top of HP ProLiant Smart Array RAID configurations fairly often.

Why?

  • Because I like ZFS for data partitions, not boot partitions.
  • Because booting Linux from ZFS probably isn't foolproof enough for me right now.
  • Because HP RAID controllers don't allow raw device passthrough. Configuring multiple RAID 0 volumes is not the same as having raw disks.
  • Because server backplanes aren't typically flexible enough to dedicate drive bays to a specific controller or split duties between two controllers. These days you see 8 and 16-bay setups most often. That's not always enough to segment things the way they should be.
  • But I still like the volume management capabilities of ZFS. The zpool allows me to carve things up dynamically and make the most use of the available disk space.
  • Compression, ARC and L2ARC are killer features!
  • A properly-engineered ZFS setup atop hardware RAID still gives good warning and failure alerting, and it outperforms the hardware-only solution.

An example:

RAID controller configuration

[root@Hapco ~]# hpacucli ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380233859A0)

   array B (Solid State SATA, Unused Space: 250016  MB)
      logicaldrive 3 (325.0 GB, RAID 1+0, OK)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 240.0 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, Solid State SATA, 240.0 GB, OK)

block device listing

[root@Hapco ~]# fdisk  -l /dev/sdc

Disk /dev/sdc: 349.0 GB, 348967140864 bytes
256 heads, 63 sectors/track, 42260 cylinders
Units = cylinders of 16128 * 512 = 8257536 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       42261   340788223   ee  GPT

zpool configuration

[root@Hapco ~]# zpool  list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vol1   324G  84.8G   239G    26%  1.00x  ONLINE  -

zpool detail

  pool: vol1
 state: ONLINE
  scan: scrub repaired 0 in 0h4m with 0 errors on Sun May 19 08:47:46 2013
config:

        NAME                                      STATE     READ WRITE CKSUM
        vol1                                      ONLINE       0     0     0
          wwn-0x600508b1001cc25fb5d48e3e7c918950  ONLINE       0     0     0

zfs filesystem listing

[root@Hapco ~]# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
vol1            84.8G   234G    30K  /vol1
vol1/pprovol    84.5G   234G  84.5G  -
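For completeness, roughly how a pool like the one above might be created on top of the Smart Array logical drive (a sketch only: the wwn is taken from the zpool output above, but the zvol size is illustrative rather than the value actually used):

# whole-LUN pool on the Smart Array logical drive
zpool create vol1 /dev/disk/by-id/wwn-0x600508b1001cc25fb5d48e3e7c918950
# a zvol carved out of the pool (size is an example)
zfs create -V 100G vol1/pprovol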