What is the overhead of ZFS RAIDZ1/2 in an HPC SSD environment?

Example hardware / host:

  • Modern 64-core CPU, 128 GB memory
  • 8 x Micron Pro 15.36 TB U.2 SSDs
  • SSDs connected by dedicated OCuLink per device (no backplane or PCIe sharing)
  • Ubuntu 20.04

Use case:

  • A backup server for hundreds of hosts. Backups are performed via incremental rsync: first an rsync from the remote host, then a local copy (using cp) to create a snapshot; a minimal sketch follows this list. Millions of small files (email, HTML files, etc.) are typical of a backup.
  • At any one time the server could be dealing with 50 incoming rsyncs (CPU-light encryption algorithm and no compression)
  • Redundancy, while advantageous, isn't required; at most, tolerance of a single drive failure
  • Extreme local I/O is required for file rotation
  • The use of rsync and hard-link differential copying ('rsnapshot') cannot change; it is required by the backup software already deployed in production, so BTRFS snapshots are out of the question.
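
To make the workload concrete, here is a minimal sketch of that rsync-then-hard-link rotation (host name and paths are hypothetical):

    # Pull changed files into the current tree (hypothetical paths).
    rsync -a --delete hostA:/data/ /backup/hostA/current/

    # Create a snapshot with cp: -l hard-links unchanged files, so each
    # of the millions of small files costs a directory entry, not data.
    cp -al /backup/hostA/current /backup/hostA/snapshot-001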

I have come up with two possible solutions:

  1. Shard my data store, no redundancy: format each drive individually with BTRFS and mount with inline LZO compression.
  • Advantage: Simple and lightweight, no raid management overhead
  • Advantage: Isolated failure; on a disk failure only a small portion of backups is lost, which is quick to build up again
  • Advantage: Maximum overall capacity obtained
  • Disadvantage: Complexity of capacity management - not having one large volume means strategically balancing data on specific disks to get the most use out of them
  • Disadvantage: Disk failure does lose data
  2. ZFS RAIDZ1/2 across all disks for one large volume
  • Advantage: 1 or 2 disk failure redundancy
  • Advantage: Easy management, everything goes onto one giant volume - plenty of space.
  • Disadvantage: Loss of 1 or 2 disks' capacity to parity

The question: will there be significant ZFS RAID management overhead that reduces the performance of the array versus option 1? In a configuration designed at every level to maximise disk throughput between the OS and the SSDs, in the tens of gigabits per second, will the overhead of ZFS RAID management cause a significant drop in performance and/or overload the CPU or memory?

Thank you.


Solution 1:

Use ZFS. Use LZ4 compression. Tune your ZFS appropriately, as defaults won't be ideal for that many NVMe drives.
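
For example, a minimal sketch (pool name, device paths, and property choices are assumptions; validate them against your workload):

    # Hypothetical pool name and device paths; adjust for your system.
    # ashift=12 assumes 4K-sector SSDs.
    zpool create -o ashift=12 \
        -O compression=lz4 -O atime=off -O xattr=sa \
        tank raidz2 \
        /dev/disk/by-id/nvme-drive0 /dev/disk/by-id/nvme-drive1 \
        /dev/disk/by-id/nvme-drive2 /dev/disk/by-id/nvme-drive3 \
        /dev/disk/by-id/nvme-drive4 /dev/disk/by-id/nvme-drive5 \
        /dev/disk/by-id/nvme-drive6 /dev/disk/by-id/nvme-drive7

Here atime=off avoids a metadata write per file access during rsync scans, and xattr=sa keeps extended attributes in the dnode; both help small-file workloads.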

Test and benchmark with your actual workloads. We can't tell you how it will perform.

RAIDZ overhead is not a concern for this.

At module load, ZFS runs CPU microbenchmarks and automatically chooses the fastest checksum and RAIDZ parity implementations for your platform.

See the ZFS module parameters, documented in man 5 zfs-module-parameters.

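To see what the benchmarks chose, something like the following should work (the kstat and sysfs paths are assumptions that vary somewhat between OpenZFS versions):

    # Load-time microbenchmark results:
    cat /proc/spl/kstat/zfs/fletcher_4_bench
    cat /proc/spl/kstat/zfs/vdev_raidz_bench

    # Currently selected implementations:
    cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl
    cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl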

Solution 2:

As BTRFS RAID5 is not considered 100% stable, I do not suggest it even for a backup machine. Moreover, with these large SSDs I would use RAID6 rather than RAID5.

So I suggest using MD RAID6 with XFS or, as this is an Ubuntu machine, ZFS RAIDZ2 with LZ4 compression.
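
A rough sketch of the MD RAID6 + XFS route (device names and mount point are hypothetical):

    # Hypothetical device names; adjust for your system.
    mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/nvme[0-7]n1

    # mkfs.xfs detects the md stripe geometry automatically; noatime
    # reduces metadata writes during rsync tree scans.
    mkfs.xfs /dev/md0
    mount -o noatime /dev/md0 /backup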

EDIT: I simply do not consider single-drive BTRFS filesystems a valid backup approach; in that configuration, any single drive failure will lead to (partial) data loss. I strongly suggest using RAIDZ2 with LZ4 compression or, for maximum performance, MD RAID6 with XFS (at the cost of compression and checksumming).