Ext4 vs. XFS vs. Btrfs vs. ZFS for NAS [closed]

My use case: I have Ubuntu Server 18.04 installed on an M.2 SSD. I have a 4TB HDD I want to add as storage. Since it's mostly for large media files and backups, it won't be written to very often.

Which filesystem do you think is best suited for this use case?

My leading candidates are Ext3/4, XFS, Btrfs, and ZFS (feel free to argue for another).

I'm not asking "What is the best filesystem?"—There is no such thing as 'the best.' I'm just asking people which filesystem might be most suited for this use case. Please try to include:

  • Are there any drawbacks or risks? I heard XFS can corrupt data if there's a power loss. Same with ZFS without ECC RAM.
  • Is it possible to add RAID-1 later on without losing data? I don't have enough money for another hard drive right now (I used that for an external drive; RAID doesn't replace backups), but I may add one later. This isn't a requirement, just something that might be nice.
  • What is the read/write performance? Btrfs would probably fit most of my needs, but it's very slow in Phoronix benchmarks. XFS has impressive performance, but I've heard it can cause data loss.

Thanks for your advice.


Solution 1:

I generally use one of the following two filesystems:

  • XFS for anything which does not play well with CoW (or for virtual machines whose datastore is already on a CoW filesystem), or when extremely fast direct I/O is required;

  • ZFS for anything else.

For your use case I would use ZFS, especially considering that Ubuntu 18.04 already ships it. As you can easily attach another mirror leg to an existing device, ZFS fits the bill very well. For example, suppose your disk is named nvme0p1:

  • zpool create tank /dev/nvme0p1 creates your single-vdev pool called “tank”;
  • zpool attach tank /dev/nvme0p1 <newdev> attaches a second device to the existing one, turning the pool into a mirror.
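
A minimal sketch of the whole sequence, assuming the second disk later shows up as /dev/sdb (adapt the device names to your system):

zpool create tank /dev/nvme0p1                # single-vdev pool named "tank"
zfs create -o compression=lz4 tank/media      # optional dataset for the media files
zpool attach tank /dev/nvme0p1 /dev/sdb       # later: turn the single disk into a two-way mirror
zpool status tank                             # verify the mirror / watch the resilver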

If, for some reasons, you don't/can't use ZFS, then MDRAID and XFS are your friends:

  • mdadm --create /dev/md200 -l raid1 -n 2 /dev/nvme0p1 missing will create a RAID1 array with a missing leg (see #1);
  • mdadm --manage /dev/md200 --add <newdev> attaches a new mirror leg (forming a complete RAID1, see #2)

After creating the array, you can format it with XFS via mkfs.xfs.
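
A minimal sketch, using /srv/media as a purely hypothetical mount point:

mkfs.xfs /dev/md200
mkdir -p /srv/media
mount /dev/md200 /srv/media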

I do not suggest using BTRFS, as both performance and resilience are subpar. For example, from the Debian wiki:

There is currently (2019-07-07, linux ≤ 5.1.16) a bug that causes a two-disk raid1 profile to forever become read-only the second time it is mounted in a degraded state—for example due to a missing/broken/SATA link reset disk

Please also note that commercial NAS vendors using BTRFS (read: Synology) do not use its integrated RAID feature; rather, they rely on the proven Linux MDRAID layer.

EDIT: while some maintain that XFS is prone to data loss, this is simply not correct. Well, compared to ext3, XFS (and other filesystems supporting delayed allocation) can lose more un-synced data in case of an uncontrolled poweroff, but synced data (i.e.: important writes) are 100% safe. Moreover, a specific bug exacerbating XFS data loss was corrected over 10 years ago. That bug apart, any filesystem with delayed allocation (ext4 and btrfs included) will lose a significant amount of un-synced data in case of an uncontrolled poweroff.

Compared to ext4, XFS has unlimited inode allocation, advanced allocation hinting (if you need it) and, in recent versions, reflink support (but it needs to be explicitly enabled in Ubuntu 18.04; see the mkfs.xfs man page for additional information).
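
For example, a sketch of enabling reflinks at format time where they are not already the default (this would replace the plain mkfs.xfs invocation above):

mkfs.xfs -m reflink=1 /dev/md200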


1: Example /proc/mdstat file with missing device:

Personalities : [raid1]
md200 : active raid1 loop0[0]
      65408 blocks super 1.2 [2/1] [U_]

unused devices: <none>

2: /proc/mdstat file after adding a second device

Personalities : [raid1]
md200 : active raid1 loop1[2] loop0[0]
      65408 blocks super 1.2 [2/2] [UU]

unused devices: <none>

Solution 2:

This looks more like a question for superuser than for serverfault, but some of the ideas are valid for this site too, so I'll take a stab at answering some of the questions:

  • XFS has had a reputation of not liking power loss. A lot has happened since, and Red Hat, Oracle and the like use it as a default filesystem nowadays, so I wouldn't be surprised if this is a significantly smaller problem today than it used to be. If your use case is of the kind that sees a speed benefit from this filesystem, it might be worth the possible risk, especially if you have a decent backup policy.
  • ZFS without ECC RAM still has very nice functionality, but you run the risk of not catching some instances of bad data. I ran ZFS without ECC memory for several years in a home setting similar to what you describe and never experienced obvious data loss, but then again I didn't exactly store critical data on that volume. I have since migrated to proper server hardware and feel a lot more comfortable about actually using my storage for important stuff now.
  • With ZFS you should simply be able to add a disk to a single-disk pool to create a mirror. I haven't tried it in practice, but I get multiple search results claiming success when checking duckduckgo. I'm unsure about other systems, but as long as you have some kind of volume manager underneath your file system going to RAID1 from a single disk should be trivial.
  • The CoW-based file systems (zfs, btrfs) are slower than less feature-rich systems, and they are more dependent on having sufficiently powerful hardware backing them.

Just as in a corporate setting, I would definitely take the time to run benchmarks with my expected load, to get first-hand experience of the performance-vs-features trade-off.
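
For instance, a rough sketch with fio for a large-file, mostly-sequential workload like the one described (block size, file size and the target directory are all assumptions to adjust):

fio --name=seqwrite --directory=/mnt/media --rw=write --bs=1M --size=4G --numjobs=1
fio --name=seqread --directory=/mnt/media --rw=read --bs=1M --size=4G --numjobs=1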

Solution 3:

I would use whatever journalling file system the operating system in question suggested as default unless I had very good reason not to. Last time I checked with Ubuntu that was ext4.

The reason is simple: the default is most likely the one used the most, so the chance that any bugs have already been found and fixed is the largest. The differences between the file systems you mention are unlikely to mean much in light daily use. If performance is very important, then consider getting more memory instead so your operating system has more room for caching.

That said, if you intend to use the drive "across" operating systems, I would suggest getting two, or at least splitting the one you have into two physical partitions, and then using either NTFS or exFAT on the one you keep media on.

Solution 4:

If you care about the integrity of your data over the long term, I'd suggest zfs or btrfs.

My understanding is those two are the only ones that allow you to scrub the data, looking for and potentially correcting bit rot.

I'm not that familiar with zfs, but have been running btrfs for about 5-6 years. I run it in RAID1 and have a weekly scrub job. (Note: if you're not mirrored, scrubbing will only flag bad blocks instead of repairing them!)
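
For reference, the weekly job can be as simple as a crontab entry like this (the mount point and schedule are just examples; adjust the btrfs binary path to your distribution):

# scrub the btrfs filesystem mounted at /mnt/data every Sunday at 03:00
0 3 * * 0 /bin/btrfs scrub start /mnt/data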

The only issue I've had was a bug where extents weren't being freed automatically, so the disk reported itself as full even though it wasn't actually full.

Both have compression and dedupe, but btrfs' dedupe is offline -- which some people prefer.
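
As an illustration (the tool choice and paths are assumptions; duperemove is one of several offline dedupers for btrfs):

# transparent compression is a mount option (zstd needs kernel 4.14 or newer)
mount -o compress=zstd /dev/sdb1 /mnt/data

# offline dedupe of an existing directory tree
duperemove -dr /mnt/data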

Another resource to check is this recent post that made it to Hacker News: Five Years of Btrfs

Note: There does appear to be some work towards XFS/ext4 scrubbing

Solution 5:

You can create an XFS-formatted filesystem on top of an mdadm mirror without having all the mirror members actually present.

Create the array as a three-way mirror with two members absent. At a later time you can add the two missing ones.
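
A sketch of that approach, with hypothetical device names:

# degraded three-member RAID1: one real disk, two slots left empty
mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 missing missing
mkfs.xfs /dev/md0

# later, add the real mirrors one at a time; mdadm rebuilds them into the empty slots
mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md0 --add /dev/sdc1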