Btrfs on top of a mdadm raid10, or btrfs raid10 on bare devices?
I have a RAID10 array managed by mdadm, with an EXT4 filesystem on top of it. However, I like BTRFS and would like to convert the EXT4 filesystem to BTRFS, but I am concerned about performance and maintainability. For example, with BTRFS I can't easily see the status when I remove/add a disk to the array the way I can with mdadm (or perhaps I just don't know how; I searched through the BTRFS docs and could not find this).
So, from your experience, which is the better choice:
To simply convert the EXT4 filesystem in place and let mdadm keep managing the RAID10?
To get rid of mdadm, and let BTRFS do everything?
Let Btrfs do everything.
For one thing, Btrfs has its own integrated mirroring code, which can be smarter than mdadm's.
Of course, if a disk in a mirrored pair of an mdadm RAID10 fails hard, you can replace the bad disk and move on with your life (albeit after a distressingly complex set of shell commands). The problem is when your disk fails more softly: if a few blocks simply give back the wrong bits instead of returning the appropriate error codes for a bad block, then reading that data will randomly hand you garbage. Btrfs is smarter than that: it checksums every bit of data. To be honest I don't know if it's more correct to say "every BTree node" or "every block", but the point is that when it reads data from a mirrored array, it checks the checksum before giving the data back to your userland process. If the checksum doesn't match, it consults the other mirror first, and if that copy yields the correct checksum, it serves you the good data and alerts you that your disk has started to silently fail.
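You can also ask Btrfs to walk the whole array and verify (and, on mirrored profiles, repair) every checksum on demand. A minimal sketch, assuming the filesystem is mounted at /mnt:

btrfs scrub start /mnt
btrfs scrub status /mnt

The status command reports progress plus counts of corrected and uncorrectable errors, which goes some way toward the "how do I see the status" concern in the question.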
The Btrfs wiki specifically mentions your question:
If Btrfs were to rely on device mapper or MD for mirroring, it would not be able to resolve checksum failures by checking the mirrored copy. The lower layers don't know the checksum or granularity of the filesystem blocks, and so they are not able to verify the data they return.
Finally, even without this substantial advantage, the command-line workflow for dealing with removed or added Btrfs devices is super simple. I'm not even sure I could get mdadm's degraded-mount-then-fix-your-filesystem shell commands right, but for Btrfs it's very clearly documented on the multiple devices page as:
mount -o degraded /dev/sdb /mnt
btrfs device delete missing /mnt
At this point, if you have enough space on your remaining disks, you can simply run btrfs balance
and be done with it; no need to replace the mirror, as you absolutely would have to with mdadm! And if you do want to replace it, you can run btrfs device add
first.
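To tie this back to the status question, here is a hedged sketch of checking on the array and then doing a full replacement, assuming the new disk shows up as /dev/sdd and the filesystem is mounted at /mnt (both names are assumptions for illustration):

btrfs filesystem show            # lists the devices in the array, including any marked missing
btrfs device stats /mnt          # per-device read/write/checksum error counters
btrfs device add /dev/sdd /mnt   # bring the new disk into the array
btrfs device delete missing /mnt # evict the dead device and rebuild onto the new one

On newer kernels, btrfs replace start can do the add-and-rebuild in one step, taking the numeric devid of the missing device (visible in btrfs filesystem show) as its source argument.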
BTRFS is still experimental, and you can end up with "interesting" behaviour if something crashes. If you really have to (or want to) run Btrfs, for the time being it is a lot safer to run it on top of a software RAID than directly on the bare devices. Once Btrfs matures and goes into production, this may no longer be true.