Which filesystem should I use to format my external backup HDD? btrfs?

Solution 1:

According to the principal designer of BTRFS, BTRFS still has some issues as filesystems become full. The send/receive functionality for shipping snapshot differences to an offsite backup is also not fully functional yet, and online deduplication (useful for backing up virtual machine images) may arrive in the 3.11 kernel (not yet released). raidz-style (RAID5/6) support was new for 3.10 and I haven't had a chance to test it yet; I may do that this evening. All in all, BTRFS is still under very active development, and I prefer to wait until it's finished (or at least not getting major functionality improvements with each kernel release!) before actually using it in production.

The advantage of using BTRFS or ZFS on an external enclosure used for rsync-based backups is that you can take snapshots via e.g. a daily cron job, and then do time travel to retrieve old data if necessary (if, say, files have disappeared from your hard drive for unknown reasons and you need to get them back from past backups). I use a USB3 enclosure with ZFSonLinux for that purpose because I need de-duplication support for virtual machines. From the perspective of rsync the big .img file is always different, so without de-duplication every backup stores another copy of a huge 30 GB file; with it, only the blocks that actually changed in the .img file take up new space on the backup. Hopefully when the 3.12 kernel comes out, the BTRFS deduplication support will be mature enough that I can migrate away from ZFS for this application. ZFS is cool and all, but the fact that it is not integrated with the Linux kernel causes issues (e.g., I use a CentOS 6.4 virtual machine to do the backups because ZFS won't compile against the 3.10 kernel).
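To make the de-duplication point concrete, here is a toy Python sketch of block-level de-duplication. It is not how ZFS or BTRFS implement it; the 4096-byte block size, the `DedupStore` class and the `make_image` helper are all made up for illustration. It only shows why a second backup of a slightly changed image needs to store just the changed blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for illustration; real filesystems choose their own


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class DedupStore:
    """Toy content-addressed block store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # sha256 hex digest -> block contents

    def write(self, data):
        """Store data and return how many blocks were actually new."""
        new_blocks = 0
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
        return new_blocks


def make_image(num_blocks):
    """Build a fake disk image in which every block has distinct contents."""
    return b"".join(i.to_bytes(8, "big") * (BLOCK_SIZE // 8) for i in range(num_blocks))


if __name__ == "__main__":
    store = DedupStore()
    image = make_image(256)             # stand-in for the big .img file

    first = store.write(image)          # first backup: every block is new

    changed = bytearray(image)
    changed[5 * BLOCK_SIZE] ^= 0xFF     # the VM touched a single block
    second = store.write(bytes(changed))

    print(first, second)                # 256 new blocks, then only 1
```

A plain file copy would have stored all 256 blocks again for the second backup, which is the "multiple copies of huge files" problem described above.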

For backing up Linux filesystems, create a snapshot on your BTRFS (or LVM) volume, mount the snapshot (if LVM), and back it up via rsync. That ensures you have a consistent backup as of the snapshot time. When you are finished with the snapshot, delete it. (This matters more with LVM, since snapshots have a significant performance impact there.) My cron script that fires off the backup job also rotates the snapshots (daily, weekly, monthly) on my ZFS backup filesystem before it starts the actual backup, so that I can time travel if needed.
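The snapshot, rsync, delete sequence for the BTRFS case could be sketched roughly like this in Python. This is not my actual cron script; the paths, the `backup-YYYY-MM-DD` snapshot naming and the function names are assumptions for illustration. It only builds and prints the commands, so you can check them before running anything as root.

```python
import shlex
from datetime import date


def backup_commands(source_subvol, snapshot_dir, backup_target, today):
    """Return the three commands for one backup run, in order.

    All paths are hypothetical examples. The snapshot directory has to live
    on the same BTRFS filesystem as the source subvolume.
    """
    snapshot = f"{snapshot_dir.rstrip('/')}/backup-{today.isoformat()}"
    return [
        # 1. Read-only snapshot, so rsync sees a frozen, consistent tree.
        ["btrfs", "subvolume", "snapshot", "-r", source_subvol, snapshot],
        # 2. Copy the snapshot (not the live filesystem) to the backup disk.
        ["rsync", "-a", "--delete", snapshot + "/", backup_target.rstrip("/") + "/"],
        # 3. Drop the snapshot once the copy is done.
        ["btrfs", "subvolume", "delete", snapshot],
    ]


def render(commands):
    """Format the commands as shell lines for review (a dry run)."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    plan = backup_commands("/home", "/home/.snapshots", "/mnt/backup/home", date.today())
    print(render(plan))
```

To actually execute the plan you would pass each list to `subprocess.run(cmd, check=True)`, and you would want the delete step to run even if rsync fails so that stale snapshots don't pile up. For LVM the first and last steps would be replaced by creating, mounting, unmounting and removing an LVM snapshot instead.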

As for reliability, hoary old ext4 is probably the most reliable filesystem because of the way it statically allocates structures on disk, meaning you can always find them and at least get most of your data back if things crash badly. The downside is poor performance on edge cases: very large files (where the way inode chain blocks work makes random access in those files very slow), large filesystems (which are very slow to create and fsck), and large numbers of small files (which exhaust the inode table). I personally continue to run ext4 on top of LVM on top of RAID for my root filesystems and use other filesystems either for performance reasons or for functionality reasons as required.
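The small-files edge case comes down to simple arithmetic, since the inode table is sized once when the filesystem is created. A rough Python sketch, assuming the common mke2fs default of one inode per 16384 bytes (your distribution's mke2fs.conf may differ, and it can be changed at creation time with `mke2fs -i`). The function names are made up, and the calculation ignores metadata overhead, reserved inodes and block rounding.

```python
def default_inode_count(fs_bytes, bytes_per_inode=16384):
    """Approximate number of inodes created for a filesystem of this size.

    bytes_per_inode=16384 is an assumed default; check your mke2fs.conf.
    """
    return fs_bytes // bytes_per_inode


def runs_out_of_inodes_first(fs_bytes, avg_file_bytes, bytes_per_inode=16384):
    """True if files of this average size use up the inodes before the space."""
    files_that_fit_by_space = fs_bytes // avg_file_bytes
    return files_that_fit_by_space > default_inode_count(fs_bytes, bytes_per_inode)


if __name__ == "__main__":
    one_tib = 2 ** 40
    print(default_inode_count(one_tib))                 # 67108864 inodes
    print(runs_out_of_inodes_first(one_tib, 4096))      # True: 4 KiB files hit the inode limit
    print(runs_out_of_inodes_first(one_tib, 2 ** 20))   # False: 1 MiB files fill the disk first
```

In other words, with that ratio any workload whose average file is smaller than about 16 KiB will see "no space left" while `df` still shows free blocks, unless you lower the bytes-per-inode ratio when you create the filesystem.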