Block-level deduplicating filesystem
I'm looking for a deduplicating copy-on-write filesystem solution for general user data such as /home and backups of it. It should use online/inline/synchronous deduplication at the block level, using secure hashing such as SHA-256 or TTH (for a negligible chance of collisions). Duplicate blocks need not even touch the disk.
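To make the requirement concrete, here is a minimal in-memory sketch of inline block-level deduplication keyed by SHA-256. It is only an illustration, not how any of the filesystems mentioned here are implemented: the class name `DedupStore`, the fixed 4096-byte block size and the file names are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real filesystems differ


class DedupStore:
    """Toy in-memory block store keyed by SHA-256 digest (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes, each unique block stored once
        self.files = {}   # file name -> ordered list of block digests
        self.writes = 0   # number of blocks that were actually stored

    def write_file(self, name, data):
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            if digest not in self.blocks:
                # Only previously unseen blocks are stored;
                # a duplicate block costs one hash and one lookup.
                self.blocks[digest] = block
                self.writes += 1
            digests.append(digest)
        self.files[name] = digests

    def read_file(self, name):
        # Reassemble the file from its list of block digests.
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store.write_file("home/report", payload)
store.write_file("backup/report", payload)  # a "backup" of the same data
```

Here the second write stores nothing new, and even the first write stores only two blocks because three of its four blocks are identical.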
The idea is that I should be able to just copy /home/<user> to an external HDD with the same such filesystem to do a backup. Simple. No messing around with incremental backups where corruption to any of the snapshots will nearly always break all later snapshots, and no need to use a specific tool to delete or 'checkout' a snapshot. Everything should simply be done from the file browser without worry. Can you imagine how easy this would be? I'd never have to think twice about backing up again!
I don't mind a performance hit; reliability is the main concern. That said, with filesystem-aware implementations of cp, mv and scp, and a file browser plugin, these operations would be very fast, especially when there is a lot of duplication, as they would only need to transfer the absent blocks. Accidentally using conventional copy tools that do not integrate with the FS would merely take longer, waste some bandwidth when copying remotely and waste some CPU, as the duplicate data would be re-read, re-transferred and re-hashed (although nothing would be re-written), but would absolutely not corrupt anything. (Some filesharing software may also be able to benefit by integrating with the FS.)
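As a rough sketch of what such a filesystem-aware copy could do, the function below hashes each block locally and sends only the blocks the destination does not already hold. Everything here is hypothetical: `dedup_copy`, the `remote_blocks` dictionary standing in for the destination's block index, and the 4096-byte block size are assumptions for illustration, not an existing tool or protocol.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size


def split_blocks(data):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def dedup_copy(data, remote_blocks):
    """Copy data to a destination whose block index is remote_blocks.

    remote_blocks is a dict mapping hex digest -> block bytes.
    Returns (recipe, bytes_sent), where recipe is the ordered digest list.
    """
    recipe = []
    bytes_sent = 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in remote_blocks:
            # Absent on the destination: this block must be transferred.
            remote_blocks[digest] = block
            bytes_sent += len(block)
        recipe.append(digest)
    return recipe, bytes_sent


remote = {}
original = b"x" * BLOCK_SIZE + b"y" * BLOCK_SIZE
_, first = dedup_copy(original, remote)   # everything is new

edited = original + b"z" * 100            # append a little data
_, second = dedup_copy(edited, remote)    # only the new tail is sent
```

The first copy sends both blocks, while the second sends only the 100 appended bytes. A real tool would exchange digests over the network instead of sharing a dictionary.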
So what's the best way of doing this?
I've looked at some options:
- lessfs - Looks unmaintained. Any good?
- Opendedup/SDFS - Java? Could I use this on Android?! What does SDFS stand for?
- Btrfs - Some patches floating around on mailing list archives, but no real support.
- ZFS - Hopefully they'll one day relicense under a true Free/Opensource GPL-compatible licence.
Also, 2 years ago I attempted this myself in Python using FUSE, at the file level, layered on top of a typical solid FS such as ext4, but I found FUSE for Python underdocumented and didn't manage to implement all of the system calls.
This sounds very enterprise (as in pricey).
Data Domain offers data deduplication, as may NetApp with its WAFL filesystem, but at a high cost.
A "free" alternative could be ZFS.
In my opinion, though, the "best" and most Linuxy alternative, although it works at the file level instead of the block level, would be rsnapshot. It uses rsync and hard links to manage versioning.
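The hard-link idea can be sketched in a few lines of Python. This is a simplified illustration of the technique, not rsnapshot's actual implementation (which drives rsync); the `snapshot` function and the directory names are made up for the example, and it assumes a filesystem that supports hard links.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(src, dest, prev=None):
    """Copy the tree src to dest, hard-linking files unchanged since prev."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            dest_file = os.path.normpath(os.path.join(dest, rel, name))
            old_file = (
                os.path.normpath(os.path.join(prev, rel, name)) if prev else None
            )
            if (
                old_file
                and os.path.isfile(old_file)
                and filecmp.cmp(src_file, old_file, shallow=False)
            ):
                # Unchanged: share the data with the previous snapshot.
                os.link(old_file, dest_file)
            else:
                # New or modified: store a fresh copy.
                shutil.copy2(src_file, dest_file)


base = tempfile.mkdtemp()
home = os.path.join(base, "home")
os.makedirs(home)
with open(os.path.join(home, "a.txt"), "w") as f:
    f.write("unchanged")
with open(os.path.join(home, "b.txt"), "w") as f:
    f.write("v1")

snap0 = os.path.join(base, "snap.0")
snap1 = os.path.join(base, "snap.1")
snapshot(home, snap0)

with open(os.path.join(home, "b.txt"), "w") as f:
    f.write("v2")
snapshot(home, snap1, prev=snap0)
```

After the second run, a.txt in both snapshots is the same inode, while b.txt exists as two separate files. Because each snapshot is a complete directory tree, any of them can be deleted with an ordinary file manager without affecting the others.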
I'd rather trust old, proven tools than a new filesystem like Btrfs, which hasn't been around long enough for people to discover all kinds of nasty bugs.
I'm investigating exactly the same thing. For now I could suggest https://attic-backup.org/quickstart.html#automating-backups, which seems to be quite simple and good for backups on Linux.
Bacula also has this feature, but attic seems to be good enough for most cases.