Time Machine, ZFS and deduplication

I'm currently exporting a ZFS dataset with dedup=on using ubuntu-zfs and netatalk. However, Time Machine insists on creating a sparsebundle on it, which made me wonder whether this has any impact on deduplication, since the concept of "files" disappears inside the disk image, and the block alignment probably isn't great either...

P.S. The whole reason I'm using dedup is that I have a couple of MacBooks backing up to the same place, and a lot of their files are identical.


Addendum: It seems block-level alignment is defeating dedup. Here's what I tried:

  • Time Machine backups of two different MacBooks, with lots of duplicated data between them (200 GB total)
  • CCC clones of the two MacBooks to two sparse images.

Deduplication factor? 1.01x


Any ideas on how to set up ZFS dedup so that it works correctly with Time Machine backups? Should I start looking at other backup solutions with dedup instead?


Solution 1:

Deduplication on ZFS is block-level, so it doesn't depend on the concept of files. Two blocks are only deduplicated when they are byte-for-byte identical, so dedup is defeated whenever the same file sits at a different offset, modulo the ZFS block size, within each sparsebundle. Since ZFS can use variable block sizes that are larger than the block size of the HFS+ file system inside the sparsebundle, correct alignment isn't guaranteed, but it isn't guaranteed that deduplication will fail either.
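
To make the alignment point concrete, here is a minimal Python sketch of fixed-size block dedup. It is only a toy model, not how ZFS is implemented: the 4096-byte block size, the `dedup_ratio` helper, and the random "images" are all assumptions chosen for illustration. It stores the same payload in two images, first at the same offset modulo the block size and then shifted by 512 bytes.

```python
import hashlib
import os

def dedup_ratio(images, block_size):
    """Return total blocks / unique blocks across all images (toy model)."""
    total = 0
    unique = set()
    for image in images:
        for i in range(0, len(image), block_size):
            # Identify each block by the hash of its contents.
            unique.add(hashlib.sha256(image[i:i + block_size]).digest())
            total += 1
    return total / len(unique)

BLOCK = 4096                       # hypothetical block size, for illustration only
payload = os.urandom(BLOCK * 64)   # data that both "machines" have in common

# Same payload, stored at the same offset modulo BLOCK in both images.
image_a = os.urandom(BLOCK * 2) + payload
image_b = os.urandom(BLOCK * 5) + payload
aligned = dedup_ratio([image_a, image_b], BLOCK)

# Same payload, but shifted by 512 bytes in the second image.
image_c = os.urandom(BLOCK * 5 + 512) + payload
misaligned = dedup_ratio([image_a, image_c], BLOCK)

print(f"aligned:    {aligned:.2f}x")
print(f"misaligned: {misaligned:.2f}x")
```

In the aligned case the shared payload is counted once, so the ratio comes out close to 2x. In the shifted case every block boundary cuts the payload at a different place, no two blocks match, and the ratio stays at 1x, which is consistent with the 1.01x reported in the question.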

If you're worried about space, you may also want to enable compression on your ZFS pool. It adds CPU overhead, but can actually increase effective disk throughput, since less data has to be read from and written to disk.