Would a ZFS mirror with one drive mostly offline work?

Scenario: I have two external hard drives, and I'd like one to be a backup of the other. Traditionally, I would periodically connect the second drive and rsync across any changes. Does ZFS provide a better way of doing this?

I would think I'd want to create a 'zfs mirror' setup, but for convenience I wouldn't want to carry the backup drive with me all the time; I'd rather synchronise any changes periodically. Does ZFS provide a way to do this, or is this not an appropriate use? If it does, what's the canonical ZFS way of doing it? (For example, I don't want to bash the drive by having it check every single sector for changes every time I update the backup drive.)

TL;DR: You can do what you are looking for the way you are suggesting, but it's not what mirrors are meant to do. ZFS has only limited ability to incrementally update a mirrored drive after it has been offline for a while.

In practice, what you are suggesting would almost certainly require a full resilver each time: the interim changes would mean that too many überblock revisions have gone by, so there would be no common base point for an incremental resilver. If the source drive fails during that process, you would likely be in deep trouble as far as your data is concerned, since until the resilver completes it holds the only complete copy. Also keep in mind that, because of its Merkle tree on-disk data format, ZFS can (and does) resilver in "order of decreasing data importance" rather than sequentially, as RAID systems that are not file-system-aware do. Of course, "data importance" here is as far as ZFS is concerned, not what you might consider important or worth keeping. The resulting seek activity can easily put major stress on a single drive in particular.
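For comparison, the mirror-based approach the question describes could be scripted roughly like this. It is a minimal sketch under stated assumptions: the pool name tank and the device name usb-backup are hypothetical placeholders (use the member name that zpool status shows for your backup drive), and zpool wait requires OpenZFS 2.0 or later. Bringing the drive back online is what triggers the resilver discussed above.

```shell
#!/bin/sh
# Sketch of the "mostly offline mirror" workflow, for comparison only.
# Hypothetical names: POOL is a two-way mirror, BACKUP_DEV is the name of
# the removable member as shown by "zpool status".
set -eu
POOL=${POOL:-tank}
BACKUP_DEV=${BACKUP_DEV:-usb-backup}   # hypothetical device name

# Run before unplugging the backup drive. The pool stays usable but
# reports DEGRADED until the drive comes back.
detach_backup() {
  zpool offline "$POOL" "$BACKUP_DEV"
}

# Run after plugging the drive back in. Onlining it starts a resilver;
# block until that finishes (OpenZFS 2.0+), then show the pool state.
reattach_backup() {
  zpool online "$POOL" "$BACKUP_DEV"
  zpool wait -t resilver "$POOL"
  zpool status "$POOL"
}
```

Call detach_backup before unplugging the drive and reattach_backup after plugging it back in; how much data the resilver actually touches is up to ZFS, not the script.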

The canonical way to bring two ZFS file systems in sync is to pipe zfs send into zfs receive between them. This requires both file systems to be available at the same time. You can store the output of zfs send and use it as the input to zfs receive later, should you be so inclined, but be aware that this comes with a huge caveat: zfs receive makes no attempt to recover from a partially damaged stream of data, and simply aborts if it detects errors.
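If you do want to split the send and the receive in time, a minimal sketch might look like the following. The helper names save_stream and restore_stream are hypothetical, the snapshot names are the ones used below, and it assumes GNU coreutils sha256sum. The checksum file is an addition of mine, not part of ZFS: it cannot repair a damaged stream, it only tells you before you try that zfs receive would abort on it.

```shell
#!/bin/sh
# Sketch: "zfs send | zfs receive" split into two steps via a stream file.
# Assumes GNU sha256sum. The .sha256 file records FILE's path as given,
# so use absolute paths or verify from the same working directory.
set -eu

# save_stream BASE_SNAP NEW_SNAP FILE
# e.g. save_stream tank@current1948 tank@current1984 /mnt/usb/tank.zfs
save_stream() {
  zfs send -R -I "$1" "$2" > "$3" || return 1
  sha256sum "$3" > "$3.sha256" || return 1
}

# restore_stream FILE DST_POOL, run later with DST_POOL imported
restore_stream() {
  if ! sha256sum -c "$1.sha256" > /dev/null; then
    echo "$1 is damaged; zfs receive would abort on it" >&2
    return 1
  fi
  zfs receive "$2" < "$1"
}
```

For example, run save_stream while tank is connected, then later restore_stream /mnt/usb/tank.zfs pipe once pipe is imported.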

1. Have one pool for each backup drive. Let's call them tank and pipe, and say we have data on tank that we want to copy over to pipe.
2. Connect both drives, and zpool import both tank and pipe. You can pass -N to zpool import so that it does not mount any file systems.
3. Take a recursive snapshot of the source file system, tank: zfs snapshot -r tank@current1984
4. Find the most recent snapshot that tank and pipe have in common. Use something like zfs list -t snapshot -r tank pipe to get a raw list to work from. Let's say that the most recent snapshot they have in common is current1948.
5. Run something like zfs send -R -I tank@current1948 tank@current1984 | zfs receive pipe to incrementally transfer the delta between the current1948 and current1984 snapshots from tank to pipe. Read the zfs man page for more details on the send and receive subcommands.
6. Wait for that to finish, then optionally delete any snapshots that are no longer needed. Make sure to keep at least one snapshot (for example, current1984) that both pools (file systems, rather) have in common, to use as the base next time.
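The snapshot, find-the-common-snapshot and send steps above could be scripted roughly as follows. This is a minimal sketch, not a tested tool: the function names list_snaps, latest_common_snapshot and sync_pools are hypothetical, it assumes both pools are already imported and already share at least one snapshot, and it matches snapshots by name only (comparing each snapshot's guid property would be stricter). It lists each pool's own snapshots oldest first with zfs list -H -o name -t snapshot -s creation -d 1.

```shell
#!/bin/sh
# Sketch: incremental tank -> pipe sync, matching snapshots by name only.
# Assumes both pools are imported and already share at least one snapshot.
set -eu

# list_snaps POOL: the pool's own snapshots, oldest first, one per line.
list_snaps() {
  zfs list -H -o name -t snapshot -s creation -d 1 "$1"
}

# latest_common_snapshot SRC_LIST DST_LIST
# Arguments are newline-separated lists like "tank@current1948", oldest
# first. Prints the part after '@' of the newest source snapshot whose
# name also exists on the destination, or nothing if there is none.
latest_common_snapshot() {
  dst_names=$(printf '%s\n' "$2" | sed 's/^[^@]*@//')
  printf '%s\n' "$1" | sed 's/^[^@]*@//' | {
    common=""
    while IFS= read -r name; do
      if [ -n "$name" ] && printf '%s\n' "$dst_names" | grep -Fqx -- "$name"; then
        common=$name   # input is oldest first, so the last match is newest
      fi
    done
    if [ -n "$common" ]; then printf '%s\n' "$common"; fi
  }
}

# sync_pools SRC DST NEW_SNAP, e.g. sync_pools tank pipe current1984
sync_pools() {
  src=$1 dst=$2 new=$3
  src_snaps=$(list_snaps "$src") || return 1
  dst_snaps=$(list_snaps "$dst") || return 1
  base=$(latest_common_snapshot "$src_snaps" "$dst_snaps")
  if [ -z "$base" ]; then
    echo "no common snapshot between $src and $dst" >&2
    return 1
  fi
  zfs snapshot -r "$src@$new" || return 1
  zfs send -R -I "$src@$base" "$src@$new" | zfs receive "$dst"
}
```

In the scenario above, sync_pools tank pipe current1984 would pick current1948 as the base. Finding the base before taking the new snapshot means no stray snapshot is left behind when the pools share none.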

At this point, the two pools will have the same content, up to the snapshot you used. If done properly, this should also transfer only the differences; I cannot imagine a scenario in which an incremental zfs send | zfs receive would need to do anything like a full mirror resilver. It also allows you to add redundancy to the backup pools later on, should you wish to do so. If the source drive fails during the copying process, you should still have the old backup readily available; only the differences that you were attempting to transfer would be lost.
