How do I perform incremental/continuous backups of a ZFS pool?

ZFS is an incredible filesystem and solves many of my local and shared data storage needs.

While I do like the idea of clustered ZFS wherever possible, sometimes it's not practical, or I need some geographical separation of storage nodes.

One of the use cases I have is high-performance replicated storage on Linux application servers. For example, I support a legacy software product that benefits from low-latency NVMe SSD drives for its data. The application has an application-level mirroring option that can replicate to a secondary server, but it's often inaccurate and offers, at best, a 10-minute RPO.

I've solved this problem by having a secondary server (also running ZFS, on similar or dissimilar hardware) that can be local, remote or both. By combining the three utilities detailed below, I've crafted a solution that gives me continuous replication, deep snapshot retention and flexible failover options.

zfs-auto-snapshot - https://github.com/zfsonlinux/zfs-auto-snapshot

A handy tool that enables periodic ZFS filesystem-level snapshots. I typically run with the following schedule on production volumes:

# /etc/cron.d/zfs-auto-snapshot

PATH="/usr/bin:/bin:/usr/sbin:/sbin"

*/5 * * * * root /sbin/zfs-auto-snapshot -q -g --label=frequent --keep=24 //
00 * * * * root /sbin/zfs-auto-snapshot -q -g --label=hourly --keep=24 //
59 23 * * * root /sbin/zfs-auto-snapshot -q -g --label=daily --keep=14 //
59 23 * * 0 root /sbin/zfs-auto-snapshot -q -g --label=weekly --keep=4 //
00 00 1 * * root /sbin/zfs-auto-snapshot -q -g --label=monthly --keep=4 //
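
You can verify that the schedule is producing snapshots by listing them; zfs-auto-snapshot names them zfs-auto-snap_<label>-<timestamp>. A quick check (the dataset name vol1/data is just an example):

# List the frequent auto-snapshots for one dataset, oldest first
zfs list -t snapshot -o name,creation -s creation vol1/data | grep zfs-auto-snap_frequent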

Syncoid (Sanoid) - https://github.com/jimsalterjrs/sanoid

This program can perform ad-hoc snapshot and replication of a ZFS filesystem to a secondary target. I only use the syncoid portion of the product.

Assuming server1 and server2, a simple command run from server2 to pull data from server1:

#!/bin/bash

# Snapshot vol1/data on server1 and pull it into the local vol2/data
/usr/local/bin/syncoid root@server1:vol1/data vol2/data

exit $?
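
Since zfs-auto-snapshot is already creating snapshots on a schedule, a useful variant is to replicate recursively and reuse those existing snapshots instead of having syncoid create its own sync snapshot on every run. A sketch using syncoid's --recursive and --no-sync-snap flags:

#!/bin/bash

# Pull vol1/data and all child datasets from server1, transferring
# only snapshots that already exist (no extra sync snapshot per run)
/usr/local/bin/syncoid --recursive --no-sync-snap root@server1:vol1/data vol2/data

exit $?

The trade-off: with --no-sync-snap, a run only moves data when a new snapshot exists, while the default behavior (a fresh sync snapshot per run) is what makes the 15-second replica described below possible.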

Monit - https://mmonit.com/monit/

Monit is an extremely flexible job scheduler and execution manager. By default, it works on a 30-second interval, but I modify the config to use a 15-second base time cycle.
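
The change is a single line in the main config file (/etc/monit/monitrc on most distributions):

# /etc/monit/monitrc
set daemon 15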

An example config that runs the above replication script every 15 seconds (1 cycle):

check program storagesync with path /usr/local/bin/run_storagesync.sh
        every 1 cycles
        if status != 0 then alert
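
The wrapper script itself isn't shown above; a minimal sketch of what /usr/local/bin/run_storagesync.sh could contain, assuming the syncoid command from earlier and using flock(1) so that a slow transfer can't overlap the next 15-second cycle:

#!/bin/bash

# Skip this run if the previous replication is still in flight;
# flock -n exits non-zero, which Monit will surface as an alert
exec flock -n /var/run/storagesync.lock \
    /usr/local/bin/syncoid root@server1:vol1/data vol2/data

Drop -n if you would rather have runs wait for the lock instead of alerting.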

This is simple to automate and add via configuration management. By wrapping the execution of the snapshot/replication in Monit, you get centralized status, job control and alerting (email, SNMP, custom script).


The result is that I have servers with multiple months of monthly snapshots and many rollback and retention points within (see https://pastebin.com/zuNzgi0G), plus a continuous rolling 15-second atomic replica:

# monit status

Program 'storagesync'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Wed, 05 Apr 2017 05:37:59
  last exit value                   0
  data collected                    Wed, 05 Apr 2017 05:37:59
.
.
.
Program 'storagesync'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Wed, 05 Apr 2017 05:38:59
  last exit value                   0
  data collected                    Wed, 05 Apr 2017 05:38:59

You have two different ways to do that:

  1. The traditional, filesystem-agnostic way that has been used for decades, with tools like rsync or Bacula. There you have tested and (hopefully) stable, large software that can be customized for huge deployments and can be used even if you switch away from ZFS.
  2. One of the tools that leverage ZFS send/recv. This can either be your own solution, a script or extended script from the various ones on GitHub et al., or a more feature-rich tool like Sanoid or ZnapZend (send/recv with mbuffer support and retention plans). In this case you will most likely not find any big, "enterprisey" (in the negative sense) solutions, but tools that do just this single task and can be combined with other tools to cater to your specific setup (see the minimal send/recv sketch after this list).
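
For option 2, the underlying mechanic is the same regardless of tooling; a minimal sketch with hypothetical pool and host names:

# One-time full replication of the first snapshot
zfs send tank/data@snap1 | ssh backuphost zfs recv backup/data

# Afterwards, send only the increment between the previous and the new snapshot
zfs send -i tank/data@snap1 tank/data@snap2 | ssh backuphost zfs recv backup/data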

In general, I would only trust a tool whose source code is available, and I would keep it as simple as possible. If you use send/recv, there isn't much to manage: you just delete snapshot n-1 on the local side once snapshot n has been successfully established on the remote side.

You can split your transport any way you like; it can even be asynchronous (snapshots do not have to be received immediately), as long as you keep the iron rule that you can only send a diff between the local current/new snapshot and the local previous snapshot, and that the local previous snapshot is the most recent one on the remote side (until the backup finishes and everything is reset).
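
For example (again with hypothetical names), the increment can be staged to a file and received at any later time, as long as the snapshot chain is respected:

# On the source: serialize the incremental stream to a file
zfs send -i tank/data@snap1 tank/data@snap2 | gzip > /staging/data_snap1-snap2.zfs.gz

# Later, on the target: replay it
gunzip -c /staging/data_snap1-snap2.zfs.gz | zfs recv backup/data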

Now that I think of it, you could probably encode that in a state machine and then be sure that no unforeseen cases can slip through.