Can ZFS with snapshots replace DRBD with async protocol A?

This question is related to an earlier, more generic one, but this time I would like to ask more specifically whether ZFS can replace DRBD in a use case like mine. That is: two servers, each hosting its own VMs or other services while replicating the VMs or other data to the other, to shorten downtime in case of hardware failure or maintenance. This setup is intentional: I want local reads/writes to be preferred and can live with potentially losing the data written within some recent time window.

DRBD supports such a scenario with different mirroring protocols, where replication protocol A is asynchronous, just as I need it. The main benefit of DRBD in my setup is that switching to a more synchronous protocol is easy, simply a matter of configuration and a restart. Additionally, one can put dm-crypt on top of the DRBD block device and a file system into dm-crypt, so that all data is encrypted.
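For illustration, a minimal resource definition could look like the following sketch; hostnames, devices, and addresses are made up. Switching to the fully synchronous protocol C would just mean changing one line and re-running drbdadm adjust:

    resource r0 {
        net {
            protocol A;    # async: a write completes once it reaches the local
                           # disk and the TCP send buffer; change to C for sync
        }
        on alpha {
            device    /dev/drbd0;
            disk      /dev/vg0/r0;     # backing block device
            address   10.0.0.1:7788;
            meta-disk internal;
        }
        on beta {
            device    /dev/drbd0;
            disk      /dev/vg0/r0;
            address   10.0.0.2:7788;
            meta-disk internal;
        }
    }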

The downside of DRBD, on the other hand, is that it needs block devices, and one has to take care of their availability and size using other technologies like RAID and LVM. Resizing is of particular interest, as my servers have free slots to attach more disks in the future. To support resizing, I would need to use LVM under DRBD: first add storage using LVM, then resize DRBD to take the new storage into account. Then I would still need to deal with dm-crypt, the file system itself, etc. This reads as pretty complex and a bit error-prone to me.
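Spelled out, that resize chain would look roughly like this; volume group, resource, and mapping names are invented, and ext4 is assumed as the file system:

    # on both nodes: grow the LV backing the DRBD resource
    lvextend -L +100G /dev/vg0/r0

    # on the primary: let each layer pick up the new size, bottom to top
    drbdadm resize r0
    cryptsetup resize crypt-r0
    resize2fs /dev/mapper/crypt-r0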

On the other hand, file systems like ZFS already provide their own volume management and are able to send/receive incremental snapshots. So in theory, ZFS should be able to implement the same protocol-A-like approach as DRBD does, without the need for RAID, LVM, etc. One would simply add storage to the servers, take it into the ZFS pools, and it would immediately be available to host more/larger VMs or whatever else. Sending/receiving snapshots sounds about as efficient as whatever DRBD is doing, because it transfers only incremental binary data, plus whatever overhead is needed to describe the changes to the receiving ZFS. I've read of people using endless loops with no sleep time, sending/receiving ZFS snapshots and deleting them on the source directly afterwards, because they are only needed for short-term replication.
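As a sketch of both points, assuming a pool named tank with a dataset tank/vms and a peer host named backup (all names invented):

    # adding storage is a single step, no RAID/LVM layering required
    zpool add tank mirror /dev/sdc /dev/sdd

    # near-continuous replication loop as described above
    prev=""
    while true; do
        now="repl-$(date +%s)"
        zfs snapshot "tank/vms@$now"
        if [ -z "$prev" ]; then
            # bootstrap: the first snapshot has to be sent in full
            zfs send "tank/vms@$now" | ssh backup zfs receive -F tank/vms
        else
            # afterwards, send only the delta between the last two snapshots
            zfs send -i "tank/vms@$prev" "tank/vms@$now" \
                | ssh backup zfs receive -F tank/vms
            # the old snapshot was only needed for short-term replication
            zfs destroy "tank/vms@$prev"
        fi
        prev="$now"
    done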

Encryption seems to be a problem for ZFS, though. dm-crypt might work on top of things like zvols, and in the future ZFS might support encryption directly, but currently it doesn't. Additionally, sending/receiving snapshots seems to always be asynchronous; there's no way to implement a more synchronous protocol like DRBD provides, should it be needed. That might be acceptable, though, in exchange for not needing LVM and making the overall setup easier.
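The zvol workaround could look like this (pool, volume, and mapping names are again invented):

    # create a 100 GiB zvol and stack dm-crypt plus a file system on top
    zfs create -V 100G tank/vm-disk0
    cryptsetup luksFormat /dev/zvol/tank/vm-disk0
    cryptsetup open /dev/zvol/tank/vm-disk0 crypt-vm-disk0
    mkfs.ext4 /dev/mapper/crypt-vm-disk0

A nice side effect is that the zvol itself then only ever stores ciphertext, so snapshots replicated to the peer are encrypted as well.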

Or am I wrong? Is there some way other than sending/receiving snapshots by which ZFS can forward data to other hosts out of the box, using ZFS alone?

Do you see any fundamental performance difference or limitation between sending/receiving ZFS snapshots very often in short periods of time and DRBD's mirroring protocols? I guess the latter might be a bit more efficient because it works purely at the block level.

Are there any other differences between DRBD and ZFS you can think of? Keep in mind that I don't need the HA features of DRBD currently; I'm focusing on replication of data for now.

Thanks!


Solution 1:

I asked a somewhat similar (yet different) question lately.

In short, it all depends on the replication target you want to achieve and whether ZFS on Linux (ZoL) fits your use case:

  • if point-in-time replication is adequate, sending incremental snapshots is a very good strategy
  • if near or full real-time sync is required, you have to use DRBD or set up a network-mirrored zpool (see the link to my question above and the sketch after this list)
  • in choosing to use ZoL, consider that to provide the ARC, it basically reimplements a large portion of the "pagecache" code by itself. This means increased memory consumption by the filesystem code. Moreover, being a CoW filesystem, ZFS has its own specific performance behavior/requirements.
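To make the second bullet concrete: a network-mirrored zpool pairs a local disk with a disk exported by the peer over the network, e.g. via iSCSI. A rough sketch, with all device and target names invented:

    # log in to the disk exported by the peer (set up there with e.g.
    # targetcli); it then shows up as a regular local block device
    iscsiadm -m node -T iqn.2024-01.example:peer-disk -p 10.0.0.2 --login

    # mirror a local disk with the remote one, e.g. /dev/sdb after login;
    # ZFS then writes every block to both sides before completing, which
    # approximates DRBD protocol C
    zpool create tank mirror /dev/sda /dev/sdb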

As a side note, I advise against using ZVOLs, which hide some nasty surprises.