How to improve the performance of RSYNC's delta transfer on the receiving host using software only?

I'm using RSYNC to back up VirtualBox VMs from one server to a Synology NAS DS1512+. The important point is that I really want to back up the VM images themselves, NOT individual files within those images. I'm already doing that separately, and it is NOT the problem here.

Backing up all those images using --whole-file takes ~3 hours. But the NAS uses BTRFS and I would like to use its snapshot feature to really only store differences, which doesn't work with --whole-file, because the whole file gets transferred and fully rewritten. --inplace is used already, but doesn't change that particular aspect, only whether new files are created or not. To make efficient use of snapshots, RSYNC really needs to transfer only the differences between files.

And that's the problem: When removing --whole-file to only transfer those differences, the time needed to back up the same amount of data increases a lot. I killed RSYNC after it had been running for 10 hours, because I need it to finish far earlier so it doesn't overlap with other backups etc. Looking at the files transferred in those 10 hours, it seemed to have made it only about halfway anyway. So delta transfer is far too slow for some reason.

I'm fairly sure the bottleneck is I/O on the NAS: the sending server doesn't show much of it, and even in theory it shouldn't matter much whether the server reads with --whole-file or not. Some of those VMs are hundreds of GiB in size, and the server uses ZFS, so those images are not necessarily laid out for optimal sequential reads anyway. The server has plenty of free RAM to cache things, and its disks are more or less idling when --whole-file is not used.

That said, reads on the NAS aren't too slow either: while there are some drops, throughput goes up to 50-70 MiB/s for longer periods of time. Writes don't seem too slow either, but are nowhere near what --whole-file achieves, which reaches 100+ MiB/s for long stretches. What is somewhat interesting is the CPU load, which is pretty high, especially when not using --whole-file, and is most likely due to BTRFS compression. But that compression is needed to make efficient use of the available space.

[Screenshot: htop on the NAS]

My expectation was that, especially for reads, it shouldn't matter much in my setup whether --whole-file is used or not: neither BTRFS on the NAS nor ZFS on the server necessarily lays out written files for sequential reads anyway. While I guessed that bursts wouldn't be as high as with --whole-file, I expected delta transfer to minimize the amount of data written overall, so that the two effects would roughly cancel each other out. But that doesn't seem to be the case for some reason.

Finally, I'm using the following options:

--owner \
--numeric-ids \
--compress-level=0 \
--group \
--perms \
--rsh=rsh \
--devices \
--hard-links \
--inplace \
--whole-file \
--links \
--recursive \
--times \
--delete \
--delete-during \
--delete-excluded \
--rsync-path=[...] \
--specials
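
For the delta-transfer runs it is the same invocation with --whole-file simply dropped (over a remote shell, rsync then falls back to its delta algorithm). A minimal sketch for illustration only, with source and destination paths as placeholders and --rsync-path omitted:

rsync \
    --recursive --links --hard-links --times --perms --owner --group \
    --devices --specials --numeric-ids --compress-level=0 \
    --delete --delete-during --delete-excluded \
    --inplace --no-whole-file \
    --rsh=rsh \
    /path/to/vm-images/ nas:/path/to/backup/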

Is there anything obvious in those options that explains the difference between using --whole-file and not? Something known to behave badly in the latter case? Is there anything that can be improved on the receiving side using RSYNC?

Investing money in better hardware like SSDs etc. is not an option. Either I find some mistake in my usage of RSYNC, or I need to live with --whole-file and without snapshots.

Thanks for your suggestions!


Solution 1:

RSync seems to have this problem with really large files. I certainly feel it when transferring VM images. I've seen this problem with files as small as 30 GB, but it may manifest with much smaller files.

I'm lucky enough to have a variety of storage hardware to test on. Unfortunately, I can say that SSDs don't help. Whether reading from an SSD on one computer and writing to a slow HDD-based ZFS filesystem on another (over a gigabit link), or between computers with SSDs, the time from starting the RSync to finishing the transfer is no better than just sending the whole file (--whole-file), and usually takes twice as long. I've experimented with different block sizes as well. Smaller (4096 vs. 65536) was better, but not by a whole lot.
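
For reference, the comparison looks roughly like this; a minimal sketch, with host and paths as placeholders rather than anything from the question:

# Delta transfer with an explicit, smaller checksum block size:
rsync --archive --inplace --no-whole-file --block-size=4096 \
    /vm-images/ backuphost:/backup/vm-images/

# Baseline: send whole files without computing deltas:
rsync --archive --inplace --whole-file \
    /vm-images/ backuphost:/backup/vm-images/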

Since nobody has mentioned using RSync as a daemon on the remote side of the connection, rather than depending on SSH, I'll add that it is well worth it. As long as traffic to/from the remote computer doesn't have to traverse an unsafe network, it is an easy way to get wire speed.
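
A minimal sketch of such a daemon setup; the module name, paths, and host below are placeholder examples only. On the receiving machine, define a module in /etc/rsyncd.conf and start the daemon:

[vmbackup]
    path = /backup/vm-images
    read only = false
    use chroot = true
    # run as root so ownership can be preserved; adjust to your security needs
    uid = root

rsync --daemon

Then push from the sender using the rsync:// protocol instead of a remote shell:

rsync --archive --inplace --no-whole-file /vm-images/ rsync://backuphost/vmbackup/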

But since you are using BTRFS on your NAS: if you were to also use BTRFS on the source machine, you could use BTRFS send and receive to transfer only the changes between snapshots. The first transfer will be painful, worse than with RSync, but subsequent transfers should be very quick. Even better, your snapshots will work as expected, with only the changes consuming storage.
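
Roughly, that workflow looks like this; the subvolume and snapshot paths are placeholders only:

# Initial full transfer: snapshot the source subvolume read-only and send it:
btrfs subvolume snapshot -r /vm-images /snapshots/vm-images.1
btrfs send /snapshots/vm-images.1 | ssh nas btrfs receive /backup/

# Later runs: take a new snapshot and send only the difference to the previous one:
btrfs subvolume snapshot -r /vm-images /snapshots/vm-images.2
btrfs send -p /snapshots/vm-images.1 /snapshots/vm-images.2 | ssh nas btrfs receive /backup/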

In the past, I've done exactly this, except with ZFS, and it works well. Like with RSync, avoiding SSH where it is safe, say by using nc, can be a huge win.
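
For the ZFS case, a rough sketch of the nc variant, with pool, dataset, snapshot names, port, and host purely as examples:

# On the receiver: listen on a port and pipe straight into zfs receive
# (the exact listen syntax depends on the netcat variant):
nc -l -p 3333 | zfs receive backuppool/vm-images

# On the sender: stream only the incremental difference between two snapshots to that port:
zfs send -i tank/vm-images@snap1 tank/vm-images@snap2 | nc backuphost 3333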

I will probably end up implementing exactly what I describe with BTRFS in the near future and will report back with my findings.