How to speed up rsync for small files

I'm trying to transfer thousands of small files from one server to another using the following command:

rsync -zr --delete /home/user/ [email protected]::backup

Currently the transfer takes a long time (I haven't timed it). Is there a way to make this faster? Should I be using another tool? Should I be using rsync over ssh rather than using the rsync protocol?


Solution 1:

You need to determine the bottleneck. It isn't rsync itself, and it probably isn't your network bandwidth. As @Zoredache suggested, it is most likely the huge number of IOPS generated by all the stat() calls. Any syncing tool is going to need to stat the files. Run iostat while syncing to verify.
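A minimal sketch of that check, assuming the sysstat package (which provides iostat) is installed. The wrapper name disk_report and its default interval and count are my own choices, not anything rsync or iostat defines:

```shell
# disk_report [interval_seconds] [count]
# Prints extended per-device statistics (-d device report, -x extended columns).
# The first report is the average since boot; later reports cover each interval.
disk_report() {
  iostat -dx "${1:-5}" "${2:-3}"
}
```

Run `disk_report` on both hosts while rsync is active. If the utilisation and wait-time columns for the disk holding the files stay high while the network is mostly idle, the disk is the likely limit.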

So the question becomes: how do I optimize stat? Two easy answers:

  1. get a faster disk subsystem (on both hosts if need be) and
  2. tune your filesystem (e.g. for ext3, mount with noatime and enable dir_index).
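For the ext3 case, the tuning step might look roughly like this. The device /dev/sdXN, the mount point /home, and the helper name has_dir_index are placeholders of mine; substitute your own and run the real commands as root:

```shell
# Hypothetical helper: reads `tune2fs -l` output on stdin and succeeds
# if dir_index appears in the "Filesystem features" line.
has_dir_index() {
  grep -q '^Filesystem features:.* dir_index'
}

# Typical use, as root (replace /dev/sdXN with your real device):
#   tune2fs -l /dev/sdXN | has_dir_index && echo "dir_index enabled"
#
# To enable it and index existing directories (filesystem must be unmounted):
#   tune2fs -O dir_index /dev/sdXN
#   e2fsck -fD /dev/sdXN
#
# To stop access-time updates on an already mounted filesystem
# (assumes /home is its own mount point):
#   mount -o remount,noatime /home
```

To make noatime permanent, add it to the options column for that filesystem in /etc/fstab.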

If by some chance disk IOPS isn't the limit, you could experiment with splitting the directory tree into multiple distinct trees and running multiple rsyncs in parallel.
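One way to sketch the multiple-rsync idea is to start one rsync per top-level subdirectory. The function name parallel_rsync, the RSYNC override variable, and the default of 4 jobs are assumptions for illustration. It relies on GNU find (-printf) and an xargs that supports -P:

```shell
# parallel_rsync SRC DEST [JOBS]
# Runs one rsync per top-level subdirectory of SRC, up to JOBS at a time.
# Set RSYNC=echo to print the rsync arguments instead of running them.
# The body is a subshell, so the cd does not affect the caller.
parallel_rsync() (
  src=$1; dest=$2; jobs=${3:-4}
  cd "$src" || exit 1
  find . -mindepth 1 -maxdepth 1 -type d -printf '%f\0' |
    xargs -0 -P "$jobs" -I{} "${RSYNC:-rsync}" -zr --delete "{}/" "$dest/{}/"
)
```

For example, `parallel_rsync /home/user backuphost::backup 4`, where backuphost stands in for your real server. Note the limits of this split: files sitting directly in the top level of the source are not copied, and --delete only applies inside each subtree, so a subdirectory removed from the source is not removed from the destination. A final plain rsync run over the whole tree covers both cases.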

Solution 2:

Compression is not very useful for small files (say, less than 100 bytes). For small files, sometimes the compressed version can be even bigger than the original. Try the rsync command without the -z flag.

ssh is good for security, but will not make the transfer faster. In fact, it would make the transfer slower due to the need for encryption/decryption.

rsync may not seem fast the first time it is run because there is a lot of data to transfer. However, if you plan on running this command periodically, subsequent runs may be much faster since rsync is smart about not transferring files that have not changed.

Solution 3:

If ext3 or ext4 filesystems are involved, check that both have the dir_index feature enabled. This tripled rsync throughput in my case.

See details in my answer at: https://serverfault.com/a/759421/80414

Solution 4:

What version of rsync are you using? Anything older than 3.0.0 (on both ends) doesn't have the incremental file list feature, which speeds up large transfers.
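A quick way to check, sketched under the assumption that GNU sort (for -V) and awk are available. The helper name version_ge is mine:

```shell
# Succeeds if version $1 is at least version $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# The version is the third field on the first line of `rsync --version`.
if command -v rsync >/dev/null 2>&1; then
  ver=$(rsync --version | awk 'NR==1 {print $3}')
  if version_ge "$ver" 3.0.0; then
    echo "rsync $ver: incremental file list supported"
  else
    echo "rsync $ver: older than 3.0.0, consider upgrading"
  fi
fi
```

Run it on both the sending and the receiving host, since both ends need 3.0.0 or newer.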