Can I rsync to multiple destinations using the same file list?
I'm wondering whether it's possible for rsync to copy one directory to multiple remote destinations in one go, or even in parallel (parallelism isn't necessary, but it would be useful).
Normally, something like the following would work just fine:
$ rsync -Pav /junk user@host1:/backup
$ rsync -Pav /junk user@host2:/backup
$ rsync -Pav /junk user@host3:/backup
And if that's the only option, I'll use it. However, /junk is located on a slow drive with quite a few files, and rebuilding the file list of ~12,000 files each time is agonizingly slow (~5 minutes) compared to the actual transfer/update. Is it possible to do something like this to accomplish the same thing:
$ rsync -Pav /junk user@host1:/backup user@host2:/backup user@host3:/backup
Here is the information from the rsync man page about batch mode.
BATCH MODE
Batch mode can be used to apply the same set of updates to many identical systems. Suppose one has a tree which is replicated on a number of hosts. Now suppose some changes have been made to this source tree and those changes need to be propagated to the other hosts. In order to do this using batch mode, rsync is run with the write-batch option to apply the changes made to the source tree to one of the destination trees. The write-batch option causes the rsync client to store in a "batch file" all the information needed to repeat this operation against other, identical destination trees.
Generating the batch file once saves having to perform the file status, checksum, and data block generation more than once when updating multiple destination trees. Multicast transport protocols can be used to transfer the batch update files in parallel to many hosts at once, instead of sending the same data to every host individually.
To apply the recorded changes to another destination tree, run rsync with the read-batch option, specifying the name of the same batch file, and the destination tree. Rsync updates the destination tree using the information stored in the batch file.
For your convenience, a script file is also created when the write-batch option is used: it will be named the same as the batch file with ".sh" appended. This script file contains a command-line suitable for updating a destination tree using the associated batch file. It can be executed using a Bourne (or Bourne-like) shell, optionally passing in an alternate destination tree pathname which is then used instead of the original destination path. This is useful when the destination tree path on the current host differs from the one used to create the batch file.
Examples:
$ rsync --write-batch=foo -a host:/source/dir/ /adest/dir/
$ scp foo* remote:
$ ssh remote ./foo.sh /bdest/dir/

$ rsync --write-batch=foo -a /source/dir/ /adest/dir/
$ ssh remote rsync --read-batch=- -a /bdest/dir/ <foo
In these examples, rsync is used to update /adest/dir/ from /source/dir/ and the information to repeat this operation is stored in "foo" and "foo.sh". The host "remote" is then updated with the batched data going into the directory /bdest/dir. The differences between the two examples reveal some of the flexibility you have in how you deal with batches:
The first example shows that the initial copy doesn’t have to be local -- you can push or pull data to/from a remote host using either the remote-shell syntax or rsync daemon syntax, as desired.
The first example uses the created "foo.sh" file to get the right rsync options when running the read-batch command on the remote host.
The second example reads the batch data via standard input so that the batch file doesn’t need to be copied to the remote machine first. This example avoids the foo.sh script because it needed to use a modified --read-batch option, but you could edit the script file if you wished to make use of it (just be sure that no other option is trying to use standard input, such as the "--exclude-from=-" option).
Caveats:
The read-batch option expects the destination tree that it is updating to be identical to the destination tree that was used to create the batch update fileset. When a difference between the destination trees is encountered the update might be discarded with a warning (if the file appears to be up-to-date already) or the file-update may be attempted and then, if the file fails to verify, the update discarded with an error. This means that it should be safe to re-run a read-batch operation if the command got interrupted. If you wish to force the batched-update to always be attempted regardless of the file's size and date, use the -I option (when reading the batch). If an error occurs, the destination tree will probably be in a partially updated state. In that case, rsync can be used in its regular (non-batch) mode of operation to fix up the destination tree.
The rsync version used on all destinations must be at least as new as the one used to generate the batch file. Rsync will die with an error if the protocol version in the batch file is too new for the batch-reading rsync to handle. See also the --protocol option for a way to have the creating rsync generate a batch file that an older rsync can understand. (Note that batch files changed format in version 2.6.3, so mixing versions older than that with newer versions will not work.)
When reading a batch file, rsync will force the value of certain options to match the data in the batch file if you didn’t set them to the same as the batch-writing command. Other options can (and should) be changed. For instance --write-batch changes to --read-batch, --files-from is dropped, and the --filter/--include/--exclude options are not needed unless one of the --delete options is specified.
The code that creates the BATCH.sh file transforms any filter/include/exclude options into a single list that is appended as a "here" document to the shell script file. An advanced user can use this to modify the exclude list if a change in what gets deleted by --delete is desired. A normal user can ignore this detail and just use the shell script as an easy way to run the appropriate --read-batch command for the batched data.
The original batch mode in rsync was based on "rsync+", but the latest version uses a new implementation.
I would imagine you could try:

rsync --write-batch=foo -Pav /junk user@host1:/backup
sh foo.sh user@host2:/backup
sh foo.sh user@host3:/backup
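If copying foo* around is a hassle, the second man-page example above can be adapted to stream the batch over standard input instead. A rough sketch, assuming key-based SSH logins, the same /backup path on every host, and rsync on each destination at least as new as the local one (the batch file name junk_batch is arbitrary):

# update host1 and record the batch, then replay it on the other hosts
rsync --write-batch=junk_batch -Pav /junk user@host1:/backup
for host in host2 host3; do
    ssh "user@$host" rsync --read-batch=- -Pav /backup <junk_batch
done

The point is that /junk is scanned only once, during the write-batch run; the replays just apply the recorded updates.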
rsync's batch mode also lends itself to multicast: as the man page notes, multicast transport protocols can be used to transfer the batch update files to many hosts at once. If that's possible on your network, it might be worth looking into.
You could try using Unison. It should be much faster at building the file list, because it keeps a cache of the files.
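A hypothetical invocation (assumes the same Unison version is installed on both ends; -batch makes it run non-interactively):

unison /junk ssh://user@host1//backup/junk -batch

You'd still run it once per destination, but each pairing keeps its own archive cache, so later scans are quick.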
Not a direct answer, but if you use rsync version 3+ (on both ends), it will start transferring before it has built the entire file list, thanks to incremental recursion.
Another option, still not very efficient, would be to run the transfers as background jobs so a few run at the same time, as sketched below.
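For instance, with plain shell job control:

# kick off all three transfers at once, then wait for them all to finish
for host in host1 host2 host3; do
    rsync -Pav /junk "user@$host:/backup" &
done
wait

Each job still scans /junk separately, but the slow file-list builds at least overlap in time instead of running back to back.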
Also, I just thought of this strangeness, if you don't mind using tar:
tar cf - . | tee >(ssh localhost 'cat > test1.tar') >(ssh localhost 'cat > test2.tar') >/dev/null
Where each localhost would be a different server, of course (this assumes key-based login). I've never used the above before, though.
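A variant that unpacks on each remote instead of leaving tarballs behind might look like this (untested sketch; assumes bash for the process substitutions, key-based logins, and an existing /backup directory on each host):

# stream one tar of /junk to three hosts, extracting on the fly
tar cf - -C / junk \
  | tee >(ssh user@host1 'tar xf - -C /backup') \
        >(ssh user@host2 'tar xf - -C /backup') \
  | ssh user@host3 'tar xf - -C /backup'

Note this always sends everything; unlike rsync, it won't skip files that are already up to date on the destination.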
How about changing filesystems?
Some time ago, I switched a multi-terabyte FS from ext3 to XFS. The time to scan the directories (around 600,000 files last time I checked) went from 15-17 minutes to less than 30 seconds!