How exactly does rsync decide what to sync?
I'm finding multiple answers to this question, so I wanted to ask people who actually use it, rather than people who just want to build the biggest blog by padding it with semi-useless information.
Scenario: I run
rsync -av --progress /dir/a /dir/b
and it does its thing.
I add new files to /dir/a and run the same command again, it knows what it did and only copies the new files.
I add new files to /dir/a and rename some files in /dir/b, and maybe delete a few too.
If I run rsync -av --progress /dir/a /dir/b
again, what will be copied? Just the new files, because it knows what it has previously copied, or the renamed/deleted files too, because they are no longer present?
And as a bonus, if the previously copied files are copied again, is there a way to prevent that, so that only new additions to /dir/a are copied?
At the moment I'm happy checking things manually, but as the data gets bigger I'm going to need more automation to perform this task.
Solution 1:
I add new files to /dir/a and run the same command again, it knows what it did and only copies the new files.
No, it doesn't know what it did in a previous run. It compares the data on the receiving side with the data to be sent. With small enough data, this won't be apparent, but when you have large enough directories, the time spent comparing before the copying actually starts is easily felt.
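As a rough illustration of that comparison, here is a minimal shell sketch of a size-and-mtime test in the spirit of rsync's default "quick check". It is not rsync's actual code: needs_transfer is a hypothetical helper, the scratch files are made up, and the -nt/-ot tests are shell extensions (available in bash and dash) rather than strict POSIX.

```shell
# Hypothetical helper: exits 0 when a size/mtime quick check would transfer
# the file, i.e. the destination is missing or differs in size or mtime.
needs_transfer() {
    src=$1
    dst=$2
    # Missing on the receiving side: transfer.
    [ -e "$dst" ] || return 0
    # Different size: transfer.
    [ "$(wc -c < "$src")" -eq "$(wc -c < "$dst")" ] || return 0
    # Different modification time (newer or older): transfer.
    if [ "$src" -nt "$dst" ] || [ "$src" -ot "$dst" ]; then
        return 0
    fi
    return 1
}

work=$(mktemp -d)
printf 'hello\n' > "$work/src"

# Same size and mtime but different content: a quick check is fooled.
printf 'HELLO\n' > "$work/dst"
touch -r "$work/src" "$work/dst"
if needs_transfer "$work/src" "$work/dst"; then same_meta=transfer; else same_meta=skip; fi

# Same mtime but different size: transfer.
printf 'hello world\n' > "$work/dst"
touch -r "$work/src" "$work/dst"
if needs_transfer "$work/src" "$work/dst"; then diff_size=transfer; else diff_size=skip; fi

# Nothing on the receiving side: transfer.
if needs_transfer "$work/src" "$work/missing"; then missing=transfer; else missing=skip; fi

echo "$same_meta $diff_size $missing"   # prints: skip transfer transfer
rm -rf "$work"
```

The first case is the reason the -c option exists: identical size and timestamp hide a content change from the quick check.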
The default check compares file modification times and sizes. From man rsync:
-c, --checksum
This changes the way rsync checks if the files have been changed
and are in need of a transfer. Without this option, rsync uses
a "quick check" that (by default) checks if each file’s size and
time of last modification match between the sender and receiver.
This option changes this to compare a 128-bit checksum for each
file that has a matching size. Generating the checksums means
that both sides will expend a lot of disk I/O reading all the
data in the files in the transfer (and this is prior to any
reading that will be done to transfer changed files), so this
can slow things down significantly.
And:
-u, --update
This forces rsync to skip any files which exist on the
destination and have a modified time that is newer than the
source file. (If an existing destination file has a
modification time equal to the source file’s, it will be updated
if the sizes are different.)
Note that neither -c nor -u is implied by the options you used. -a is:
-a, --archive archive mode; same as -rlptgoD (no -H)
-r, --recursive recurse into directories
-l, --links copy symlinks as symlinks
-p, --perms preserve permissions
-o, --owner preserve owner (super-user only)
-g, --group preserve group
--devices preserve device files (super-user only)
--specials preserve special files
-D same as --devices --specials
-t, --times preserve times
Solution 2:
General
If I understand correctly, rsync -av has no memory, so it will also copy the files that were renamed or deleted in the target, because they are present in the source but no longer present in the target.
Tips
Use the option -n ('dry run') to check what will happen before you run your rsync command line.
Notice the special meaning of a trailing slash after the source directory, and see the difference between
rsync -av --progress dir/a/ dir/b
and
rsync -av --progress dir/a dir/b
The first copies the contents of dir/a into dir/b, while the second creates dir/b/a. This is described in the manual, man rsync.
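To see the trailing-slash difference without touching real data, something like the following can be tried. It is a sketch that assumes rsync is installed; the directory names with_slash and without_slash are invented for the illustration.

```shell
work=$(mktemp -d)
mkdir -p "$work/dir/a" "$work/with_slash" "$work/without_slash"
printf 'text-1\n' > "$work/dir/a/file-1"

# Trailing slash: the contents of a land directly in the target.
rsync -a "$work/dir/a/" "$work/with_slash"

# No trailing slash: the directory a itself is created inside the target.
rsync -a "$work/dir/a" "$work/without_slash"

# List what ended up where.
find "$work/with_slash" "$work/without_slash" -type f
# rm -rf "$work"   # clean up when done
```

The listing should show with_slash/file-1 and without_slash/a/file-1.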
Example
In your special case (adding a file to the source directory 'a' and removing a file from the target directory 'b'), rsync will copy both the added file and the previously copied file, because the latter is still in the source directory. This happens both with and without the option -u, and I don't know of any option in rsync that fixes it easily if you want to keep the file in the source directory.
But you can remove it from the source directory, or put the file name into a file named excluded and use the option --exclude-from=excluded (for many files), or simply --exclude=PATTERN for one or a few files.
$ rsync -avn --progress dir/a/ dir/b
sending incremental file list
./
file-1
file-2
sent 103 bytes received 25 bytes 256.00 bytes/sec
total size is 13 speedup is 0.10 (DRY RUN)
$ rsync -av --progress dir/a/ dir/b
sending incremental file list
./
file-1
6 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=1/3)
file-2
7 100% 6.84kB/s 0:00:00 (xfr#2, to-chk=0/3)
sent 196 bytes received 57 bytes 506.00 bytes/sec
total size is 13 speedup is 0.05
$ echo textx-3>./dir/a/file-3
$ rsync -avn --progress dir/a/ dir/b
sending incremental file list
./
file-3
sent 121 bytes received 22 bytes 286.00 bytes/sec
total size is 21 speedup is 0.15 (DRY RUN)
$ rm dir/b/file-1
rm: remove regular file 'dir/b/file-1'? y
$ rsync -avn --progress dir/a/ dir/b
sending incremental file list
./
file-1
file-3
sent 124 bytes received 25 bytes 298.00 bytes/sec
total size is 21 speedup is 0.14 (DRY RUN)
$ rsync -avun --progress dir/a/ dir/b
sending incremental file list
./
file-1
file-3
sent 124 bytes received 25 bytes 298.00 bytes/sec
total size is 21 speedup is 0.14 (DRY RUN)
$ rsync -avun --exclude=file-1 --progress dir/a/ dir/b
sending incremental file list
./
file-3
sent 104 bytes received 22 bytes 252.00 bytes/sec
total size is 15 speedup is 0.12 (DRY RUN)
Alternative: unison
You may want to test the tool unison, which is a synchronizing tool. It provides a visual method to identify special cases and decide what to do. There is a GUI version (unison-gtk).
Solution 3:
It copies the files in /dir/a that are new, plus any whose copy in /dir/b is missing or differs in size or modification time, so files you deleted or renamed in /dir/b are copied again. Extra files in /dir/b, such as the renamed copies, are ignored unless you use the --delete option. In that case, renamed files in /dir/b will be deleted: it forces /dir/b to become exactly like /dir/a.
About the bonus, do you mean a case like renaming files in /dir/a and then rsyncing to /dir/b? I don't think there is a way to prevent rsync from just copying the files again in that case.