How exactly does rsync decide what to sync?

I'm finding different answers to this question, so I wanted to ask people who actually use rsync, rather than rely on blogs padded out with random, semi-useless information.

Scenario: I run rsync -av --progress /dir/a /dir/b and it does its thing.

I add new files to /dir/a and run the same command again; it knows what it did and only copies the new files.

I add new files to /dir/a and rename some files in /dir/b, and maybe delete a few too.

If I run rsync -av --progress /dir/a /dir/b again, what will be copied? Just the new files, because it knows what it has previously copied? Or the renamed/deleted files too, because they are no longer present in /dir/b?

And as a bonus, if the previously copied files are copied again, is there a way to prevent that, so that only new additions to /dir/a are copied?

At the moment I'm happy checking things manually, but as the data gets bigger I'm going to need more automation to perform this task.


Solution 1:

I add new files to /dir/a and run the same command again; it knows what it did and only copies the new files.

No, it doesn't know what it did in a previous run. Each time, it compares the data on the receiving side with the data to be sent. With small amounts of data this isn't noticeable, but with large enough directories you can easily feel the time spent comparing before any copying actually starts.

The default check is for file modification times and sizes. From man rsync:

-c, --checksum
      This changes the way rsync checks if the files have been changed
      and  are in need of a transfer.  Without this option, rsync uses
      a "quick check" that (by default) checks if each file’s size and
      time of last modification match between the sender and receiver.
      This option changes this to compare a 128-bit checksum for  each
      file  that  has a matching size.  Generating the checksums means
      that both sides will expend a lot of disk I/O  reading  all  the
      data  in  the  files  in  the transfer (and this is prior to any
      reading that will be done to transfer changed  files),  so  this
      can slow things down significantly.

And:

-u, --update
      This  forces  rsync  to  skip  any  files  which  exist  on  the
      destination  and  have  a  modified  time that is newer than the
      source  file.   (If  an  existing   destination   file   has   a
      modification time equal to the source file’s, it will be updated
      if the sizes are different.)

Note that these are not implied by the options you used. -a is:

-a, --archive               archive mode; same as -rlptgoD (no -H)
-r, --recursive             recurse into directories
-l, --links                 copy symlinks as symlinks
-p, --perms                 preserve permissions
-o, --owner                 preserve owner (super-user only)
-g, --group                 preserve group
    --devices               preserve device files (super-user only)
    --specials              preserve special files
-D                          same as --devices --specials
-t, --times                 preserve times

Solution 2:

General

If I understand correctly, rsync -av has no memory, so it will copy the files that were renamed/deleted too, because they are present in the source but no longer present in the target.

Tips

  • Use the option -n (--dry-run) to check what would happen before you run your rsync command line for real.

  • Notice the special meaning of a trailing slash after the source directory, and see the difference between

    rsync -av --progress dir/a/ dir/b
    

    and

    rsync -av --progress dir/a dir/b
    

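    which is described in the manual (man rsync). With the trailing slash, the contents of dir/a are copied into dir/b; without it, the directory a itself is copied, giving dir/b/a.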

Example

Your special case (adding a file to the source directory 'a' and removing a file from the target directory 'b') will copy both the added file and the previously copied file, because the latter is still in the source directory. This happens both with and without the option -u, and I don't know of any rsync option that fixes this easily if you want to keep the file in the source directory.

But you can remove it from the source directory, or put its name into a file called excluded and use the option --exclude-from=excluded (for many files), or simply use --exclude=PATTERN for one or a few files.

$ rsync -avn --progress dir/a/ dir/b
sending incremental file list
./
file-1
file-2

sent 103 bytes  received 25 bytes  256.00 bytes/sec
total size is 13  speedup is 0.10 (DRY RUN)

$ rsync -av --progress dir/a/ dir/b
sending incremental file list
./
file-1
              6 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=1/3)
file-2
              7 100%    6.84kB/s    0:00:00 (xfr#2, to-chk=0/3)

sent 196 bytes  received 57 bytes  506.00 bytes/sec
total size is 13  speedup is 0.05

$ echo textx-3>./dir/a/file-3

$ rsync -avn --progress dir/a/ dir/b
sending incremental file list
./
file-3

sent 121 bytes  received 22 bytes  286.00 bytes/sec
total size is 21  speedup is 0.15 (DRY RUN)

$ rm dir/b/file-1 
rm: remove regular file 'dir/b/file-1'? y

$ rsync -avn --progress dir/a/ dir/b
sending incremental file list
./
file-1
file-3

sent 124 bytes  received 25 bytes  298.00 bytes/sec
total size is 21  speedup is 0.14 (DRY RUN)

$ rsync -avun --progress dir/a/ dir/b
sending incremental file list
./
file-1
file-3

sent 124 bytes  received 25 bytes  298.00 bytes/sec
total size is 21  speedup is 0.14 (DRY RUN)

$ rsync -avun --exclude=file-1 --progress dir/a/ dir/b
sending incremental file list
./
file-3

sent 104 bytes  received 22 bytes  252.00 bytes/sec
total size is 15  speedup is 0.12 (DRY RUN)

Alternative: unison

You may want to try unison, a two-way synchronization tool. It gives you a visual way to identify special cases and decide what to do with each. There is a GUI version (unison-gtk).

Solution 3:

It copies the new files in /dir/a, plus any files that are missing from /dir/b because you renamed or deleted them there. Extra files in /dir/b (such as the renamed copies) are left alone unless you use the --delete option. In that case, they will be deleted, forcing /dir/b to become exactly like /dir/a.

About the bonus, do you mean the case of renaming files in /dir/a and then rsyncing to /dir/b? I don't think there is a way to prevent rsync from simply copying the files again in that case.