Does rsync preserve hardlinks on the destination when source files are identical but separate?

Solution 1:

rsync preserves hard links that already exist at the source, but only if you use the -H (--hard-links) option.

That option will not create new hard links for separate files that merely have identical content. You'll have to do that after the fact by looking for files with the same checksum, deleting one, and adding a hard link to replace it. After all, you wouldn't want rsync to turn every set of content-identical files into hard links to a single file. Imagine if every zero-length file were a hard link to the same file: add content to one, and you've changed the content of all of them.
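The after-the-fact step could look roughly like the sketch below. It is only an illustration, not something rsync provides: the function names hardlink_duplicates and file_digest and the ".hardlink-tmp" suffix are made up for this example. It skips empty files for the reason given above, and it assumes you are happy for the linked names to share one set of metadata (owner, permissions, timestamps).

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace content-identical regular files under `root` with hard links.

    Hypothetical helper. Returns how many names were re-pointed at another
    file's inode.
    """
    groups = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Skip symlinks, special files, and empty files.
            if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                continue
            # Hard links cannot cross filesystems, so the device is part of the key.
            groups[(st.st_dev, st.st_size, file_digest(path))].append(path)

    replaced = 0
    for paths in groups.values():
        paths.sort()
        keeper = paths[0]
        for dup in paths[1:]:
            if os.path.samefile(keeper, dup):
                continue  # already the same inode
            tmp = dup + ".hardlink-tmp"  # assumed not to exist already
            os.link(keeper, tmp)         # new name for the keeper's inode
            os.replace(tmp, dup)         # swap it in place of the duplicate
            replaced += 1
    return replaced
```

Calling hardlink_duplicates("/path/to/dest") on a scratch copy first is a good idea, since every name in a group ends up with the keeper's timestamps and permissions.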

Solution 2:

tl;dr: To preserve file-level deduplication via hard links at the destination, run rsync with the --checksum option.

Full answer, according to a series of experiments I did:

If two files are not hard-linked at the source, rsync will sync each of them individually to the destination. It does not care whether the files happen to be hard-linked at the destination. If one of the files (or both of them) ends up being retransmitted, the hard link at the destination will be broken; otherwise it will be left untouched. That is, even with the --hard-links option, rsync will not break a hard link at the destination just because the files are not hard-linked at the source.
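If you want to repeat this kind of experiment, one way to check whether a destination hard link survived a sync is to compare device and inode numbers before and after. A minimal sketch (are_hardlinked is just a name chosen for this example):

```python
import os


def are_hardlinked(path_a, path_b):
    """True if both names refer to the same inode on the same device."""
    a = os.lstat(path_a)
    b = os.lstat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Run it on the two destination paths before and after the rsync run; if it flips from True to False, the link was broken by a retransmission.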

The criteria for retransmitting a file depend on the --checksum (-c) and --ignore-times (-I) options.

  • If the option --checksum is given, only files that differ in size or checksum between source and destination are retransmitted. Consequently, if the file content hasn't changed then a hard link at the destination will be preserved even if it doesn't exist at the source.
  • If the option --ignore-times is given, all files are retransmitted, breaking any hard link at the destination that doesn't exist at the source.
  • If neither of these two options is given, rsync will use the sizes and modification timestamps of the source and destination files for its decision. In that case, if the timestamps of the two source files differ, a hard link at the destination will always be broken: the two linked names share a single inode and therefore a single timestamp, so only one of the two source timestamps can match it.
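The three cases above can be written down as a small decision function. This is a simplified model of the behaviour described here, not rsync's actual code: would_retransmit is a made-up name, timestamps are compared at whole-second granularity as an assumption, and letting ignore_times win when both flags are set is an arbitrary choice of the model.

```python
import hashlib
import os


def _digest(path):
    """SHA-256 of a file's content (a stand-in for whatever checksum rsync uses)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def would_retransmit(src, dst, checksum=False, ignore_times=False):
    """Model of the per-file decision described in the list above."""
    if not os.path.exists(dst):
        return True                      # nothing to compare against
    if ignore_times:
        return True                      # --ignore-times: always retransmit
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True                      # a size mismatch always counts
    if checksum:
        return _digest(src) != _digest(dst)       # --checksum: content decides
    return int(s.st_mtime) != int(d.st_mtime)     # default: timestamp decides
```

With two identical source files whose timestamps differ and a destination where both names are one inode, the default mode returns True for at least one of them, while checksum=True returns False for both, which is why --checksum leaves the destination link alone.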