Tool or script to detect moved or renamed files on Linux prior to a backup

I am looking for a tool or script that can detect moved or renamed files, so that I can get a list of the renamed/moved files and apply the same operations on the other end of the network to conserve bandwidth.

Disk storage is cheap but bandwidth isn't. The problem is that the files are often reorganized or moved into a better directory structure, and when rsync does the backup it doesn't notice that a file was renamed or moved, so it retransmits the whole file over the network even though the same file already exists on the other end.

So I am wondering whether there is a script or tool that can record where all the files are and what they are named, then rescan just before a backup to detect moved or renamed files. I could then take that list and reapply the moves/renames on the other side.
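
A minimal sketch of that record-then-rescan idea in Python. It relies on one assumption not stated in the question: on a single filesystem, rename(2) keeps a file's inode number (and its size and mtime), so moves can be detected by comparing (device, inode) pairs between two snapshots without rereading large files. The snapshot layout, the function names and the `.move-tmp-N` temporary names are hypothetical. Files that are copied and then deleted, or moved across filesystems, get new inodes and will not be detected this way.

```python
import os
import shlex
import stat
from typing import Dict, List, Tuple

# Hypothetical snapshot format: "dev:ino" -> [relative path, size, mtime_ns].
# String keys and list values let the snapshot survive a json.dump()/json.load()
# round trip between backup runs unchanged.
Snapshot = Dict[str, list]


def take_snapshot(root: str) -> Snapshot:
    """Record every regular file under root, keyed by (device, inode)."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            # Hard links share an inode; only the last path seen is kept.
            key = f"{st.st_dev}:{st.st_ino}"
            snap[key] = [os.path.relpath(full, root), st.st_size, st.st_mtime_ns]
    return snap


def detect_moves(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """Return sorted (old_path, new_path) pairs for renamed or moved files."""
    moves = []
    for key, (old_path, size, mtime_ns) in old.items():
        entry = new.get(key)
        if entry is None:
            continue  # deleted, or moved to another filesystem
        new_path, new_size, new_mtime_ns = entry
        # rename(2) keeps inode, size and mtime; only the path changes.
        # Requiring size and mtime to match guards against a deleted file's
        # inode number being reused by an unrelated new file.
        if new_path != old_path and (new_size, new_mtime_ns) == (size, mtime_ns):
            moves.append((old_path, new_path))
    return sorted(moves)


def replay_script(moves: List[Tuple[str, str]]) -> str:
    """Emit a POSIX sh script that repeats the moves inside the backup root."""
    lines = ["set -e"]
    # Phase 1 parks every source under a temporary name, so that chains and
    # swaps (a -> b together with b -> a) replay correctly in phase 2.
    for i, (src, _dst) in enumerate(moves):
        lines.append(f"mv -- {shlex.quote(src)} .move-tmp-{i}")
    for i, (_src, dst) in enumerate(moves):
        parent = os.path.dirname(dst)
        if parent:
            lines.append(f"mkdir -p -- {shlex.quote(parent)}")
        lines.append(f"mv -- .move-tmp-{i} {shlex.quote(dst)}")
    return "\n".join(lines) + "\n"
```

In this sketch the snapshot would be saved after each successful backup; before the next one, a fresh snapshot is diffed against it, the generated script is run from the backup root on the remote side (for example piped over ssh), and rsync then runs as usual with the files already in their new places.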

Here's a list of the "general" features of the files:

  1. Large unchanging files
  2. They can be renamed or moved around

[Edit:] These are all good answers. In the end I looked at all of them and will be writing some code to deal with this. What I am currently thinking of/working on is:

  1. Using something like AIDE for the "initial" scan, which also lets me keep checksums of the files; since they are never supposed to change, the checksums help detect corruption.
  2. Creating an inotify daemon that monitors these files/directories and records any renames and moves to a log file.
  3. There are some edge cases where inotify might fail to record a filesystem change, so a final step uses find to search the filesystem for files whose change time (ctime) is later than the last backup.

This has several benefits:

  1. Checksums etc. from AIDE let me verify that no media has become corrupt.
  2. inotify keeps resource usage low, with no need to rescan the filesystem over and over.
  3. No need to patch rsync. I can patch things if I have to, but I'd prefer to avoid it to keep the maintenance burden lower (i.e. no need to re-patch every time there is an update).
  4. I've used Unison before and it's really nice, but I could have sworn that Unison keeps copies around on the filesystem and that its "archive" files can grow rather large?
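
Step 2 of the plan could be sketched with the raw Linux inotify API through ctypes, since the Python standard library has no inotify wrapper. A rename inside the watched tree produces an IN_MOVED_FROM event and an IN_MOVED_TO event that share the same cookie, so pairing them by cookie yields (old, new) paths. The function names are hypothetical. Watches are not recursive, so directories created after `watch_tree()` runs are not covered, and an IN_Q_OVERFLOW event means events were dropped; both are cases the step 3 find scan would have to catch.

```python
import ctypes
import os
import struct
from typing import Dict, List, Tuple

# Constants from <sys/inotify.h> (Linux only).
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_Q_OVERFLOW = 0x00004000

# struct inotify_event header: int wd; uint32 mask, cookie, len; then name[len].
_EVENT = struct.Struct("iIII")
_libc = ctypes.CDLL(None, use_errno=True)  # libc is already loaded in-process
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]


def _check(ret: int) -> int:
    """Turn a -1 return from libc into an OSError carrying errno."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def watch_tree(root: str) -> Tuple[int, Dict[int, str]]:
    """Watch root and every existing subdirectory for moves; return (fd, wd->dir)."""
    fd = _check(_libc.inotify_init1(os.O_CLOEXEC))
    wds: Dict[int, str] = {}
    for dirpath, _dirs, _files in os.walk(root):
        wd = _check(_libc.inotify_add_watch(
            fd, os.fsencode(dirpath), IN_MOVED_FROM | IN_MOVED_TO))
        wds[wd] = dirpath
    return fd, wds


def read_moves(fd: int, wds: Dict[int, str],
               pending: Dict[int, str]) -> Tuple[List[Tuple[str, str]], bool]:
    """Read one batch of events (blocking) and return (moves, overflowed).

    pending maps cookie -> old path and must persist between calls, because
    the two halves of a rename can straddle two reads. An IN_MOVED_FROM that
    never gets paired means the file left the watched tree; an unpaired
    IN_MOVED_TO means it arrived from outside and is effectively a new file.
    """
    buf = os.read(fd, 64 * 1024)
    moves: List[Tuple[str, str]] = []
    overflowed = False
    offset = 0
    while offset < len(buf):
        wd, mask, cookie, length = _EVENT.unpack_from(buf, offset)
        offset += _EVENT.size
        name = buf[offset:offset + length].split(b"\0", 1)[0]  # NUL-padded
        offset += length
        if mask & IN_Q_OVERFLOW:
            overflowed = True
            continue
        path = os.path.join(wds.get(wd, ""), os.fsdecode(name))
        if mask & IN_MOVED_FROM:
            pending[cookie] = path
        elif mask & IN_MOVED_TO and cookie in pending:
            moves.append((pending.pop(cookie), path))
    return moves, overflowed
```

A daemon built on this would loop on `read_moves()`, append each pair to its log file, and fall back to a full rescan whenever `overflowed` is true.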

Unison (http://www.cis.upenn.edu/~bcpierce/unison/) claims to be able to detect moves and renames.

There are a couple patches to rsync to add move/rename detection:

http://gitweb.samba.org/?p=rsync-patches.git;a=blob;f=detect-renamed-lax.diff;h=1ff593c8f97a97e8970d43ff5a62dfad5abddd75;hb=master

http://gitweb.samba.org/?p=rsync-patches.git;a=blob;f=detect-renamed.diff;h=c3e6e846eab437e56e25e2c334e292996ee84345;hb=master

Bugzilla entry tracking this issue: https://bugzilla.samba.org/show_bug.cgi?id=2294


This is a bit of an odd solution, but... git tracks file contents by hash rather than by path (and detects renames based on content), so if you kept the directories in question under version control, git would be able to recognize moves and avoid transferring the content again (since it's already on both sides of the wire) while still moving things around in the tree.

Just a thought.
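
The reason this works is that git names each piece of file content by a hash of the content alone, never the path, and a push or fetch only sends objects the other side lacks. A minimal Python sketch of how git derives a blob id for the default SHA-1 object format; for a file without content filters it should match what `git hash-object` reports.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a file's bytes (SHA-1 object format).

    Only a "blob <size>" header, a NUL byte and the content are hashed; the
    file's name and directory never enter the hash, so a moved or renamed
    file keeps the same id and its blob does not need to be sent again.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same id as: echo 'test content' | git hash-object --stdin
    print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```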


Interesting suggestions here. I also thought of using filesystem capabilities, e.g. ZFS. I find it strange that there is no tool that does such a simple thing. The Unison option does not work in most cases, as people report, and it didn't work for me either.

I want this feature to keep the backup of my movie collection on a second hard disk in sync when rearranging folders.

Now I have found this simple C program: http://sourceforge.net/projects/movesync/

It seems to work fine: run it, then sync normally with, e.g., Unison.


You might be able to use a host-based IDS such as AIDE and write a wrapper script around its output. You would likely have to write somewhat more complex logic that takes the checksums into account.
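
A minimal sketch of that checksum logic, assuming the scanner's results have already been turned into a mapping of relative path to checksum. Parsing AIDE's own database or report format is not shown; `manifest()` below is a hypothetical stand-in that hashes the files itself with SHA-256. Removed paths are paired with added paths carrying the same checksum; identical duplicates are paired in sorted order, which is arbitrary but harmless because their contents are the same.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def manifest(root: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Hypothetical stand-in for a scanner database: relative path -> SHA-256."""
    result: Dict[str, str] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in chunks so large files are never loaded into memory whole.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    h.update(chunk)
            result[os.path.relpath(full, root)] = h.hexdigest()
    return result


def match_moves(old: Dict[str, str], new: Dict[str, str]
                ) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Return (moves, deleted, added) between two path -> checksum manifests.

    Limitation: a file moved onto a path that already existed shows up as a
    modification of that path plus a deletion, not as a move.
    """
    removed = {p: c for p, c in old.items() if p not in new}
    added = {p: c for p, c in new.items() if p not in old}
    # Several new files may share a checksum, so keep a list per checksum.
    by_sum: Dict[str, List[str]] = defaultdict(list)
    for path, csum in sorted(added.items()):
        by_sum[csum].append(path)
    moves: List[Tuple[str, str]] = []
    deleted: List[str] = []
    for path, csum in sorted(removed.items()):
        candidates = by_sum.get(csum)
        if candidates:
            moves.append((path, candidates.pop(0)))
        else:
            deleted.append(path)
    matched = {dst for _src, dst in moves}
    leftover = sorted(p for p in added if p not in matched)
    return moves, deleted, leftover
```

Only the `leftover` files would then need to cross the network; `moves` can be replayed on the remote side and `deleted` removed there.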

Otherwise, a network filesystem might make sense, since changes would then be reflected at all locations. However, I suspect you are transferring over the Internet, which limits the options here.