Using rsync to maintain a copy of a directory with a name that changes

I am using rsync on a Linux system to synchronize a directory between the local disk and an attached USB drive. The problem is that the third-party system that creates the daily backup on the server changes the name of a directory nested deep inside the backup tree. This directory holds the majority of the data in the backup. When rsync sees the changed name, it treats the directory as entirely new, so my copy on the USB drive gains a new directory for every day the job runs.

I have written scripts that rename the directory back, but that is a cumbersome workaround!

I am looking for an "elegant" way to deal with this. Is it possible to create a link to the directory that remains constant? Can rsync be configured to recognize that the directory is the same even though its name has changed? I am sure someone has had to deal with this before!


One approach is to do it in two steps. First, rsync everything except the directory in question, using an exclude pattern. Second, rsync just that directory, using shell globbing in bash to reach it, like this:

rsync -av /usr/lib/mydata/bigdatadir*/ /mnt/usbvolume/bigdatadir/

A trailing slash on the source path effectively makes rsync ignore the directory's name, because rsync operates on the contents of the directory rather than on the directory itself. This globbing is easiest if the directory name has a constant prefix or suffix, as in the example above. If it doesn't, you could write a script that determines the actual name of the directory and then run something more direct, like this:

rsync -av /usr/lib/mydata/$BIGDATADIRNAME/ /mnt/usbvolume/bigdatadir/

In the end, your pseudo-code would be something like this:

  1. Find $BIGDATADIRNAME
  2. Rsync everything as you were before, but ignore $BIGDATADIRNAME
  3. Rsync the contents of $BIGDATADIRNAME
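The three steps above can be sketched as a small script. The paths and the `bigdatadir` prefix are assumptions carried over from the example; adjust them to your layout, and note this assumes the glob matches exactly one directory.

```shell
#!/bin/sh
# Step 1: find the current name of the renamed directory.
SRC=/usr/lib/mydata
DEST=/mnt/usbvolume

BIGDATADIRNAME=$(basename "$SRC"/bigdatadir*)

# Step 2: sync everything else, excluding the renamed directory.
rsync -av --exclude "$BIGDATADIRNAME/" "$SRC/" "$DEST/"

# Step 3: sync the renamed directory's contents into a constant name.
# The trailing slash on the source makes rsync copy the contents,
# not the directory itself, so the changing name never reaches the USB drive.
rsync -av "$SRC/$BIGDATADIRNAME/" "$DEST/bigdatadir/"
```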

You may be able to use the rsync options --compare-dest=DIR, --copy-dest=DIR, or --link-dest=DIR, which let you name an additional directory on the receiving side for rsync to check for missing files.

With --link-dest, for example, rsync creates the new version of the directory containing fresh copies of the files that changed and hard-linked copies of the files that didn't.

To use any of these options, you need to know the name the directory had on the USB drive after the previous rsync run, so you'll probably want to wrap rsync in a script that figures out that name first.

You may also want to use --delete-after to remove the old version of the directory once the new one has been created (note that --delete only removes extraneous files inside the directories being transferred, so the old dated directory disappears only if its parent is part of the transfer; otherwise remove it yourself afterward).
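A sketch of such a wrapper follows. The paths and the `bigdatadir` prefix are assumptions from the earlier example, and `$YESTERDAY` stands for whatever name the directory had on the previous run, which your wrapper must have recorded somewhere.

```shell
#!/bin/sh
# Assumed layout: dated directory under $SRC, copies kept under $DEST.
SRC=/usr/lib/mydata
DEST=/mnt/usbvolume
YESTERDAY=bigdatadir_previous   # recorded by the previous run (illustrative)

TODAY=$(basename "$SRC"/bigdatadir*)

# Unchanged files become hard links into yesterday's copy instead of
# being copied again. --link-dest should be an absolute path (a relative
# path is interpreted relative to the destination directory).
rsync -av --link-dest="$DEST/$YESTERDAY" \
    "$SRC/$TODAY/" "$DEST/$TODAY/"

# Once the new copy exists, the old one can be dropped; the hard links
# keep the shared file data alive on disk.
rm -rf "$DEST/$YESTERDAY"
```

The space cost of each day's copy is then only the changed files plus the directory entries, which is the usual rotating-snapshot trick built on --link-dest.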


Rsync alone can't do what you want, because the only metadata it has available is the name and the MAC times (modification, access, and change times) of each file and directory.

The only way to handle this automatically inside rsync would be for rsync to have more metadata to work with. Microsoft's DFS Replication on Windows, for example, handles renames by using the unique ID that NTFS assigns to each file and directory, so it can tell when only the name has changed.

Your inelegant script is probably the easiest way to do what you want without dumping rsync and looking for another tool.

A symbolic link isn't going to work because a symlink records the path of its target, and in this case the target directory's name is exactly the part that keeps changing. A hard link would be tied to the inode instead, but Linux filesystems don't allow hard links to directories, and even if they did you would need to be sure the directory is really being renamed rather than deleted and re-created (which would assign it a different inode).