ignore files in use (being written to) when using rsync

To check whether a file is currently open (a file that is being written to is certainly held open by some process), the standard way is to use lsof:

if lsof /your/file > /dev/null; then echo "file currently open"; fi

You can use this snippet to filter the output of find down to files that are not open, and feed that list to rsync:

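find . -type f -exec sh -c 'for f; do lsof "$(readlink -f "$f")" > /dev/null 2>&1 || printf "%s\0" "$f"; done' sh {} + | rsync -avz --from0 --files-from=- ./ user@host:destination/

Some notes:

  • readlink -f is needed to get the full path of a file; lsof accepts only a full path
  • each name is printed relative to . rather than as a bare basename, so rsync can still find files in subdirectories
  • printf "%s\0" emulates find -print0, which is what rsync --from0 expects
  • the file names are passed to sh as arguments instead of being pasted into the script through {}, so names with spaces or quotes are handled safely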
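If you need the check in more than one place, it can be wrapped in a small helper. This is only a sketch: file_is_open is a name made up here, and it assumes lsof and a readlink that supports -f are installed. Keep in mind that lsof only reports processes it is allowed to inspect, so run as a normal user it may miss files held open by other users.

```shell
#!/bin/sh
# file_is_open FILE  (hypothetical helper, not part of lsof or rsync)
# Succeeds (exit status 0) if lsof reports at least one process holding FILE open.
# Fails otherwise, including when FILE does not exist or lsof is not installed.
file_is_open() {
    lsof "$(readlink -f "$1")" > /dev/null 2>&1
}

# Example: report the state of a file before deciding whether to copy it.
# if file_is_open ./some/file; then echo "still open, skipping"; fi
```

Because a missing lsof looks the same as "not open" here, it is worth checking with command -v lsof once before relying on the result.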

One challenge here is to determine whether the files are still being written to. There is no perfect way to do this. I think the best you can do is to simply check the last-modified timestamp on the files, and only copy those files that have not been modified for a few minutes.

rsync cannot do this by itself, but you can combine it with the find command:

cd /path/to/directory/with/files
find ./ -type f -mmin +5 -print0 | rsync --archive --verbose --from0 --files-from=- ./ yourotherserver:targetdir/

To break down this command, it does two things:

  1. It uses find ./ -type f -mmin +5 -print0 to identify all files that haven't been modified for at least 5 minutes.
  2. It then feeds this list into rsync using the --from0 and --files-from parameters. This will make rsync only consider those files that find has identified.
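The two steps above can also be kept apart, so that you can look at the file list before handing it to rsync. A minimal sketch, assuming a find that supports -mmin and -print0 (GNU and BSD find do); list_settled_files is a name invented for this example, and the host and paths in the commented usage are the same placeholders as above.

```shell
#!/bin/sh
# list_settled_files DIR [MINUTES]  (hypothetical helper)
# Prints, NUL-terminated and relative to DIR, every regular file under DIR
# that find's "-mmin +MINUTES" test matches (MINUTES defaults to 5).
list_settled_files() {
    dir=$1
    minutes=${2:-5}
    # Run in a subshell so the caller's working directory is left untouched.
    ( cd "$dir" && find . -type f -mmin +"$minutes" -print0 )
}

# Preview the list, one name per line:
# list_settled_files /path/to/directory/with/files 5 | tr '\0' '\n'

# Feed it to rsync; --dry-run only reports what would be transferred:
# list_settled_files /path/to/directory/with/files 5 |
#     rsync --archive --verbose --dry-run --from0 --files-from=- \
#         /path/to/directory/with/files/ yourotherserver:targetdir/
```

The names are printed relative to DIR, so the rsync source argument has to be that same directory for --files-from to resolve them.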