ignore files in use (being written to) when using rsync
To check whether a file is currently open (a file that is being written to is certainly held open by some process), the standard way is to use lsof:
if lsof /your/file > /dev/null; then echo "file currently open"; fi
You can use this snippet to filter find results down to files that are not open, and feed those to rsync:
find . -type f -exec sh -c 'if ! lsof "$(readlink -f "$1")" > /dev/null 2>&1; then printf "%s\n" "$1"; fi' sh {} \; | tr '\n' '\0' | rsync -avz --from0 --files-from=- ./ user@host:destination/
Some notes:
- readlink -f is needed to get the full path of a file; lsof accepts only full paths
- tr '\n' '\0' emulates find -print0
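As a hedged sketch of the same idea, the filter can be wrapped in a small shell function that writes NUL-terminated paths directly, so the tr step is not needed and file names containing newlines survive. The function name list_unopened is hypothetical and not part of the original answer; it assumes lsof and a readlink that supports -f are installed.

```shell
# list_unopened: hypothetical helper name (an assumption, not from the answer).
# Prints NUL-terminated paths, relative to the current directory, of regular
# files that lsof does not report as open.
list_unopened() {
    find . -type f -exec sh -c '
        for f in "$@"; do
            # lsof exits 0 when some process has the file open
            if ! lsof "$(readlink -f "$f")" >/dev/null 2>&1; then
                printf "%s\0" "$f"
            fi
        done
    ' sh {} +
}
```

It could then be used as: list_unopened | rsync -avz --from0 --files-from=- ./ user@host:destination/. Note that a file can still be opened between the lsof check and the transfer, so this narrows the race rather than removing it.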
One challenge here is to determine whether the files are still being written to. There is no perfect way to do this. I think the best you can do is to simply check the last-modified timestamp on the files, and only copy those files that have not been modified for a few minutes. rsync by itself cannot do this, but you can combine it with the find command:
cd /path/to/directory/with/files
find ./ -type f -mmin +5 -print0 | rsync --archive --verbose --from0 --files-from=- ./ yourotherserver:targetdir/
To break down this command, it does two things:
- It uses find ./ -type f -mmin +5 -print0 to identify all files that haven't been modified for at least 5 minutes.
- It then feeds this list into rsync using the --from0 and --files-from parameters. This will make rsync only consider those files that find has identified.
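If the quiet period needs tuning, the find half can be wrapped in a small function that takes the number of minutes as an argument. This is only a sketch: the name list_settled is hypothetical, and the default of 5 minutes simply mirrors the command above.

```shell
# list_settled: hypothetical helper name (an assumption, not from the answer).
# Prints NUL-terminated paths of regular files under the current directory
# whose last modification is more than MINUTES minutes old (default: 5).
list_settled() {
    minutes=${1:-5}
    find ./ -type f -mmin "+$minutes" -print0
}
```

For example, list_settled 10 | rsync --archive --verbose --dry-run --from0 --files-from=- ./ yourotherserver:targetdir/ previews what would be copied with a 10-minute threshold; dropping --dry-run performs the transfer.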