Determine if a file is in the process of being written to?

Solution 1:

You are on the right track: renaming the file is an atomic operation (as long as the old and new names are on the same filesystem), so performing the rename after the upload is simple, elegant and not error-prone. Another approach I can think of is to use lsof | grep filename.tar.gz to check whether the file is being accessed by another process.
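A minimal local sketch of the rename approach, assuming the sender can run commands on the receiving side. The function name `upload_atomically` and the directory names are hypothetical. The copy is written under `NAME.part`, and only after the copy succeeds is it renamed to the final name. Both names are in the same directory, and therefore on the same filesystem, so `mv` performs a single atomic rename and a reader never sees a half-written `big.tar.gz`.

```sh
# Sketch: publish a file atomically by copying it under a temporary ".part"
# name and renaming it once the copy has succeeded. Names are hypothetical.
upload_atomically() {
    src=$1        # file to send
    destdir=$2    # receiving directory
    name=$(basename "$src")

    # Write under a name that consumers are expected to ignore.
    cp "$src" "$destdir/$name.part" || return 1

    # Same directory => same filesystem => mv is one atomic rename.
    mv "$destdir/$name.part" "$destdir/$name"
}

# Usage example with throwaway directories.
workdir=$(mktemp -d)
mkdir "$workdir/outgoing" "$workdir/incoming"
printf 'payload\n' > "$workdir/outgoing/big.tar.gz"

upload_atomically "$workdir/outgoing/big.tar.gz" "$workdir/incoming"
ls "$workdir/incoming"
```

For a real remote upload the same pattern would be `scp big.tar.gz remote:incoming/big.tar.gz.part && ssh remote mv incoming/big.tar.gz.part incoming/big.tar.gz`, where `remote` and `incoming/` are placeholders.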

Solution 2:

Your best bet is to use lsof to determine if a file has been opened by any process:

#  lsof -f -- /var/log/syslog
COMMAND   PID   USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
rsyslogd 1520 syslog    1w   REG  252,2    72692 16719 /var/log/syslog

You can't easily tell if it's in the process of being written to, but if it is being written to, it MUST be open.
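In the sample output above, the `w` in the FD column (`1w`) means rsyslogd holds file descriptor 1 open for writing. A sketch of checking for that programmatically, assuming lsof is installed; the function name `is_open_for_write` is hypothetical. It uses lsof's field output (`-Fa`), where each `a` line gives a descriptor's access mode (`r`, `w`, or `u` for read/write). As the later answers point out, a "no" here only means nobody is writing right now, not that the file is complete.

```sh
# Sketch: does any process currently hold FILE open for writing?
# -Fa selects machine-readable output including the "a" (access mode)
# field: r = read, w = write, u = read and write.
is_open_for_write() {
    # -f -- : treat the argument strictly as a file name.
    # Run as root to see descriptors held by other users' processes.
    lsof -Fa -f -- "$1" 2>/dev/null | grep -q '^a[wu]'
}

f=${1:-/var/log/syslog}
if is_open_for_write "$f"; then
    echo "$f is open for writing"
else
    echo "$f is not open for writing (which does not mean it is complete)"
fi
```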


Edit: let's solve the actual problem here rather than try to implement the proposed solution!

Use rsync to transfer the file:

$ rsync -e ssh remote:big.tar.gz .

This way, rsync doesn't write over the existing file. It copies into a temporary file (.big.tar.gz.XXXXXX) until the transfer is complete, and then moves it into place.
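On the receiving side, this means a consumer that only looks at names matching `*.tar.gz` never sees rsync's in-progress file. The temporary name starts with a dot, which a plain `*` glob does not match, and it does not end in `.tar.gz`. This holds for rsync's default behaviour, not for `--inplace`. A small sketch with a simulated temporary file; the function name `process_ready` and the random suffix are made up for illustration.

```sh
# Sketch: a receiver that only picks up finished *.tar.gz files.
# rsync's in-progress copy (.big.tar.gz.XXXXXX) never matches: a plain *
# glob skips names starting with a dot, and the name does not end in .tar.gz.
process_ready() {
    for f in "$1"/*.tar.gz; do
        [ -e "$f" ] || continue    # no match: the pattern stays literal, skip it
        echo "ready: $(basename "$f")"
    done
}

# Simulate one finished transfer and one still in progress.
incoming=$(mktemp -d)
: > "$incoming/done.tar.gz"
: > "$incoming/.big.tar.gz.a1B2c3"    # made-up rsync temporary name

process_ready "$incoming"    # ready: done.tar.gz
```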

Solution 3:

A bit old, but most of the answers completely miss the point of the question:

"But I figured I'd try to figure out if there is simply a way to determine if the file is whole at the command line first..."

In general, there isn't. You simply don't have enough information to determine that.

Determining that the file is closed is not the same as determining that the file is whole. For example, a file also gets closed if the connection is lost partway through the transfer.

Only @Alex's answer got this right, and even he was partly taken in by lsof.

Determining whether the file has been fully and successfully transferred requires more information, such as what the question itself proposes:

"One alternative I was thinking of was to have the file be copied as a different file extension (like .tar.gz.part) and then renamed to .tar.gz after the transfer is complete."

That's a perfectly fine way to communicate that the file has been fully and successfully transferred. You can also move files from one directory to another, as long as both directories are on the same filesystem so that the move is an atomic rename. Or the sender can create an empty filename.done file to signal completion.
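A sketch of the `filename.done` convention, assuming the sender creates the marker only after the data file has arrived in full (if the marker itself travels over the network, it must be sent after the data). The function names `mark_done` and `process_completed` and the directory are hypothetical.

```sh
# Sketch of the "filename.done" convention. Names are hypothetical.

# Sender side: run only after the data file has been transferred successfully.
mark_done() {
    : > "$1.done"    # create an empty marker next to the data file
}

# Receiver side: handle a data file only once its marker exists.
process_completed() {
    for marker in "$1"/*.done; do
        [ -e "$marker" ] || continue    # no markers at all
        data=${marker%.done}            # a.tar.gz.done -> a.tar.gz
        [ -f "$data" ] || continue      # marker without data: ignore
        echo "processing $(basename "$data")"
        rm -f "$marker"                 # consume the marker
    done
}

incoming=$(mktemp -d)
: > "$incoming/a.tar.gz"
: > "$incoming/b.tar.gz"    # still arriving: no marker yet
mark_done "$incoming/a.tar.gz"

out=$(process_completed "$incoming")
echo "$out"    # processing a.tar.gz
```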

All of these methods rely on the sender signalling, one way or another, that the transfer completed successfully, because only the sender has that information.

Some file formats (such as PDFs) have data in them that allow you to determine if the file is complete. But you have to open and read pretty much the entire file to find out.
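A .tar.gz is one such format: each gzip stream ends with a CRC-32 and the uncompressed length, and `gzip -t` decompresses the whole file (writing nothing) to verify them. A sketch showing a truncated copy failing the check; the file names and the `check_gzip` helper are made up. A pass only means the gzip stream is intact, not that it is the file the sender meant to send, and it costs a full read.

```sh
# Sketch: detect a truncated .tar.gz using gzip's own integrity trailer.
# gzip -t exits non-zero on a corrupt or truncated stream.
check_gzip() {
    if gzip -t "$1" 2>/dev/null; then
        echo "complete: $(basename "$1")"
    else
        echo "truncated or corrupt: $(basename "$1")"
    fi
}

tmp=$(mktemp -d)
# Build a small archive and a copy cut off halfway, as an interrupted
# transfer might leave it.
awk 'BEGIN { for (i = 1; i <= 20000; i++) print i }' > "$tmp/data.txt"
tar -czf "$tmp/full.tar.gz" -C "$tmp" data.txt
size=$(wc -c < "$tmp/full.tar.gz" | tr -d ' ')
head -c $((size / 2)) "$tmp/full.tar.gz" > "$tmp/partial.tar.gz"

check_gzip "$tmp/full.tar.gz"       # complete: full.tar.gz
check_gzip "$tmp/partial.tar.gz"    # truncated or corrupt: partial.tar.gz
```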

lsof will only tell you that the file is no longer open; it won't tell you why it's no longer open, nor how big the file is supposed to be.

Solution 4:

The best way to do this is to use incron (the "inotify cron" system). It lets you set an inotify watch on a directory and run a command when file operations happen in it. In this case, you should watch the directory for close_write events (IN_CLOSE_WRITE), which lets you run your command once a file that was opened for writing has been closed.
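A sketch of an incrontab entry (edited with `incrontab -e`), assuming a hypothetical upload directory `/srv/incoming` and a hypothetical handler script `/usr/local/bin/process-upload.sh`. Each line has the form `<path> <mask> <command>`; incron substitutes `$@` with the watched path and `$#` with the name of the file the event refers to. Adding `IN_MOVED_TO` also catches files renamed into place, which is what the `.part`-then-rename approach produces. The handler should still skip `*.part` names, and IN_CLOSE_WRITE also fires when an interrupted transfer closes a partial file.

```
/srv/incoming IN_CLOSE_WRITE,IN_MOVED_TO /usr/local/bin/process-upload.sh $@/$#
```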