How to forward new files from an SFTP server?
I have an sftp server that partners send files to. As soon as the file arrives in the sftp directory, I want to read its contents and send the contents for further processing to another server.
To achieve the above, I have set up a very thin file watcher program running on the same machine as the sftp server. It uses a file system events watcher library to subscribe to CREATE events - whenever such an event is fired, the watcher reads the file and sends its contents to the processing server.
This works locally. That is, if I mv a file to the sftp directory from the same machine, the contents are correctly parsed. However, when I actually put a file into the sftp directory from a remote machine, the following happens:
- CREATE fs event triggered
- SFTP starts transferring data
- File watcher receives CREATE event, opens partially written file, sends partial contents to remote processing server.
- SFTP finishes transferring data.
As a result, I end up with empty contents on the remote server since it reads the file before any data has been transferred to it. I have verified that the files eventually receive all data.
What sequence of FS events is triggered by an SFTP put? How should I solve the above use case? I was exploring simple delays (once you receive the CREATE event, wait 5 seconds, then read the file) but none seem sustainable.
If you rely on inotify, you should watch for the CLOSE_WRITE event rather than CREATE.
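For illustration, here is a minimal sketch of watching CLOSE_WRITE from Python by calling the Linux inotify functions through ctypes. The helper names parse_events and watch_closed_files are made up for this example, it is Linux-only, and it assumes the SFTP server closes the file once the upload has finished.

```python
import ctypes
import ctypes.util
import os
import struct

IN_CLOSE_WRITE = 0x00000008  # value from <sys/inotify.h>
_EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, len


def parse_events(buf):
    """Decode a raw inotify read buffer into (mask, filename) pairs."""
    events, offset = [], 0
    while offset + _EVENT_HEADER.size <= len(buf):
        _wd, mask, _cookie, name_len = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        # The name field is NUL-padded up to name_len bytes.
        name = os.fsdecode(buf[offset:offset + name_len].rstrip(b"\0"))
        offset += name_len
        events.append((mask, name))
    return events


def watch_closed_files(directory, handle):
    """Call handle(path) whenever a file opened for writing is closed.

    handle is a hypothetical callback, e.g. one that reads the file and
    forwards its contents to the processing server.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6",
                       use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        wd = libc.inotify_add_watch(fd, os.fsencode(directory), IN_CLOSE_WRITE)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        while True:
            # os.read blocks until at least one event is available.
            for mask, name in parse_events(os.read(fd, 4096)):
                if mask & IN_CLOSE_WRITE:
                    handle(os.path.join(directory, name))
    finally:
        os.close(fd)
```

Calling watch_closed_files("/path/to/sftp/dir", print) would print each file path as soon as its writer closes it, instead of when the file is first created.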
If you don't need recursive monitoring, you can take a look at incrond (and incrontab).
Alternatively, you can simply schedule rsync to run at a short interval (e.g., every minute) and clean up the source directory during off-hours, when you can stop the SFTP service (to be 100% certain no one is uploading a file during the cleanup).
EDIT: well, it seems that your library of choice does not provide the CLOSE_WRITE event, but only MODIFY (see here). The issue with MODIFY is that every write triggers a separate event, which means a single big file upload can trigger an unpredictable number of MODIFY events.
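If you are stuck with MODIFY, one possible workaround is to debounce: remember when each file was last modified and only process it once no further MODIFY event has arrived for some quiet period. The sketch below is only an illustration; the QuietPeriodTracker class and its method names are invented for this example, and the approach is a heuristic (a stalled upload would look finished).

```python
import time


class QuietPeriodTracker:
    """Collapse bursts of MODIFY events into one 'ready' signal per file."""

    def __init__(self, quiet_seconds, clock=time.monotonic):
        self._quiet = quiet_seconds
        self._clock = clock
        self._last_seen = {}  # path -> time of the most recent event

    def touch(self, path):
        """Record a CREATE or MODIFY event for path."""
        self._last_seen[path] = self._clock()

    def pop_ready(self):
        """Return paths with no events for quiet_seconds and forget them."""
        now = self._clock()
        ready = [p for p, seen in self._last_seen.items()
                 if now - seen >= self._quiet]
        for path in ready:
            del self._last_seen[path]
        return ready
```

The watcher would call touch() from its event callback and call pop_ready() on a timer, forwarding only the files it returns.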
If you want to continue using a notify-based approach, I suggest you evaluate incrond, lsyncd or inotifywait.
Regarding the plain rsync approach, you can certainly end up transferring a partially uploaded file, which will however be completely transferred in the next rsync cycle after its initial upload has finished. On the receiving side, you should be sure to only process completely transferred files (a thing you should check even with a notify approach).
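As one hedged example of such a check, the receiving side could wait until a file's size and modification time stop changing between polls before processing it. The function name wait_until_stable and its parameters are assumptions made for this sketch, and the check is a heuristic rather than a guarantee of completeness.

```python
import os
import time


def wait_until_stable(path, interval=1.0, required_checks=3, sleep=time.sleep):
    """Block until size and mtime are unchanged for required_checks
    consecutive comparisons, then return the final size in bytes."""
    last = None
    unchanged = 0
    while unchanged < required_checks:
        st = os.stat(path)
        current = (st.st_size, st.st_mtime_ns)
        # Reset the counter whenever the file changed since the last poll.
        unchanged = unchanged + 1 if current == last else 0
        last = current
        if unchanged < required_checks:
            sleep(interval)
    return last[0]
```

Choose the interval and number of checks based on your slowest realistic upload; a sender that pauses longer than that will still fool this check.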
More broadly, I strongly suggest testing with a representative sample of file sizes, because testing with small files can hide timing-related issues that affect bigger files.