rsync or sftp?
Solution 1:
For large files like that, use rsync with the --inplace or --partial-dir= options, so that if the transfer fails part way through (due to an unexpected connection drop, for instance) you can easily resume by just rerunning the same command.
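For example, a minimal sketch (the file name, user and host are placeholders):

    # Keep partial data in a hidden directory on the receiving side so an
    # interrupted run can pick up where it left off; just rerun to resume.
    rsync --partial-dir=.rsync-partial --progress -av bigfile.iso user@host:/data/

--inplace works similarly but writes directly into the destination file, which avoids needing extra space for huge files at the cost of the destination being in an inconsistent state while the transfer runs.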
I tend to use rsync for most general transfers when it is available, not just in situations where its fuller synchronisation abilities are actually needed. It is no less secure than sftp if run over ssh (which it usually is) and no less efficient.
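As a sketch (paths and host are placeholders), an everyday one-off copy over ssh is just:

    # rsync runs over ssh by default on modern versions; -e makes it explicit
    rsync -av -e ssh ./localdir/ user@host:/remote/dir/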
I think the main advantage of sftp (and the related scp) is that it is generally available anywhere ssh is, so on just about any Linux/BSD/similar client or server setup, whereas rsync doesn't tend to be installed by default.
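For what it's worth, newer OpenSSH releases let sftp resume an interrupted download too (a sketch; the path is a placeholder):

    sftp user@host
    sftp> reget /data/bigfile.iso    # resume a partial download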
Solution 2:
scp/sftp/rsync are slow when it comes to the actual transfer of files, even on Nehalems, as the encryption is quite a heavy burden. Additionally, I have the impression that scp/ssh set their own magic socket options, such as the TCP window size, and are always slow (maxing out at 50-70 MB/s even on local 10 Gbps paths).
Especially for larger files and WANs this is freaking me out; I don't understand why anyone thinks they are more clever than the underlying TCP stack of the OS.
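One partial mitigation, short of patching ssh, is picking a cheaper cipher so that the CPU isn't the bottleneck. A sketch with placeholder names (aes128-gcm@openssh.com is fast on hardware with AES-NI):

    rsync -av -e "ssh -c aes128-gcm@openssh.com" bigfile user@host:/data/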
I'd look into the Globus Toolkit's GridFTP (that's FTP on steroids), which uses parallel/multi-flow TCP sessions for bulk transfers and is also well tuned for WANs. You can back GridFTP with a full-blown PKI or use ssh for credentials and session initiation. It runs at wire speed on 10 Gbps and can be scaled out and load-balanced if needed, but that's really something you need for TBs of data.
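For illustration, a parallel transfer with the toolkit's globus-url-copy client might look like this (hosts, paths and tuning values are placeholders to adjust for your link):

    # -p: number of parallel TCP streams; -tcp-bs: TCP buffer size in bytes
    globus-url-copy -p 8 -tcp-bs 16777216 file:///data/bigfile gsiftp://gridftp.example.org/data/bigfile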
edit: Yes, there are SSH patches that fix the window options and introduce a null cipher for the transfer, keeping encryption only for the key exchange and credentials, but both endpoints of the connection need that SSH build to take full advantage of it.
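For instance, with the HPN-SSH patch set (assuming both ends run an HPN-patched build; these options don't exist in stock OpenSSH):

    # Authenticate encrypted, then switch the bulk transfer to the null cipher
    scp -oNoneEnabled=yes -oNoneSwitch=yes bigfile user@host:/data/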