Transferring a large amount of data between continents [duplicate]

Solution 1:

I suggest you use rsync. Rsync uses a delta-transfer algorithm, so if your files have only partially changed, or if the previous transfer was terminated abnormally, rsync is smart enough to sync only what is new or changed.

There are several ports of the original rsync to Windows and other non-Unix-compatible systems, both free and non-free. Please see the Rsync Wikipedia article for details.

Rsync over SSH is very widely used and works well. 10 GB is a relatively small amount of data nowadays, and you didn't specify what "occasionally" means. Weekly? Daily? Hourly? At a 500 KB/s transfer rate it will take around 6 hours, which is not really a long time. If you need to transfer the data frequently, it is probably better to create a cron task to start rsync automatically.
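
As a minimal sketch, an rsync-over-SSH transfer could look like the following (the paths, user and host names are placeholders; --partial keeps partially transferred files so an interrupted run can be resumed):

    # One-shot transfer over SSH; resumes cleanly if the connection drops.
    rsync -avz --partial --progress /data/export/ user@remote.example.com:/data/import/

    # Example crontab entry (crontab -e) to run the same sync nightly at 02:00:
    # 0 2 * * * rsync -az --partial /data/export/ user@remote.example.com:/data/import/

The -z flag compresses data in transit, which can help noticeably on a slow intercontinental link.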

Solution 2:

A connection across the internet can be a viable option, and a program such as BitTorrent is well suited to this purpose: it breaks the files up into logical pieces that are sent over the internet and reconstructed at the other end.

BitTorrent also gives you automatic error detection and re-downloading of damaged pieces, and if more people need the files, they benefit from being able to receive them from every source that already has (parts of) the file.

Granted, people see it as a nice way to download films and such, but it does have many legitimate uses.

A lot of BitTorrent clients also have built-in trackers, so you don't need a dedicated server to host the files.
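
As a rough sketch, creating and seeding a torrent from the command line could look like this with the Transmission tools (the tracker URL, paths and file names are made-up examples; many clients can do the same from a GUI):

    # Build a .torrent describing the data directory (tracker URL is an example).
    transmission-create -o backup.torrent -t udp://tracker.example.com:6969 /data/export

    # Seed it locally; -w points at the directory that contains the payload,
    # so the client verifies the existing data and starts serving it.
    transmission-cli -w /data backup.torrent

The .torrent file itself is tiny and can simply be e-mailed to the remote site, where any BitTorrent client can open it and download the data.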

Solution 3:

Split the file up into chunks of e.g. 50 MB (using e.g. split). Compute checksums for all of them (e.g. with md5sum). Upload using FTP and an error-tolerant FTP client, such as lftp on Linux. Transfer all of the chunks and a file containing all the checksums.
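
A minimal sketch of the sender side, assuming GNU coreutils and lftp (file names, directories and the FTP host are placeholders):

    # Split the file into 50 MB chunks and record a checksum for each.
    mkdir upload && cd upload
    split -b 50M ../bigfile.dat bigfile.part.
    md5sum bigfile.part.* > bigfile.md5

    # Upload everything; 'mirror -R --continue' resumes interrupted transfers.
    lftp -e 'mirror -R --continue . /incoming; quit' ftp://user@ftp.example.com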

On the remote site, verify that all the chunks have the expected checksums, re-upload those that failed, and reassemble them into the original file (e.g. using cat).
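
Under the same assumptions, the receiving side might then do something like:

    # Check every chunk against the shipped checksum list; failures show as FAILED.
    md5sum -c bigfile.md5

    # Once all chunks verify, glue them back together in order.
    cat bigfile.part.* > bigfile.dat

This works because split's default alphabetical suffixes sort in the same order the chunks were created.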

Reverse the location of the server as needed (I posted under the assumption that the destination site provides the FTP server and you start the transfer locally when the files are ready). Your FTP client shouldn't care.


I have had similar issues in the past, and using an error-tolerant FTP client worked: no bits were ever flipped, just regular connection aborts, so I could skip creating chunks and just upload the file. We still provided a checksum for the complete file, just in case.

Solution 4:

A variation on Daniel Beck's answer is to split the files into chunks on the order of 50 MB to 200 MB and create parity files for the whole set.

You can then transfer the files (including the parity files) with FTP, SCP, or something else to the remote site and check the whole set after it arrives. If parts are damaged, they can be repaired from the parity files, provided enough recovery blocks are available; how much you can recover depends on how many files are damaged and how many parity files you created.
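
On Linux this could be done with par2cmdline, roughly as follows (file names and the 10% redundancy figure are just examples):

    # Create recovery data covering about 10% of the chunk set.
    par2 create -r10 bigfile.par2 bigfile.part.*

    # On the remote site, after transferring the chunks and .par2 files:
    par2 verify bigfile.par2     # reports which chunks are damaged or missing
    par2 repair bigfile.par2     # reconstructs them if enough recovery blocks exist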

Parity files are used a lot on Usenet to send large files; most of the time the data is split up into RAR archives first. It's not uncommon to send 50 to 60 GB of data this way.

You should definitely check out the first link, and you could also take a look at QuickPar, a tool that can create parity files, verify your downloaded files, and even restore damaged files using the provided parity files.

Solution 5:

Is it one big 10 GB file? Could it easily be split up?

I haven't played with this much, but it struck me as an interesting and relatively simple concept that might work in this situation:

http://sendoid.com/