Improving speed of large file transfer over high latency link

So, I've recently had the need to pull a large file over the internet from one of our offices overseas. Both offices have 50Mbit fibre links in both directions, but the round-trip time is horrendous, varying from maybe 450ms on a good day to 750ms on a crap one.

Originally, I tried pulling the file over a VPN connection, but after a few failed transfers (SMB really sucks over slow links) and the speed maxing out at about 128kBps, a quick google showed that I was running up against Windows TCP window scaling issues. The figure fits: the classic 64kB default receive window divided by a ~500ms round trip works out to roughly 128kBps, no matter how fat the pipe is.
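
If you want to rule the Windows side out first, a quick sanity check (this assumes Vista/Server 2008 or later; XP/2003 need the TcpWindowSize and Tcp1323Opts registry values under Tcpip\Parameters instead):

    rem show the current TCP globals, including the auto-tuning level
    netsh interface tcp show global

    rem re-enable receive-window auto-tuning if it reports "disabled"
    netsh interface tcp set global autotuninglevel=normal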

I have since pushed the file through a commercial private-cloud type service which got the file over here quicker, so the following is more for curiosity than anything else.

Added to the fun is that internet access at both ends goes through an HTTP proxy. I do, however, have admin rights on machines at both ends.

How would you go about getting better speed?

Things I've tried:

1) Plain SFTP between two Linux virtual machines, using corkscrew to punch out through the HTTP proxy and a third intermediary machine to connect the two ends together (config sketch after this list). Speed achieved: around 600kBps.

2) SFTP but using OpenSSH patched with HPN-SSH. Corkscrew and intermediary config same as 1). Little if any speed improvement (see the none-cipher note below).

3) As per 2) but using LFTP with pget -c -n 10 to break the transfer into chunks. This is the best so far, seeing around 3.5MBps (invocation below).
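
For anyone reproducing 1): corkscrew just takes the proxy's host and port followed by the destination host and port, so the whole trick fits in ~/.ssh/config. A minimal sketch (proxy.example.com:8080 and the host names are placeholders for your own):

    # ~/.ssh/config on the machine behind the proxy
    Host intermediary
        HostName intermediary.example.com
        # tunnel the ssh session out through the HTTP proxy via CONNECT
        ProxyCommand corkscrew proxy.example.com 8080 %h %p

    # after which a plain "sftp intermediary" goes out through the proxy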
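
With the HPN-SSH patches from 2), the other knob worth trying is the none cipher, which drops encryption on the bulk data (authentication still happens); it has to be allowed on the server and requested by the client. A sketch, assuming HPN-patched OpenSSH at both ends:

    # sshd_config on the receiving end
    NoneEnabled yes

    # client side: request the none cipher for the bulk transfer
    scp -oNoneEnabled=yes -oNoneSwitch=yes bigfile intermediary:/tmp/

When the bottleneck is the window rather than CPU, as here, this alone may not buy much.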
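
And the LFTP invocation from 3), for reference (user, host and path are placeholders): pget -c makes the transfer resumable, and -n 10 fetches ten byte ranges in parallel, so throughput is no longer capped by a single connection's window:

    lftp -e 'pget -c -n 10 /remote/path/bigfile; quit' sftp://user@intermediary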

All improvements welcome.


Solution 1:

These days, I'm addressing transfers over long-distance and higher-latency links by wrapping rsync over UDP, using UDR as a transport. UDR uses UDT, which is described as:

UDT is a reliable UDP based application level data transport protocol for distributed data intensive applications over wide area high-speed networks. UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. The new protocol can transfer data at a much higher speed than TCP does. UDT is also a highly configurable framework that can accommodate various congestion control algorithms.

UDR disables encryption by default (disabling encryption was the main thing I was after back when I was patching HPN-SSH), and the UDP approach has helped quite a bit. The major benefit of the UDR/UDP solution is that the command line barely changes: you just prepend udr to the rsync command.

udr rsync -avP --stats --delete --inplace /data/ mir1:/data/
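
Note that udr still brings the session up over ssh, but the bulk data then flows over UDP, so the relevant UDP ports need to be reachable end to end; in an HTTP-proxy-only environment like the one in the question, that is likely to be the sticking point.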

Also see: Possibility of WAN Optimization for SSH traffic

Solution 2:

I had the same problem at $lastjob.

Staying purely within my own infrastructure, I never found a better solution than LFTP.

If you can justify the expense, you can get appliances that do WAN acceleration. Basically, they transparently coalesce your requests into much larger chunks, greatly reducing the chattiness between the two sites. Riverbed is probably the best-known option there, but IIRC there is also a module for Juniper routers that does the same. I do not know of any FLOSS options at the moment.

I actually found the best option was Dropbox et al., but that may not be acceptable for you.