Copy large file from one Linux server to another

I'm attempting to copy a 75 gigabyte tgz (MySQL LVM snapshot) from a Linux server in our LA data center to another Linux server in our NY data center over a 10MB link.

I am getting about 20-30Kb/s with rsync or scp, and the ETA fluctuates between 200 and 300 hours.

At the moment it is a relatively quiet link as the second data center is not yet active and I have gotten excellent speeds from small file transfers.

I've followed different TCP tuning guides I've found via Google, to no avail (maybe I'm reading the wrong guides; got a good one?).
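For what it's worth, the knobs those guides usually touch are the kernel's TCP buffer limits and window scaling; a typical example looks something like the following (the values are illustrative guesses on my part, not a recommendation tuned to this link):

# raise the maximum socket buffer sizes so the TCP window can actually grow
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

# min / default / max TCP buffer sizes, in bytes
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# window scaling is normally on already, but it has to be for large windows
sysctl -w net.ipv4.tcp_window_scaling=1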

I've seen the tar+netcat tunnel tip, but my understanding is that it is only good for LOTS of small files and doesn't update you when the file has effectively finished transferring.
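For reference, the netcat-tar pipe people describe looks roughly like this; the hostname, port, and the optional pv step are placeholders/assumptions on my part, and some netcat flavors want -l -p instead of just -l:

# on the receiving (NY) side: listen on an arbitrary port and unpack whatever arrives
nc -l -p 7000 | tar xf -

# on the sending (LA) side: stream the file into netcat
tar cf - db_dump.08120922.tar.gz | nc nyserver 7000

# optionally put pv in the sender's pipe for a live throughput/progress readout
tar cf - db_dump.08120922.tar.gz | pv | nc nyserver 7000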

Before I resort to shipping a hard drive, does anyone have any good input?

UPDATE: Well... it may be the link after all :( See my tests below...

Transfers from NY to LA:

Getting a blank file.

[nathan@laobnas test]$ dd if=/dev/zero of=FROM_LA_TEST bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 29.412 seconds, 164 MB/s
[nathan@laobnas test]$ scp -C obnas:/obbkup/test/FROM_NY_TEST .
FROM_NY_TEST                                    3%  146MB   9.4MB/s   07:52 ETA

Getting the snapshot tarball.

[nathan@obnas db_backup]$ ls -la db_dump.08120922.tar.gz
-rw-r--r-- 1 root root 30428904033 Aug 12 22:42 db_dump.08120922.tar.gz

[nathan@laobnas test]$ scp -C obnas:/obbkup/db_backup/db_dump.08120922.tar.gz .
db_dump.08120922.tar.gz            0%   56MB 574.3KB/s 14:20:40 ET

Transfers from LA to NY:

Getting a blank file.

[nathan@obnas test]$ dd if=/dev/zero of=FROM_NY_TEST bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 29.2501 seconds, 165 MB/s
[nathan@obnas test]$ scp -C laobnas:/obbkup/test/FROM_LA_TEST .
FROM_LA_TEST                                    0% 6008KB 497.1KB/s 2:37:22 ETA

Getting the snapshot tarball.

[nathan@laobnas db_backup]$ ls -la db_dump_08120901.tar.gz
-rw-r--r-- 1 root root 31090827509 Aug 12 21:21 db_dump_08120901.tar.gz

[nathan@obnas test]$ scp -C laobnas:/obbkup/db_backup/db_dump_08120901.tar.gz .
db_dump_08120901.tar.gz                0%  324KB  26.8KB/s 314:11:38 ETA

I guess I'll take it up with the folks who run our facilities; the link is labeled as an MPLS/Ethernet 10MB link. (shrug)


Solution 1:

Sneakernet, anyone?

Assuming this is a one-time copy, I don't suppose it's possible to just copy the file to a CD (or other media) and overnight it to the destination, is it?

That might actually be your fastest option, as a file transfer of that size over that connection might not copy correctly... in which case you get to start all over again.


rsync

My second choice/attempt would be rsync, as it detects failed transfers, partial transfers, etc., and can pick up where it left off.

rsync --progress file1 file2 user@remotemachine:/destination/directory

The --progress flag will give you some feedback instead of just sitting there and leaving you to second guess yourself. :-)
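If resuming matters to you, here's a rough example of an invocation that keeps partial files around so a re-run can pick them back up (paths and host are placeholders):

# --partial keeps a partially transferred file instead of deleting it,
# so running the same command again continues rather than starting over
rsync --partial --progress /obbkup/db_backup/db_dump.08120922.tar.gz user@remotemachine:/destination/directory/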


Vuze (BitTorrent)

Third choice would probably be to try to use Vuze as a torrent server and then have your remote location use a standard BitTorrent client to download it. I know of others who have done this, but you know... by the time they got it all set up and running, etc... I could have overnighted the data...

Depends on your situation I guess.

Good luck!


UPDATE:

You know, I got to thinking about your problem a little more. Why does the file have to be a single huge tarball? Tar is perfectly capable of splitting large files into smaller ones (to span media, for example), so why not split that huge tarball into more manageable pieces and then transfer the pieces over instead?
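As a sketch of that idea (the snapshot path and the piece-name prefix are just placeholders), you could even pipe tar straight into split so the pieces are produced in one pass instead of splitting the finished 75GB tarball afterwards:

# archive the snapshot and cut the stream into 2GB pieces as it is produced;
# pieces come out as db_dump.tgz.part.aa, db_dump.tgz.part.ab, ...
tar czf - /mnt/mysql_snapshot | split --bytes=2G - db_dump.tgz.part.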

Solution 2:

I've done that in the past with a 60GB tbz2 file. I don't have the script anymore, but it should be easy to rewrite.

First, split your file into pieces of ~2GB:

split --bytes=2000000000 your_file.tgz your_file.tgz.part

For each piece, compute an MD5 hash (to check integrity) and store it somewhere, then start copying the pieces and their MD5s to the remote site with the tool of your choice (in my case, a netcat-tar pipe in a screen session).
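A minimal sketch of that step, assuming the piece names produced by the split command above (pieces.md5, the remote host, and the destination path are just names I picked):

# on the source side: record one MD5 per piece
md5sum your_file.tgz.part* > pieces.md5

# copy the pieces plus the checksum file with whatever tool you prefer, e.g.
rsync --partial --progress your_file.tgz.part* pieces.md5 user@remotemachine:/destination/

# on the remote side: verify every piece before reassembling
md5sum -c pieces.md5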

After a while, check the MD5s to confirm your pieces are okay, then:

cat your_file.tgz.part* > your_remote_file.tgz

If you have also computed an MD5 of the original file, check it too. If it matches, you can untar your file; everything should be OK.

(If I find the time, I'll rewrite the script)