What program should I use to transfer 20TB of data across the network?
Solution 1:
rsync is a good way to go (scp is pretty much the same with fewer features). You may want to use the -z option, which enables zlib compression. Whether that is faster than sending the data uncompressed depends on how fast your drives and CPU are: it tends to help when the network link, rather than the machines, is the bottleneck. You may also want the archive mode option, -a, which copies directories recursively and preserves symlinks, permissions, ownership, and modification times (not creation times). Depending on what you're copying you might also want to preserve extended attributes and Mac resource forks. That is -E on the rsync bundled with Mac OS X, but -X in upstream rsync 3.x, where -E means --executability instead, so check your man page. Finally, --progress will show you progress information.
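As a rough illustration of the options above, here is a small Python sketch that assembles an rsync command line. The source path, the destination host "newhost", and the function name build_rsync_command are made up for the example; the flags themselves (-a, -z, --progress) are the ones discussed in this answer.

```python
import shlex


def build_rsync_command(src, dest, compress=True, progress=True):
    """Return an rsync argv list using the options discussed above.

    src and dest are placeholders; substitute your own paths and host.
    """
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps perms, times, symlinks)
    if compress:
        cmd.append("-z")  # -z (lowercase): compress data in transit
    if progress:
        cmd.append("--progress")  # show per-file progress
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical source directory and destination host.
    argv = build_rsync_command("/data/", "newhost:/data/")
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Passing compress=False drops -z, which makes it easy to try the transfer both ways.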
Solution 2:
Although it is not as ubiquitous as rsync, I have in the past used a tool called "mpscp" - http://www.sandia.gov/MPSCP/mpscp_design.htm
From Sandia National Labs, it's a file copy tool that runs over SSH and is specifically optimized to saturate high-speed networks between systems that are close together (such as copying terabytes of data between two supercomputers at the same site, connected via 10Gb+ or Infiniband). It works well, but can be a bit of a pain to set up. In my testing, I've easily seen it run 2x-3x faster than rsync.
Solution 3:
Use rsync and consider using it with rsyncd (the rsync daemon). If you use rsync without the daemon, it runs over a remote shell, which in practice means ssh and therefore some kind of encryption. You're probably copying the data from an older machine to a newer machine, and the older machine may not have the CPU grunt to encrypt the data fast enough to keep a gigabit Ethernet link saturated. Test transferring batches of files using both methods and see which way is faster.
For the same reason, I would advise testing rsync's compression option before committing to it. Compression is another CPU-intensive activity that might not keep up with gigabit Ethernet speeds on older hardware. rsync is a fifteen-year-old program, written back when the majority of people, even in first world countries, accessed the Internet via dialup modem. The tradeoff between network bandwidth and CPU was very different then.
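One way to run the comparison suggested above is to time the same sample batch over each transport and compare the result with what a gigabit link can carry. The Python sketch below is only a starting point: the host name "newhost", the daemon module name "data", the sample path, and the helper names are all assumptions, and the daemon variant presumes a writable rsync daemon module is already configured on the receiving machine.

```python
import subprocess
import time

GIGABIT_BYTES_PER_SEC = 1e9 / 8  # 125 MB/s, before protocol overhead


def transfer_hours(size_bytes, bytes_per_sec):
    """Hours needed to move size_bytes at a sustained rate."""
    return size_bytes / bytes_per_sec / 3600


def time_command(argv):
    """Run a command and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def throughput(size_bytes, seconds):
    """Average bytes per second for a finished transfer."""
    return size_bytes / seconds


# Hypothetical trial commands; adjust host, module, and paths to your setup.
TRIALS = {
    "ssh": ["rsync", "-a", "/data/sample/", "newhost:/data/sample/"],
    "ssh + compression": ["rsync", "-a", "-z", "/data/sample/", "newhost:/data/sample/"],
    "daemon": ["rsync", "-a", "/data/sample/", "newhost::data/sample/"],
}

if __name__ == "__main__":
    # Best case for 20 TB (decimal) on a saturated gigabit link: about 44 hours.
    print(f"{transfer_hours(20e12, GIGABIT_BYTES_PER_SEC):.1f} hours at line rate")
    # To run the trials, clear the destination between runs so rsync
    # really copies the data each time:
    #   for name, argv in TRIALS.items():
    #       print(name, time_command(argv))
```

If a trial's throughput comes out well below 125 MB/s while a CPU core on the older machine is pegged, encryption or compression is the likely bottleneck, and the daemon or uncompressed variant is worth a try.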