Offsite backup of Terabytes of data

We have ~1TB of data and back up everything nightly using custom rsync scripts. The nice thing about rsync is that it transfers only the modified portions of a file (not the entire modified file), and it can compress the data in transit (the -z option).
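To illustrate why sending only the changed parts of a file saves so much, here is a deliberately simplified sketch in Python. It compares two versions of a file block by block at fixed offsets and reports which blocks differ. This is not rsync's actual algorithm (rsync uses rolling checksums so it can also cope with inserted or shifted data); the function name and the 4096-byte block size are assumptions made up for this example.

```python
import hashlib
from typing import List


def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> List[int]:
    """Return the indices of fixed-size blocks in `new` that differ from `old`.

    Simplified illustration only: blocks are compared at the same offsets,
    so an insertion near the start would mark every later block as changed.
    """
    changed = []
    n_blocks = (len(new) + block_size - 1) // block_size
    for i in range(n_blocks):
        start = i * block_size
        new_digest = hashlib.sha256(new[start:start + block_size]).digest()
        old_digest = hashlib.sha256(old[start:start + block_size]).digest()
        if new_digest != old_digest:
            changed.append(i)
    return changed


# A 16 KiB "file" in which a single byte changes: only one block needs resending.
old_version = bytes(16384)
new_version = bytearray(old_version)
new_version[9000] = 1
blocks_to_send = changed_blocks(old_version, bytes(new_version))
```

Here only one of the four blocks would have to cross the wire, which is the same effect that turns 200GB of touched files into roughly 1GB of transferred data.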

In our old system, we had to cart tapes and disks home, since about 200GB of files were modified every day. But with rsync, only the 1GB or so of modified data within these files is transmitted, and it compresses down to ~200MB. As a result, we are able to back up everything to a remote site over a T1 in a few minutes (and in under an hour on a very heavy maintenance day). The scripts also use Linux hard links to maintain 30 days of full archives (not incrementals) using only 2-4TB (before compression) of space. So we end up being able to restore archived data in seconds, while also maintaining off-site storage.
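The poster's scripts are not shown, so the following is only a sketch of one common way to get hard-linked daily snapshots with rsync: the --link-dest option, which hard-links unchanged files against the previous day's directory instead of storing them again. The host name, paths, function names, and the date-named directory layout are all hypothetical. The code only builds the command line and works out which snapshots are older than the retention window; it does not run anything.

```python
from datetime import date, timedelta
from typing import List, Optional


def build_rsync_command(source: str, remote: str, snapshot_root: str,
                        today: date, previous: Optional[date] = None) -> List[str]:
    """Build an rsync command for one dated snapshot (hypothetical layout)."""
    # -a: archive mode, -z: compress in transit, --delete: mirror deletions.
    cmd = ["rsync", "-a", "-z", "--delete"]
    if previous is not None:
        # Unchanged files become hard links into yesterday's snapshot,
        # so every day looks like a full copy but costs little extra space.
        cmd.append(f"--link-dest={snapshot_root}/{previous.isoformat()}")
    cmd += [source, f"{remote}:{snapshot_root}/{today.isoformat()}/"]
    return cmd


def expired_snapshots(names: List[str], today: date, keep_days: int = 30) -> List[str]:
    """Return snapshot directory names (YYYY-MM-DD) outside the retention window."""
    oldest_kept = today - timedelta(days=keep_days - 1)
    return sorted(n for n in names if date.fromisoformat(n) < oldest_kept)


command = build_rsync_command(
    "/data/", "backup@offsite.example.com", "/backups",
    today=date(2024, 3, 31), previous=date(2024, 3, 30),
)
to_delete = expired_snapshots(["2024-03-01", "2024-03-02", "2024-03-31"],
                              today=date(2024, 3, 31))
```

Because each snapshot directory is a complete tree, restoring a file from any of the 30 days is a plain copy, and deleting an expired directory only frees the blocks that no newer snapshot still links to.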

Luckily disk drive space has kept up with our company growth ... I think our total solution at both locations cost ~$1000.


This is exactly why most companies do backups to tape (lower-cost media than disks, fast streaming write speed), and then physically move the tapes off-site.

You can have the IT guy haul the tapes home, or there are data archival companies that will come to your business, pick up the tapes, and store them at their secure facility. Recovery is as simple as calling the company to bring the tape over, loading it up, and accessing your data.

The internet is good for a lot of things, but moving terabytes of data is not one of them. See Jeff's article on The Economics of Bandwidth, which references Jim Gray's excellent Microsoft Research whitepaper TeraScale SneakerNet (.DOC).
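A rough back-of-the-envelope calculation makes the point concrete. The sketch below assumes a T1 line at its nominal 1.544 Mbit/s, decimal units (1TB = 10^12 bytes), and no protocol overhead, so real transfers would be somewhat slower; the function name is made up for this example.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate (assumed, no overhead)
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes: int, bits_per_second: int = T1_BITS_PER_SECOND) -> float:
    """Days needed to push `num_bytes` through a link at full line rate."""
    return num_bytes * 8 / bits_per_second / SECONDS_PER_DAY


full_copy_days = transfer_days(10**12)          # the whole ~1TB data set
daily_files_days = transfer_days(200 * 10**9)   # 200GB of modified files, sent whole
delta_minutes = transfer_days(200 * 10**6) * 24 * 60  # ~200MB compressed delta

print(f"1TB over a T1:   {full_copy_days:.1f} days")
print(f"200GB over a T1: {daily_files_days:.1f} days")
print(f"200MB over a T1: {delta_minutes:.1f} minutes")
```

Under these assumptions a full terabyte takes about two months and a day's worth of whole modified files takes about twelve days, so shipping media wins easily. Only a small compressed delta fits comfortably into a nightly window on a link like this.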