How to back up 20+ TB of data?

Solution 1:

You need to take a step back and stop thinking "I've got 20TB on my NAS I need to back up!" and develop a storage strategy that takes into account the nature of your data:

  • Where is it coming from and how much new data are you getting? (you've got this in your question)
  • How is the data used once you have it? Are people editing the pictures? Do you keep the originals and generate edited versions?
  • How long do you need to keep all the data? Are people still making changes to pictures from 2 years ago?

Depending on the answers to the last two questions, you probably need more of an archiving system than a radically different backup system.

Data that is static (e.g. 2-year-old pictures that you retain "just in case") doesn't need to be backed up every night, or even every week; it needs to be archived. What you actually do might be more complex, but conceptually, all the old pictures can be written out to tape (multiple copies!) and not backed up any more.

Based on your comments, some additional thoughts:

  • Since you keep the originals of each shoot untouched and work on a copy, and assuming that at least some of the original pictures are duds, you might be able to cut the amount of data that needs to be backed up in half.

  • If you still can't finish a full backup within whatever window of time you have, a common way to speed things up is to do a disk-to-disk backup first and then later copy the backup set off to tape.

Solution 2:

You have two options:

Option 1:

  1. Buy another NAS
  2. Give your users read-only access to new_NAS
  3. Move all files older than 2 years to new_NAS
  4. Keep backing up old_NAS as usual
  5. Every 6 months move files older than 2 years to new_NAS
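Steps 3 and 5 can be scripted. One sketch, assuming both NAS shares are mounted locally at hypothetical paths; 730 days is roughly 2 years, and `--remove-source-files` makes rsync delete each file from old_NAS once it has been copied:

```shell
OLD=/mnt/old_nas/photos   # hypothetical mount of old_NAS
NEW=/mnt/new_nas/archive  # hypothetical mount of new_NAS

# Find files untouched for ~2 years and move them to the archive NAS,
# preserving the directory layout under $NEW.
cd "$OLD" &&
find . -type f -mtime +730 -print0 |
  rsync -a --remove-source-files --files-from=- --from0 . "$NEW"
```

Running from inside `$OLD` with a `.` source keeps the file list relative, so `pics/2021/img001.jpg` lands at the same relative path on new_NAS.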

Option 2:

  1. Buy another NAS

  2. Run rsync every hour: old_NAS -> new_NAS

    or, better, use something like rdiff-backup, which does an rsync-style mirror plus keeps deltas of file changes (so you can restore older versions of files)

    rdiff-backup  user1@old_NAS::/source-dir    user2@new_NAS::/dest-dir
    
  3. Every 6 months clean old files running something like:

    rdiff-backup --remove-older-than 2Y    user2@new_NAS::/dest-dir
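Both the hourly sync and the half-yearly cleanup are natural cron jobs. A sketch of the crontab entries, reusing the placeholder user and host names from the commands above (the schedule is an assumption — adjust to taste):

```shell
# /etc/crontab fragment -- hypothetical schedule
# hourly mirror with version history
0 * * * *    root  rdiff-backup user1@old_NAS::/source-dir user2@new_NAS::/dest-dir
# on 1 Jan and 1 Jul at 03:00, prune increments older than 2 years
0 3 1 1,7 *  root  rdiff-backup --remove-older-than 2Y user2@new_NAS::/dest-dir
```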
    

Solution 3:

Why do your backups have to complete overnight? Fileserver performance? You might be able to constrain the bandwidth of your backup software to limit impact during the day. Or dedicate an interface on your NAS to talk to the tape drive to limit impact on other traffic.

Can you run full dumps on weekends and only do incrementals during the week? If the problem is changing tapes on the weekend when no one is around, a cheap tape library/autochanger costs a lot less than paying someone to change tapes.
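With GNU tar, that schedule comes down to a snapshot file: delete it before the weekend run to force a full (level-0) dump, then reuse it during the week so each run archives only what changed. Paths and the tape device here are hypothetical:

```shell
SNAP=/var/backups/photos.snar   # hypothetical snapshot-file location

# Saturday: full dump -- removing the snapshot file restarts at level 0
rm -f "$SNAP"
tar -cpf /dev/nst0 --listed-incremental="$SNAP" /mnt/nas/photos

# Mon-Fri: incremental -- archives only files changed since the last run
tar -cpf /dev/nst0 --listed-incremental="$SNAP" /mnt/nas/photos
```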

Can you segment your data into multiple groups that are small enough to complete within your backup window?

We have about 50TB of data on our NAS, and it takes over a week to get a full dump of the entire thing using 2 tape drives (one volume takes nearly a week by itself because it contains many tiny files). What we do is replicate our data to a second NAS. Our secondary NAS is on-site (but in a different datacenter from the primary), so we still spool data off to tape for off-site backup. We run backups from that secondary NAS so backups don't slow anyone down.

If you can colocate your secondary NAS far enough away, then it can be your backup, no tapes needed.