Proper use of disk-to-disk-to-tape backup using de-duplication and LTO5
I currently have ~12TB of data for a full disk-to-tape (LTO3) backup. Needless to say, it now requires over 16 tapes, so I'm looking at other solutions. Here is what I've come up with; I'd like to hear the community's thoughts.
- Server for Disk-to-Disk
- BackupExec 2010 Using De-duplication Technology
- 20+TB worth of SATA drives
- LTO5 robotic library connected via SAS
- 1Gbps NIC connected to network
What I envision is doing a full backup of my entire network, which will initially take a long time over the 1Gbps NIC, but once de-duplication kicks in, backups should be quick. I will then use the LTO5 library to make disk-to-tape backups and archive those accordingly.
What does everyone think? Any faster way of doing the initial full backup over the 1Gbps NIC? What will be my pain points? Is there a better way of doing what I'm trying to achieve?
Solution 1:
I'm currently doing nightly backups of my data systems, mostly using rsync, plus rsnapshot for some of the more 'user-visible' volumes.
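For reference, a minimal rsnapshot configuration in this spirit looks something like the following; the paths, retention counts, and hostname are illustrative, not my actual setup:

```
# /etc/rsnapshot.conf (excerpt) -- fields must be separated by TABs, not spaces
snapshot_root	/backup/snapshots/

# keep 7 daily and 4 weekly rotations
retain	daily	7
retain	weekly	4

# pull a 'user visible' volume from a remote host over ssh
backup	root@fileserver:/export/home/	fileserver/
```

Driven from cron (e.g. `rsnapshot daily`), rsnapshot hard-links files that haven't changed between runs, so each nightly snapshot only consumes space for the files that actually changed.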
The biggest volume has a capacity of 16TB, with 9.5TB currently used. The job first does a simple rsync to a separate disk array, which typically takes 30-45 minutes.
Then it does a second copy to an offsite server over a 100Mbit wireless link (although we typically get 50-60Mbit effective after some packet loss). This takes roughly 3 hours each night.
So, yes: I think disk-to-disk backup of big volumes isn't a hard thing to do. You don't even need fancy buzzword-compliant software; simple tools are quite capable.
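For the curious, the nightly job really is little more than two rsync invocations. This is a simplified sketch with placeholder paths and hostnames, and a `--bwlimit` tuned to what our wireless link will tolerate:

```bash
#!/bin/sh
# Stage 1: mirror the big volume to the local backup array (typically 30-45 min)
rsync -aH --delete /srv/bigvolume/ /mnt/backup-array/bigvolume/

# Stage 2: copy the same tree to the offsite server over the 100Mbit link
# (roughly 3 hours; --bwlimit is in KiB/s, so 6000 is about 48Mbit/s,
# leaving headroom on a lossy wireless hop)
rsync -aH --delete --partial --bwlimit=6000 \
    /mnt/backup-array/bigvolume/ offsite:/backup/bigvolume/
```

`--partial` matters on a flaky link: if the transfer drops mid-file, the kept partial file lets the next run pick up mostly where it left off instead of re-sending the whole file.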
Solution 2:
Of primary interest here is whether you're looking to do backups, or just to maintain an active copy. A single active copy of 16TB updated nightly is certainly doable disk-to-disk, and it will almost certainly be cheaper than a tape library. That said, consider that your last-resort restore option is now sitting on physically co-located spinning disk, vulnerable to all the usual issues of drive failure, corruption on power loss, and so on; design your disk system with an appropriate level of redundancy, along the lines of the sketch below.
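To make "appropriate redundancy" concrete: if I were building the disk target, I'd want at least double parity plus regular scrubbing. A sketch using ZFS follows; the pool name and device names are placeholders (in production you'd reference drives by /dev/disk/by-id):

```bash
# 8x 2TB SATA drives in raidz2: survives any two simultaneous drive
# failures and yields roughly 12TB of raw usable space
zpool create backuppool raidz2 sdb sdc sdd sde sdf sdg sdh sdi

# periodic scrubs catch silent corruption before a restore depends on it
zpool scrub backuppool
```

The same goal is reachable with RAID6 on a conventional controller; the point is tolerating multiple drive failures on the box that holds your last-resort copy.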
The way we've been doing it, on about 350TB of data, is a simple sync to relatively high-performance front-end disk, which is then migrated to tape via a robotic library for offsite storage. This gives us fast backup and fast restore for recent (active) data, while ensuring reliable offsite tape storage in case of disaster.
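Stripped to basics, the disk-to-tape leg can be as simple as the following; the device node and paths are illustrative, and a real robotic library would be driven through `mtx` or the backup software rather than by hand:

```bash
# write one night's synced tree to the loaded tape (non-rewinding device)
tar -cf /dev/nst0 -C /mnt/backup-array bigvolume/

# rewind and spot-check that the archive is readable before it ships offsite
mt -f /dev/nst0 rewind
tar -tf /dev/nst0 > /dev/null && echo "tape verifies OK"
```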
Don't be taken in by aggressive sales claims about dedupe in backup: you'll just end up paying in CPU cycles to process the dedupe rather than paying in disk; your restore times will probably suffer, since you're now dependent on the dedupe system to tell you where your blocks are before you can restore them; and (my personal nightmare) if the dedupe system hits a data-loss error condition, your last-resort backups are hosed.
These are of course only my own opinions; I hope they're useful to you in designing a backup solution. Best of luck!