Recommendations for an efficient offsite remote backup solution for VMs
I am looking for recommendations for backing up my current 6 VMs (soon to grow to as many as 20). Currently I am running a two-node Proxmox cluster (a Debian base using KVM for virtualization, with a custom web front end for administration). I have two nearly identical boxes with AMD Phenom II X4 CPUs and Asus motherboards. Each has four 500 GB SATA II HDDs: one for the OS and other data for the Proxmox install, and three using mdadm + DRBD + LVM to share 1.5 TB of storage between the two machines.

I mount LVM images in KVM for all of the virtual machines. I currently have the ability to do a live transfer from one machine to the other, typically within seconds (it takes about 2 minutes on the largest VM, which runs Windows Server 2008 with MS SQL Server). I am using Proxmox's built-in vzdump utility to take snapshots of the VMs and store those on an external hard drive on the network. I then have the JungleDisk service (using Rackspace) sync the vzdump folder for remote offsite backup.
This is all fine and dandy, but it's not very scalable. For one, the backups themselves can take up to a few hours every night. With JungleDisk's block-level incremental transfers, the sync only moves a small portion of the data offsite, but even that takes at least half an hour.
The much better solution would of course be something that lets me instantly take the difference between two points in time (say, what was written from 6 am to 7 am), compress it, then send that difference file to the backup server, which would immediately transfer it to the remote storage on Rackspace. I have looked a little into ZFS and its ability to do send/receive. That, coupled with a pipe of the data through bzip2 or something similar, would seem perfect. However, it seems that implementing a Nexenta server with ZFS would essentially require at least one or two more dedicated storage servers to serve iSCSI block volumes (via zvols?) to the Proxmox servers. I would prefer to keep the setup as minimal as possible (i.e. NOT having separate storage servers) if at all possible.
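Roughly what I have in mind with the ZFS approach, sketched with placeholder names ("tank/vmstore" and the backup hostname are hypothetical, not part of my actual setup):

```shell
# Take hourly snapshots of the dataset that holds the VM images.
zfs snapshot tank/vmstore@0600
# ... an hour of VM writes later ...
zfs snapshot tank/vmstore@0700

# Send only the blocks written between the two snapshots,
# compressing them in the pipe before they leave the box.
zfs send -i tank/vmstore@0600 tank/vmstore@0700 \
  | bzip2 -c \
  | ssh backup.example.com 'cat > /backups/vmstore-0600-0700.zfs.bz2'
```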
I have also briefly read about Zumastor. It looks like it could also do what I want, but development appears to have halted in 2008.
So: ZFS, Zumastor, or something else?
This might not be possible in your situation, so I hope I don't get down-voted in that case, but it might be more efficient to change your backup strategy. If you back up specific data instead of VM snapshots, your backups will run much more quickly, and it will be easier to capture changes.
Depending on your VMs and what they're used for, you can have them back up their data daily (or on whatever schedule is appropriate) to where you store the snapshots now, and then JungleDisk can back up just the data. That would transfer changed files more efficiently, and both the space and the time required for backups would be reduced. In addition, you could still take snapshots to retain, just much less often (weekly, for example).
In this case, you could always bring up a new VM and restore the data, or use an older snapshot to restore the VM and then use the data backup to restore to the most recent point.
If I were doing offsite backups, I would choose one of the following options:
(a) A shell script that copies the backup to a remote server with SCP. You could add a cron job that runs the script automatically. Additionally, you can have it create a temporary archive file before actually transferring anything, saving bandwidth by not gzipping while transferring.
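A minimal sketch of option (a) — the paths and hostname are placeholders to adapt to your own layout:

```shell
#!/bin/sh
# SRC, ARCHIVE, and REMOTE are example values, not fixed names.
SRC=/var/lib/vz/dump                            # where vzdump writes
ARCHIVE=/tmp/vm-backup-$(date +%Y%m%d).tar.gz   # temporary archive
REMOTE=backup@backup.example.com:/backups/

# Build the compressed archive first, so the network transfer moves
# one already-gzipped file instead of compressing on the fly.
tar -czf "$ARCHIVE" -C "$SRC" .

# Push it to the remote server over SSH, then clean up.
scp "$ARCHIVE" "$REMOTE" && rm -f "$ARCHIVE"
```

Save it somewhere like /usr/local/bin/vm-backup.sh and add a nightly cron entry, e.g. `0 3 * * * /usr/local/bin/vm-backup.sh`.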
or
(b) Install a server management tool like Webmin and have it do automated backups. I am currently using this on my production servers without any problems; it works flawlessly. I would also recommend Cloudmin (paid) for managing many VMs, as it provides an all-in-one solution.
Some extra links:
http://www.debianhelp.co.uk/backup.htm
http://ubuntuforums.org/showthread.php?t=35087
Hope that helps, RayQuang