Why don't people just use rsync to back up VMware guests?
If I am running a modern VMware ESXi system, I can drop in a statically linked rsync binary and rsync files to any destination over SSH.
I'm trying to understand why most (all?) backups of VMware guests are not done this way.
If the VM is running, you can simply use 'vim-cmd vmsvc/snapshot.create' to create a snapshot and then rsync that snapshot to the remote host. (There's even an option to "quiesce" the snapshot.)
Or, if you want a more robust backup, you can gracefully halt the VM and rsync over the vmdk file(s).
So it seems like I am a simple shell script away from all the backups I ever wanted to do, simply and easily, using plain old rsync.
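Something like this, for instance (a rough sketch only; the VM id, datastore path, destination and the location of the rsync binary are all placeholders):

    #!/bin/sh
    # Rough sketch: VM id, paths and destination are placeholders.
    VMID=42                                  # look it up with 'vim-cmd vmsvc/getallvms'
    VMDIR=/vmfs/volumes/datastore1/myvm      # the VM's directory on the datastore
    DEST=backup@backuphost:/backups/myvm/

    # Take a quiesced snapshot so the base vmdk stops changing
    # (arguments: vmid, name, description, includeMemory, quiesce)
    vim-cmd vmsvc/snapshot.create "$VMID" rsync-backup nightly 0 1

    # Copy the VM's directory while new writes land in the snapshot delta files
    /bin/rsync -av -e ssh "$VMDIR/" "$DEST"

    # Collapse the snapshot back into the base disk
    vim-cmd vmsvc/snapshot.removeall "$VMID"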
What am I missing here?
- Because the transfer speeds out of the ESXi console are purposefully limited.
- Because this isn't scalable in any way.
- Because you'd have to drop a statically-compiled rsync binary onto the ESXi host.
- Because the VMs, the VMDKs, their ramdisk files and other components can change enough to make rsync a losing proposition... do you really want to re-sync a 200GB VM that was rebooted and had a small number of files change?
- Because of CPU/memory resource requirements on the source or destination. Rsync isn't free.
- Because there are other products on the market, both third-party and VMware-provided. Look up Changed Block Tracking.
- Because ESXi is NOT a general-purpose operating system.
Also see: Install rsync on VMware ESX 4.1 server
I used to do just this a few years back. (Edit: with VMware running on CentOS hosts, not ESXi, admittedly.)
Every night I had a script that would suspend each VM, rsync its files from disk to the backup server and then start it again. It worked quite well, except...
Rsync doesn't work very well with a 2GB file.
It's not that rsync isn't brilliant; it's more that each 2GB vmdk file changes in ways that are very opaque to rsync. Even small changes to the enclosed filesystem produce changes in the vmdk (or all the vmdks, for some reason), which I blamed on Windows, either automatically defragging or otherwise doing all the other things it does that don't matter if you're running a real system, but show up when you're trying to rsync a VM!
I think rsync's mechanism for detecting changes doesn't work very well on a 2GB file: while it quite often skipped chunks at the start of the vmdk, once it found a difference it would simply copy the rest of the file. I don't know whether that's rsync failing to detect a moved chunk of binary data, a lack of memory on the source box, or the vmdk genuinely being updated all the way through. It doesn't matter, as the result was the same: the majority of the vmdk got copied.
In the end I simply copied any changed files and overwrote them, still using rsync. I also got better performance by overwriting the backup file directly instead of letting rsync build a copy and replace what was there.
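In rsync terms that workaround looks something like this (a sketch, not my exact command; --whole-file disables the delta-transfer algorithm and --inplace overwrites the destination file directly instead of building a temporary copy and renaming it; the paths are placeholders):

    # Copy changed vmdks whole and overwrite the backup copy in place
    rsync -av --whole-file --inplace /var/lib/vmware/myvm/ backup@backupserver:/backups/myvm/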
Our backup server wasn't the fastest either and it got to the point where overnight wasn't long enough to back up all running VMs.
However, when we did need to restore a VM, it was really easy and worked beautifully.
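For reference, the nightly job was roughly this shape (a sketch, not the actual script; the VM paths are invented, and vmrun stands in for whatever suspend/start command your particular VMware product provides):

    #!/bin/bash
    # Sketch only: VM locations and the suspend/start commands are illustrative.
    BACKUP=backup@backupserver:/backups
    for VMX in /var/lib/vmware/vms/*/*.vmx; do
        NAME=$(basename "$(dirname "$VMX")")
        vmrun suspend "$VMX"                      # quiesce the VM on disk
        rsync -av "$(dirname "$VMX")/" "$BACKUP/$NAME/"
        vmrun start "$VMX" nogui                  # bring it back up
    done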
Rsyncing a single file is not a backup solution.
What do you do when something happens to the VM and files are deleted, but you only notice this after your rsync has run again? You will have overwritten the good 'backup' of your files with the bad image.
If you want backups, you need to keep the old versions somewhere, or at least the diffs. Rsync will only transfer the diffs for you, but it will not store them separately; it overwrites the previous file.
There might be options for you here: rsync combined with a copy-on-write filesystem with versioning would, in effect, store the diffs every time your rsync script runs. But that kind of solution already starts to get more complicated, which is why people resort to known working solutions, IMHO.
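As a rough illustration of "keeping the old versions somewhere", rsync itself can do hardlink-based rotation with --link-dest when the backup server pulls the copies (the host names, paths and layout here are made up):

    # Each run goes into a new dated directory; files unchanged since the
    # previous run are hardlinked rather than stored again.
    TODAY=$(date +%F)
    rsync -av --link-dest=/backups/myvm/latest \
        backup@vmhost:/vmfs/volumes/datastore1/myvm/ \
        /backups/myvm/"$TODAY"/
    ln -sfn "$TODAY" /backups/myvm/latest

Of course, with monolithic multi-gigabyte vmdk files that change every night, each run will still store a full new copy of every changed disk, which brings you straight back to the points made in the other answers.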