SVN Backup using rsync command

I set up a cron job to back up my SVN repo (8 GB) to another server. But sometimes I get errors, and I feel that this is not the proper way to back up an SVN repository to a remote server.

I used the command rsync -avz myrepo.

Please suggest a good way to back up an SVN repository to a remote server. I cannot zip the files and transfer them daily since it's 7 GB.

Thanks


Solution 1:

Summary: rsync should be perfectly fine for backing up an svn repository, as long as you are not backing up a repository that is currently active. I suspect that you are trying to back up an active repository, which is problematic.

Detail:

You don't say what errors are reported, which makes any attempt at diagnosis difficult. This is something I regularly moan at our users about: if an application gives you a specific message, report that specific message to the people you are asking for diagnostics/support, even if the message is in fact "an error occurred" or similar (as this does happen).

I'm guessing that the problems being reported relate to files going missing (they were present during the initial scan but moved/renamed/deleted before the backup was complete), being locked, or apparently changing while rsync was reading them. You will see similar errors (or much worse: related but unreported problems) with most backup techniques if you back up a live svn service without completely stopping it before starting the backup run.

Stopping all access to the repository while the backup run takes place may not be an option for you even if it is done in the dead of night (as you might have remote developers who work at different hours). If this is the case then there are a few options, including:

  1. Use hot-backup.py to do a full backup of the repository while it is live, as described in this section of the freely available Version Control with Subversion, which is generally considered recommended reading. This is not directly suitable for your remote backup, as it would send the full repo over the line each time, but you can do the backup to a temporary local area and run the rsync-based (or any other) backup on that rather than on the live repository.

  2. If you are running on Linux and use LVM for your drive partitioning, you could use LVM's snapshot facility to achieve much the same as option 1. See here and here for example documentation of the technique. This does mean stopping access to the SVN service for a short while, for the length of time that the snapshot takes to be created, but this is near instant, so it is much less likely to be an issue than needing to stop the service for the whole backup operation.

  3. Use incremental backups of the live repository, also mentioned in the above SVN book.

The LVM technique will almost certainly be significantly quicker than hot-backup.py-then-sync and will use less disk space (though disk space is pretty cheap these days), but it is not available to you without a chunk of extra work and learning unless you already use and are familiar with LVM. LVM snapshots do affect write performance while they are present, but the difference is unlikely to be noticeable unless your repository is very busy, and performance will return to normal at the end of the backup run when you drop the snapshot.
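As a rough sketch of the snapshot approach, assuming the repository lives on its own logical volume: the volume name, the snapshot name svnbackup, the 1G snapshot size, the mount point and the remote target below are all made-up placeholders, and the commands need root. Some filesystems need extra mount options for a snapshot, which this sketch does not cover.

```shell
#!/bin/sh
# Sketch only: snapshot the volume, rsync from the frozen copy, drop the snapshot.
# Every name and path here is a hypothetical placeholder.

backup_via_lvm_snapshot() {
    origin=$1     # logical volume holding the repository, e.g. /dev/vg0/svn
    relpath=$2    # repository path relative to that filesystem's root, e.g. myrepo
    remote=$3     # rsync destination, e.g. backup@example.org:/backups/myrepo/
    snap_name=svnbackup
    snap_dev="$(dirname "$origin")/$snap_name"
    mnt=/mnt/$snap_name

    mkdir -p "$mnt" || return 1

    # If you pause the svn service, only this one step needs it paused.
    lvcreate --snapshot --size 1G --name "$snap_name" "$origin" || return 1

    status=1
    if mount -o ro "$snap_dev" "$mnt"; then
        rsync -az --delete "$mnt/$relpath/" "$remote"
        status=$?
        umount "$mnt"
    fi

    # Always drop the snapshot so it stops costing write performance.
    lvremove -f "$snap_dev"
    return "$status"
}
```

It could be called from cron as, say, backup_via_lvm_snapshot /dev/vg0/svn myrepo backup@example.org:/backups/myrepo/. The snapshot size only has to hold the writes that arrive while the backup is running.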

The hot-backup.py method has the advantage of giving you a local backup too if you don't already have one - if you store the "hot copy" version on another machine you can restore this much more quickly than you can restore the remote copy if the primary machine dies in an event that doesn't affect the other (a drive controller failure, for example). It is also likely to be simpler to implement, unless you already use LVM and are familiar with it.
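For illustration, here is a minimal sketch of hotcopy-then-sync. It calls svnadmin hotcopy directly (the command that hot-backup.py wraps) rather than the script itself, and the function name, paths and remote target are placeholders I have made up, not anything from your setup.

```shell
#!/bin/sh
# Sketch only: hot-copy the live repository to a local staging area,
# check the copy, then rsync the copy instead of the live repository.
# All paths and the remote target are hypothetical placeholders.

backup_via_hotcopy() {
    repo=$1      # live repository, e.g. /srv/svn/myrepo
    stage=$2     # local staging copy, e.g. /var/backups/svn/myrepo
    remote=$3    # rsync destination, e.g. backup@example.org:/backups/myrepo/

    # A plain hotcopy expects the destination not to exist yet.
    rm -rf -- "${stage:?}" || return 1

    svnadmin hotcopy "$repo" "$stage" || return 1

    # Only ship the copy if it passes verification.
    svnadmin verify --quiet "$stage" || return 1

    rsync -az --delete "$stage/" "$remote"
}
```

A cron entry would then run something like backup_via_hotcopy /srv/svn/myrepo /var/backups/svn/myrepo backup@example.org:/backups/myrepo/. The staging copy doubles as the local backup mentioned above.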

Incremental backups will be faster than both of these techniques, but they are less simple than hotcopy-then-sync, and restoration after a complete disaster is potentially more complex unless you use the incremental backups to build a full repo copy at the other end (rather than just storing the incremental information). Rebuilding the repo at the other end is recommended anyway, as it tests that your backup is in fact valid. Even with the other techniques you should test your backups regularly (mantra: a backup is not a good backup unless it has been tested).
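One way the incremental approach might look, as a sketch under assumptions: it remembers the last dumped revision in a small state file (the name last-revision is my own invention) and dumps only the newer revisions with svnadmin dump --incremental. The function name, directories and remote target are likewise placeholders.

```shell
#!/bin/sh
# Sketch only: dump the revisions added since the previous run, then rsync
# the directory of dump files. All names and paths are hypothetical.

backup_incremental() {
    repo=$1       # live repository, e.g. /srv/svn/myrepo
    dumpdir=$2    # local directory collecting dump files
    remote=$3     # rsync destination, e.g. backup@example.org:/backups/dumps/
    state=$dumpdir/last-revision

    youngest=$(svnlook youngest "$repo") || return 1

    # -1 means "nothing dumped yet", so the first run starts at revision 0.
    last=-1
    if [ -f "$state" ]; then
        last=$(cat "$state")
    fi

    if [ "$youngest" -gt "$last" ]; then
        first=$((last + 1))
        dumpfile=$dumpdir/$first-$youngest.dump
        svnadmin dump --quiet --incremental -r "$first:$youngest" "$repo" \
            > "$dumpfile" || return 1
        echo "$youngest" > "$state"
    fi

    # Dump files never change once written, so rsync only sends new ones.
    rsync -az "$dumpdir/" "$remote"
}
```

On the backup server, replaying the dump files in order into a repository made with svnadmin create (svnadmin load /path/to/copy < 0-5.dump, and so on) rebuilds the full copy and checks the backup at the same time.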

In summary, rsync should be perfectly fine for backing up an svn repository (as would many other techniques, but I'm quite a fan of rsync in most use cases myself) as long as you are not backing up a repository that is currently active - you need to stop the service or back up from some form of snapshot.

Solution 2:

How about svnsync (part of svn) to another server also running svn? As transport you could use svn+ssh.
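A minimal sketch of what that could look like when run on the backup server, assuming the mirror is a local repository there and the source is reachable over svn+ssh. The function names, the mirror path (which must be absolute) and the source URL are placeholders.

```shell
#!/bin/sh
# Sketch only: create a mirror repository once, then pull new revisions
# into it on every run. Paths and URLs are hypothetical placeholders.

setup_mirror() {
    mirror=$1        # absolute local path for the mirror, e.g. /srv/svn-mirror/myrepo
    source_url=$2    # e.g. svn+ssh://svn.example.org/srv/svn/myrepo

    svnadmin create "$mirror" || return 1

    # svnsync records its bookkeeping in revision properties, so the mirror
    # needs a pre-revprop-change hook that permits those changes.
    hook=$mirror/hooks/pre-revprop-change
    printf '#!/bin/sh\nexit 0\n' > "$hook" || return 1
    chmod +x "$hook" || return 1

    svnsync initialize "file://$mirror" "$source_url"
}

sync_mirror() {
    # Run this from cron; it copies only revisions the mirror lacks.
    svnsync synchronize "file://$1"
}
```

setup_mirror is a one-off step, after which sync_mirror /srv/svn-mirror/myrepo can run as often as you like. Nothing other than svnsync should commit to the mirror.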