How would you go about backing up a remote Ubuntu VPS via SSH?
Solution 1:
SSH Public Key Authentication
The first thing to set up is SSH public key authentication, which lets your script use SSH without a password prompt.
All the server needs is an SSH server installed and public key authentication set up for the account that the backup script on the RasPi will log in as.
Here's a good tutorial for public key authentication: https://hkn.eecs.berkeley.edu/~dhsu/ssh_public_key_howto.html
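As a rough sketch of that setup, assuming OpenSSH on the RasPi: the key path, the "vps-backup" comment, the make_backup_key helper and user@remoteserver.example.com are all placeholders, and the empty passphrase is an assumption made so that an unattended script can use the key.

```shell
#!/bin/sh
# Sketch only: key path, comment and host below are placeholders.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/vps_backup_ed25519}"

# Create a dedicated key pair with an empty passphrase so a cron job can use it.
# Refuses to overwrite an existing key.
make_backup_key() {
    key=$1
    if [ -e "$key" ]; then
        echo "key already exists: $key" >&2
        return 1
    fi
    mkdir -p "$(dirname "$key")" || return 1
    ssh-keygen -q -t ed25519 -N "" -C "vps-backup" -f "$key"
}

# Typical use, run once on the RasPi (commented out so nothing runs by accident):
#   make_backup_key "$KEY_FILE"
#   ssh-copy-id -i "$KEY_FILE.pub" user@remoteserver.example.com
#   ssh -i "$KEY_FILE" user@remoteserver.example.com true   # should not ask for a password
```

Using a key that exists only for backups means it can be removed from the server's authorized_keys later without affecting your normal logins.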
Option 1: SSH and Tar
You can create the tar.gz on the server and stream it directly over SSH with something like this:
ssh user@remoteserver.example.com "tar -czvf - / 2> /var/log/sshbackup" > vpsbackup.tar.gz
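This makes the VPS tar and gzip everything under / and stream the archive over SSH into vpsbackup.tar.gz on the RasPi. Because the archive itself goes to stdout, tar writes its verbose file listing and any errors to stderr, so a log of the most recent backup is kept in /var/log/sshbackup on the VPS. Note that / includes virtual filesystems such as /proc and /sys; you will probably want tar --exclude options for the same paths that the rsync example below excludes.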
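A dropped connection leaves a truncated archive behind, so it may be worth checking the file once the transfer finishes. The sketch below assumes tar and gzip are available on the RasPi; verify_archive is a hypothetical helper name, not part of either tool.

```shell
#!/bin/sh
# verify_archive FILE: succeeds only if FILE is a non-empty, intact gzip
# stream whose tar contents can be listed through to the end.
verify_archive() {
    archive=$1
    [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
    gzip -t "$archive" 2>/dev/null || { echo "corrupt gzip data: $archive" >&2; return 1; }
    tar -tzf "$archive" >/dev/null 2>&1 || { echo "unreadable tar: $archive" >&2; return 1; }
}

# Example:
#   verify_archive vpsbackup.tar.gz && echo "backup looks complete"
```

This only shows that the archive is readable, not that every file on the VPS made it in; the log on the VPS is still the place to look for read errors.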
Option 2: Rsync
Sending an entire .tar.gz over SSH is inefficient: files that haven't changed are still transmitted on every run. A better solution is rsync, although it makes it harder to produce a .tar.gz that preserves permissions. If you have enough storage space on the RasPi, you can just store the backup as plain ol' files. Then you can have a script tar.gz them if you want to keep multiple past backups.
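Such a script could look roughly like the sketch below. The snapshot_backup name, the vps-<timestamp>.tar.gz naming scheme and the default of 7 kept archives are assumptions for illustration only.

```shell
#!/bin/sh
# snapshot_backup SRC_DIR ARCHIVE_DIR [KEEP]
# Packs the synced tree in SRC_DIR into a timestamped .tar.gz inside
# ARCHIVE_DIR, then deletes all but the newest KEEP archives (default 7).
snapshot_backup() {
    src_dir=$1
    archive_dir=$2
    keep=${3:-7}
    stamp=$(date +%Y%m%d-%H%M%S)

    mkdir -p "$archive_dir" || return 1
    # tar stores owner and mode for every entry in the archive itself.
    tar -czf "$archive_dir/vps-$stamp.tar.gz" -C "$src_dir" . || return 1

    # The timestamp sorts chronologically, so a reverse sort lists the newest
    # first; everything after the first KEEP entries is removed.
    ls -1 "$archive_dir"/vps-*.tar.gz | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}

# Example:
#   snapshot_backup /path/to/backup/destination /path/to/archives 7
```

Run it as root if the synced tree contains files that your normal user cannot read.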
rsync must be installed on both the server and the RasPi. It runs over SSH, so you still use the public key authentication and keep the encryption. To read every file and preserve ownership and permissions, run the command as root on the RasPi and enable public key SSH logins for root on the VPS.
Your destination (or at least a temporary destination) should be a Linux filesystem. If you're storing these backups on a FAT or NTFS partition (e.g. on most external hard drives), you can make a loopback filesystem (see http://www.walkernews.net/2007/07/01/create-linux-loopback-file-system-on-disk-file/) for temporary storage. The tar.gz file can be stored on any partition, because it preserves permissions on its own.
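If you are not sure what the destination is formatted as, a check along these lines may help. It is a sketch that assumes findmnt (from util-linux) is installed; the helper names are hypothetical, and the list of unsuitable types is an assumption that is not exhaustive.

```shell
#!/bin/sh
# keeps_unix_perms FSTYPE: fails for filesystem types that typically do not
# store Unix owners and modes as mounted by default. The list is an
# assumption and not exhaustive; an empty type is treated as unsuitable.
keeps_unix_perms() {
    case $1 in
        ''|vfat|msdos|exfat|ntfs|ntfs3|fuseblk) return 1 ;;
        *) return 0 ;;
    esac
}

# dest_fstype PATH: print the type of the filesystem that holds PATH.
dest_fstype() {
    findmnt -n -o FSTYPE -T "$1"
}

# Example:
#   keeps_unix_perms "$(dest_fstype /path/to/backup/destination)" ||
#       echo "destination will not keep permissions; use a loopback filesystem" >&2
```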
An example rsync command:
rsync -a --delete --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/tmp remoteserver.example.com:/ /path/to/backup/destination/
Be careful when using --delete, especially as root! It deletes any files in the destination directory that do not exist on the backup source. Only use --delete when syncing to a dedicated backup directory that is used for nothing but that VPS. You should also make sure there is no possibility of your script syncing to the wrong destination (e.g. if /path/to/backup/destination is determined by a shell variable).
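One way to guard against a wrong or empty destination variable is to validate it before rsync ever runs. This is only a sketch: check_backup_dest and the .vps-backup-marker file are made-up conventions, and you would create the marker once by hand inside the real backup directory.

```shell
#!/bin/sh
# check_backup_dest DIR: succeeds only if DIR is a non-empty absolute path
# other than /, and contains the marker file that identifies it as the
# dedicated backup directory.
check_backup_dest() {
    dest=$1
    case $dest in
        ''|/)
            echo "refusing unsafe destination: '$dest'" >&2
            return 1 ;;
        /*) ;;
        *)
            echo "destination must be an absolute path: $dest" >&2
            return 1 ;;
    esac
    if [ ! -f "$dest/.vps-backup-marker" ]; then
        echo "marker file not found in $dest" >&2
        return 1
    fi
}

# Example (BACKUP_DEST is a placeholder variable):
#   check_backup_dest "$BACKUP_DEST" &&
#       rsync -a --delete --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/tmp \
#           remoteserver.example.com:/ "$BACKUP_DEST/"
```

Adding -n (--dry-run) to the rsync command the first time shows what would be transferred or deleted without changing anything.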
rsync only transfers files that differ between the source and destination. For large files, its delta-transfer algorithm, which is on by default for transfers over the network, also sends only the parts of a file that have changed, so bandwidth use stays minimal. The -c flag is not needed for this: it makes rsync decide which files to transfer by checksumming every file on both sides instead of comparing size and modification time, which uses more CPU and disk I/O and slows down the preparation phase of each sync. If you have large files (such as database files) and/or a flaky connection, consider adding --partial, which keeps partially transferred files so that an interrupted transfer can pick up from them on the next run. Use --append only for files that only ever grow, such as logs, because it assumes the data already at the destination matches the start of the source file.
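On a flaky connection it can also help to re-run rsync automatically when it fails. The retry helper below is a hypothetical sketch; the attempt count and delay in the example are arbitrary values, not recommendations.

```shell
#!/bin/sh
# retry MAX DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example: up to 5 attempts, 30 seconds apart, keeping partial files so each
# new attempt can build on what the previous one already transferred.
#   retry 5 30 rsync -a --partial --delete \
#       --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/tmp \
#       remoteserver.example.com:/ /path/to/backup/destination/
```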