How to choose a cloud service for backups

Solution 1:

Any solution that doesn't include client-side encryption with keys held by the owner is not going to meet the first stated requirement (IP protection / security): any hack of the server side discloses unencrypted data. This rules out cloud syncing systems such as Dropbox, where the provider holds the keys.

To avoid hosting the all-important encryption keys on the website's server, which is also likely to be hacked at some point, here's the setup I would use:

  1. In-house backup server on the customer's own site - has encryption keys and SSH keys for both other servers
  2. Server hosting the website - could be a web host
  3. Cloud backup server or service

Step 1: Server (1) pulls the backup from (2), so most hacks of the website server will not compromise backups. Encryption takes place at this point.

  • I would use rsnapshot over SSH with key-based login, as this places minimal requirements on the web host and the in-house backup server. Unless you have a large DB to back up, it is very efficient in bandwidth, stores multiple versions of the site, and handles purging of old backups (see the configuration sketch after this list).
  • Encryption could be done by any file-to-file tool such as GPG, copying the rsnapshot tree to another tree - or you could use duplicity for step 2, saving disk space.
  • "Pull" from the backup server is important - if the main server (2) has the passwords/keys for the backup server, hackers can and sometimes will delete the backups after hacking the main server (see below). Really advanced hacks can install trojaned SSH binaries which could then compromise the backup server, but that's less likely for most companies.

Step 2: Server (1) pushes the encrypted backups to (3) so that there is an offsite backup. If the backups were already encrypted in step 1, the offsite copy can be a plain rsync mirror of the local rsnapshot tree, as sketched below.
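
For example, if step 1 left a GPG-encrypted copy of the snapshot tree, the push can be a one-line mirror (a sketch; the host and paths are placeholders):

    # on server (1): mirror the encrypted tree to the offsite host (3)
    # -aH preserves attributes and hard links, --delete keeps the mirror exact
    rsync -aH --delete -e ssh /srv/backups/encrypted/ backup@offsite.example.com:/srv/mirror/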

  • Duplicity would be a good option to directly encrypt and back up the unencrypted rsnapshot tree onto the remote server (see the command sketch after this list). Duplicity's features are a bit different from rsnapshot's: it uses GPG-encrypted tar archives, but it provides backup encryption on the remote host and only requires SSH there (or it can use Amazon S3). Duplicity doesn't support hard links, so if these must be preserved (e.g. for a full server backup), it's best to have a script convert the rsnapshot tree (which does support hard links) into a tar file (maybe just the files that have >1 hard link, which will be quite small) so duplicity can back up the tar file.
  • Since the remote server is just an SSH host, possibly with rsync, it could be a web host (but from a different hosting provider and in a different part of the country), or a cloud service that provides rsync and/or SSH - see this answer on rsync backups to cloud for its recommendation of bqbackup and rsync.net, though I don't agree with the backup setup mentioned.
  • You can use Amazon S3 as the remote server with duplicity, which would give you really good availability though perhaps it would cost more for large backups.
  • Other options for remote encrypted backups are Boxbackup (not quite as mature, some nice features) and Tarsnap (commercial cloud service based on Amazon S3 with simple command line interface, good deduplication and very thorough encryption).
    • JungleDisk may be an option, but I haven't had a great experience with them in the past, and their encryption has some issues (according to the Tarsnap author).
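
A sketch of the duplicity variant of step 2, assuming a GPG key dedicated to backups; the key ID, host and paths are placeholders (an s3:// URL can replace the sftp:// one for Amazon S3):

    # on server (1): encrypted, incremental backup of the newest snapshot to (3)
    duplicity --encrypt-key ABCD1234 \
        /srv/backups/rsnapshot/daily.0/website \
        sftp://backup@offsite.example.com//srv/duplicity/website

    # occasionally start a fresh full chain, then expire the old ones
    duplicity full --encrypt-key ABCD1234 \
        /srv/backups/rsnapshot/daily.0/website \
        sftp://backup@offsite.example.com//srv/duplicity/website
    duplicity remove-all-but-n-full 2 --force \
        sftp://backup@offsite.example.com//srv/duplicity/website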

The security of all the hosts involved is important, so the setup should be adjusted to the client's security profile, i.e. analyse the threats, risks, attack vectors, etc. Ubuntu Server is not a bad start, as it has frequent security updates for 5 years, but attention to security is required on all servers.

This setup provides two independent backups, one of which can be a highly available cloud storage service; it operates in pull mode, so most attacks on the website cannot destroy the backups at the same time; and it uses well-proven open source tools that don't require much administration.

  • Independent backups are critical, because hackers really do sometimes delete all backups at the same time as hacking the website - in the most recent case, hackers destroyed 4800 websites, including their backups, by hacking the web hosting environment rather than the individual sites. See also this answer and this one.
  • Restoring is very easy with rsnapshot - each snapshot tree contains one file for every file backed up, so just find the files with standard Linux tools and rsync or scp them back to the website (example commands below). If the on-site backup server is unavailable for some reason, just use duplicity to restore them from the cloud backup server - or you can use standard tools like GPG, rdiff and tar to restore the backups.
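
Hypothetical restore commands for both paths (hosts and paths are placeholders):

    # from the local snapshot tree - each snapshot is a plain directory,
    # so ordinary tools work
    rsync -av /srv/backups/rsnapshot/daily.0/website/ backup@www.example.com:/var/www/

    # if server (1) is down, pull the files back from the cloud copy instead
    duplicity restore sftp://backup@offsite.example.com//srv/duplicity/website /tmp/restore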

Since this setup uses standard SSH and rsync, it should be easier to choose a suitable provider with the right uptime guarantees, strong security, etc. You don't have to lock yourself into a long contract, and if the backup service has a catastrophic failure, you still have a local backup and can switch to another backup service quite easily.

Solution 2:

Software-wise, consider duplicity for incremental backups with asymmetric encryption and a dumb receiver (non-cloud howto); a minimal invocation is sketched below.
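
A minimal sketch, assuming a GPG key pair created just for backups (the key ID and target URL are placeholders): only the public half is imported on the sending host, so neither it nor the dumb receiver can decrypt the archives; the private key stays offline until a restore is needed.

    # encrypt to a public key whose private half never touches either server
    duplicity --encrypt-key 0xDEADBEEF /var/www \
        sftp://backup@receiver.example.com//backups/www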

Solution 3:

I always tell my clients that the best, least expensive and most efficient backup solution is one that you build yourself, for your own purposes.

When I build a system for my clients, I use rsync with SSH keys to handle authentication between serverA and serverB, where serverA contains the data to be backed up. The command to archive and rsync the data lives in a bash script in a non-web-accessible directory, called by cron every H hours (24 for daily, and so on). A sketch of such a script follows.
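
A sketch of that kind of script, with hypothetical hostnames, paths and schedule:

    #!/bin/bash
    # /usr/local/sbin/push-backup.sh on serverA
    set -euo pipefail

    SRC=/var/www
    STAMP=$(date +%Y%m%d-%H%M)
    ARCHIVE=/var/backups/site-$STAMP.tar.gz

    # archive the site, then copy it to serverB over key-based SSH
    tar -czf "$ARCHIVE" -C / "${SRC#/}"
    rsync -a -e ssh "$ARCHIVE" backup@serverB.example.com:/srv/backups/incoming/

    # crontab entry for every 24 hours:
    # 0 2 * * * /usr/local/sbin/push-backup.sh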

The backup server, serverB, is to be used SOLELY for backups. I always advise my clients to use an extremely long passphrase together with SSH key authentication, used only for uploading and retrieving backups. Sometimes, my clients need backups to be kept for D days, so I write some scripts to handle that (take data from the active backup directory, apply a timestamp, and add it to an archive in another directory); a sketch of such a rotation script follows.
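
A sketch of such a rotation script for serverB, with the retention window and paths as placeholders:

    #!/bin/bash
    # move fresh uploads into a dated archive and expire anything older than D days
    set -euo pipefail

    DAYS=30                       # the "D days" retention window
    INCOMING=/srv/backups/incoming
    ARCHIVE=/srv/backups/archive

    mkdir -p "$ARCHIVE"
    for f in "$INCOMING"/*.tar.gz; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        mv "$f" "$ARCHIVE/$(date +%Y%m%d)-$(basename "$f")"
    done

    # purge archives older than the retention window
    find "$ARCHIVE" -name '*.tar.gz' -mtime +"$DAYS" -delete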