AWS S3 Backup Strategies - How should I approach backing up S3 buckets?
I am in the process of building a web app that could end up with massive storage requirements, which Amazon S3 can satisfy.
My main concern is the usage of API keys on the server, and how an unauthorised person could exploit the server in some way, obtain the keys, and use them to destroy all the data in the S3 buckets.
- What strategies should I put in place to minimise the potential exposure of my API keys?
- What would be a robust approach for backing up terabytes of S3 assets on a restrictive budget?
The first thing that comes to mind is that data transfer in and out of S3 is quite spendy. If you're backing up frequently (as you ought to be), costs could get out of hand from transfer fees alone.
That said, to answer your question: backups should be performed from a separate, hardened server whose only task in life is to perform backups. No Apache, remote access only via SSH with key authentication, etc. If you do these things and also make sure that only a select few people have access to the server, your keys should be quite safe. If you are really paranoid, you can PGP-encrypt the file that contains your keys. The problem with this approach, though, is that you have to enter your passphrase each time the backup job runs, which rules out unattended backups. That's probably not something you want to sign up for, correct?
Given your restrictive budget, I can't help but think you would be better off rethinking your storage strategy. I'm not sure what your server situation is, but could you perhaps host the files locally on the server and use S3 only for backups? There is a great backup tool called duplicity that can perform compressed, encrypted, incremental backups to S3 (among several other backend storage types).
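As a rough sketch of what a scheduled duplicity job could look like, here is a small Python wrapper. The helper names, the source directory, and the target URL are all made up for illustration. The URL scheme duplicity expects for S3 depends on the installed version, so check its man page before relying on the one shown. duplicity reads the AWS credentials and the encryption passphrase from the environment, which ties back to the point above about keeping them on a locked-down backup box.

```python
import os
import subprocess


def build_duplicity_command(source_dir, target_url, full_every="30D"):
    """Build the argv for an incremental duplicity run.

    --full-if-older-than forces a fresh full backup once the last
    full one is older than the given interval; otherwise duplicity
    performs an incremental backup.
    """
    return ["duplicity", "--full-if-older-than", full_every, source_dir, target_url]


def run_backup(source_dir, target_url):
    """Run duplicity (hypothetical helper; assumes duplicity is installed)."""
    # duplicity picks these up from the environment; fail early if missing.
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "PASSPHRASE"):
        if name not in os.environ:
            raise RuntimeError(f"{name} must be set before running the backup")
    subprocess.run(build_duplicity_command(source_dir, target_url), check=True)
```

A cron job on the backup server would then call something like `run_backup("/srv/uploads", "s3://my-backup-bucket/uploads")`, where both arguments are placeholders.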
[Edit] If you do end up hosting on S3 and backing up to local disk, it looks like there's an "If-Modified-Since" header in the S3 API that will help with performing incremental backups. For backups like this, you're most likely going to need to roll your own solution, though it won't be too difficult. Just use SimpleDB, BerkeleyDB, or similar to store meta information about which files you have backed up, along with a pointer to where they reside on disk. Keeping the meta information in a DB will also make quick work of verifying backups and creating reports on backup jobs.
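To make the metadata idea concrete, here is a minimal sketch of such a catalog. It uses Python's built-in sqlite3 as a stand-in for SimpleDB or BerkeleyDB, and the table layout and function names are my own invention, not anything S3 prescribes. It records when each key was last backed up and where the copy lives, produces the value to send in the "If-Modified-Since" header, and works out which keys in a bucket listing still need to be fetched. The actual download from S3 is left out.

```python
import sqlite3
from datetime import datetime, timezone
from email.utils import format_datetime


def open_catalog(path=":memory:"):
    """Open the backup catalog, creating the table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS backups ("
        " s3_key TEXT PRIMARY KEY,"
        " last_modified TEXT NOT NULL,"
        " local_path TEXT NOT NULL)"
    )
    return db


def stored_timestamp(db, s3_key):
    """Return the recorded modification time for a key, or None."""
    row = db.execute(
        "SELECT last_modified FROM backups WHERE s3_key = ?", (s3_key,)
    ).fetchone()
    if row is None:
        return None
    return datetime.fromisoformat(row[0]).astimezone(timezone.utc)


def if_modified_since(db, s3_key):
    """HTTP-date for the If-Modified-Since header, or None if never backed up."""
    stamp = stored_timestamp(db, s3_key)
    return None if stamp is None else format_datetime(stamp, usegmt=True)


def record_backup(db, s3_key, last_modified, local_path):
    """Remember that a key was backed up and where the copy lives on disk."""
    stamp = last_modified.astimezone(timezone.utc).isoformat()
    db.execute(
        "INSERT OR REPLACE INTO backups VALUES (?, ?, ?)",
        (s3_key, stamp, local_path),
    )
    db.commit()


def keys_needing_backup(db, listing):
    """Given (s3_key, last_modified) pairs, return keys that are new or changed."""
    pending = []
    for s3_key, last_modified in listing:
        stamp = stored_timestamp(db, s3_key)
        if stamp is None or last_modified > stamp:
            pending.append(s3_key)
    return pending
```

The timestamps passed in are assumed to be timezone-aware datetimes, as an S3 client library would typically return for an object's last-modified time. The same table can drive verification and reporting: every row points at a file that should exist on disk.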