Amazon EC2 Backup Strategy

I have a couple of web server/DB server setups on Amazon EC2. I currently take daily snapshots of all my system and EBS drives, which contain all of my application files, DB files, source code and DB backups. A console application creates the backups on a schedule. My images are EBS images.

I am working on a task that will delete my snapshots after a set number of days. My question is: should I, and can I, also schedule a complete image/EBS task? That way, if the server fails or is corrupted, I can just launch the latest image and then apply the latest snapshot.

While I work on my backup strategy, I am using Jungle Disk to back up my data disks.


My recommendations:

  1. Always document and/or script the setup of each new instance so that you can reproduce the software installation and system configuration in the event you lose the instance. Test this by starting a new instance and following the procedure. You can use a custom, private AMI if the installation takes a long time and you need to start instances quickly, but that AMI itself should be built using a documented and/or scripted procedure.

  2. Keep your important data on separate EBS volume(s) and not on the root EBS volume. This has many benefits including making it easier to port your data to new instances (e.g., based on different AMIs) and making it easier to get copies of your data on other instances (e.g., with snapshots and new volumes).

  3. Create regular snapshots of the EBS data volumes. If possible/applicable, use a tool like my ec2-consistent-snapshot to improve the chances that you are taking a snapshot of a consistent filesystem / database. Back up the data outside of AWS/EC2, as your AWS account itself is a single point of failure.

  4. Create snapshots of the root EBS volume from time to time on important instances. Though this may help you in the event of instance or EBS volume failure, that part is not so critical because of #1 and #2 above. The main reason I do this is that creating snapshots reduces the risk of failure of the root EBS volume itself.

The rate of failure of an EBS volume is directly related to the number of blocks that have been modified on that volume since the last EBS snapshot.
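The retention side of recommendation 3, and the question's plan to drop snapshots after a set number of days, could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Snapshot` record and the `select_expired` function are hypothetical names, and the records are assumed to have been fetched already (for example from the EC2 DescribeSnapshots API). The one safeguard it adds is that the newest snapshot of each volume is never selected, however old it is.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Snapshot(NamedTuple):
    """Hypothetical record holding the fields the policy needs."""
    snapshot_id: str
    volume_id: str
    start_time: datetime


def select_expired(snapshots, now, keep_days):
    """Return snapshots older than keep_days, sparing the newest per volume."""
    cutoff = now - timedelta(days=keep_days)

    # Find the most recent snapshot of every volume.
    newest = {}
    for snap in snapshots:
        current = newest.get(snap.volume_id)
        if current is None or snap.start_time > current.start_time:
            newest[snap.volume_id] = snap

    # Expired = older than the cutoff and not the newest for its volume.
    return [
        snap for snap in snapshots
        if snap.start_time < cutoff
        and newest[snap.volume_id].snapshot_id != snap.snapshot_id
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = [
        Snapshot("snap-a", "vol-1", now - timedelta(days=10)),
        Snapshot("snap-b", "vol-1", now - timedelta(days=1)),
    ]
    for snap in select_expired(demo, now, keep_days=7):
        print("would delete", snap.snapshot_id)
```

Keeping the selection separate from the actual delete call makes it easy to do a dry run and review the list before anything is removed.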


Should/can I also schedule a complete image/EBS task as well?

Yes, it's advisable. It saved me once: kernel problems forced me to reset the instance many times, until the boot disk was no longer readable, and I simply booted from the latest snapshot.

If you're interested, I wrote a Java class that snapshots all attached EBS volumes and also deletes the snapshots after a certain amount of time. Currently I take a backup every week and discard the third backup after two weeks.

https://github.com/stivlo/obliquid-cp/blob/master/src/main/java/org/obliquid/sherd/runner/RequestSnapshots.java

It performs only one action per run, such as taking or deleting a snapshot, because it is meant to run from an hourly cron job. This avoids overloading the system with tens of snapshots at the same time if you have many EBS volumes, as I do.
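The "one action per run" idea can be illustrated with a small sketch. This is not the logic of the linked Java class, only an assumption about how such a scheduler might choose its single action; `next_action` and the tuple layout are hypothetical. It prefers creating a missing or overdue snapshot, and only deletes an expired one when every volume is up to date.

```python
from datetime import datetime, timedelta, timezone


def next_action(volume_ids, snapshots, now, interval, max_age):
    """Pick at most one action for this run.

    snapshots is a list of (snapshot_id, volume_id, start_time) tuples.
    Returns ("create", volume_id), ("delete", snapshot_id), or None.
    """
    # First priority: a volume whose latest snapshot is missing or overdue.
    for volume_id in volume_ids:
        times = [t for _, v, t in snapshots if v == volume_id]
        if not times or now - max(times) >= interval:
            return ("create", volume_id)

    # Otherwise delete the oldest snapshot that has outlived max_age.
    # Every volume has a fresh snapshot at this point, so as long as
    # max_age is longer than interval the newest one is never removed.
    expired = [(t, sid) for sid, _, t in snapshots if now - t > max_age]
    if expired:
        return ("delete", min(expired)[1])

    return None


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    snaps = [("snap-old", "vol-1", now - timedelta(days=20))]
    print(next_action(["vol-1"], snaps, now,
                      timedelta(days=7), timedelta(days=14)))
```

Run hourly from cron, a loop like this spreads the work out: each invocation issues a single create or delete request, so a large number of volumes is processed over several hours instead of all at once.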