Is RAID 1 overkill on Amazon EBS drives in terms of reliability?

My thinking is that RAID 1 keeps two or more copies of the data on multiple EBS drives. Yet aren't Amazon EBS disks already virtually fail-safe, because they live on multiple physical drives? If so, you aren't gaining much reliability by adding RAID 1. Is this correct, or do I have my facts wrong? I realize you would probably still gain read performance from RAID 1.


Behind the abstraction, the drives are already redundant, so it is fine to run them in RAID 0 for speed. The best approach is to use the snapshot functionality for backups. On a RAID set, this means breaking the array or freezing the volumes, snapshotting, and then returning the drives to normal use. Alternatively, writing the data to a single EBS volume and snapshotting that also covers other problems, such as an instance failure that leaves the RAID drives in an inconsistent state even after they are reattached.
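A minimal Python sketch of the freeze, snapshot, unfreeze sequence described above. It is an illustration under assumptions, not a tested backup tool: the function names are made up for this example, the filesystem is assumed to support `fsfreeze` (util-linux), and `create_snapshot` is a callable you supply.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen(mount_point, run=subprocess.check_call):
    """Suspend writes to the filesystem at mount_point, then always resume them."""
    run(["fsfreeze", "--freeze", mount_point])
    try:
        yield
    finally:
        # Unfreeze even if a snapshot call raised, so the mount is not left hung.
        run(["fsfreeze", "--unfreeze", mount_point])


def snapshot_raid(mount_point, volume_ids, create_snapshot, run=subprocess.check_call):
    """Start a snapshot of every member volume while the filesystem is frozen.

    create_snapshot is a hypothetical callable: volume id in, snapshot id out.
    """
    with frozen(mount_point, run):
        return [create_snapshot(volume_id) for volume_id in volume_ids]
```

With boto3, `create_snapshot` could be something like `lambda vid: ec2.create_snapshot(VolumeId=vid)["SnapshotId"]`, where `ec2 = boto3.client("ec2")`. The sketch unfreezes as soon as the snapshot calls return, on the understanding that an EBS snapshot captures the volume as it was when the request was made; verify that against your own setup before relying on it.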

TL;DR: Using RAID 1 is overkill; it is better to prepare for other failure scenarios with robust backups.


Yes, EBS is fault tolerant on the back end, but EBS failures do occur, and in unexpected ways. What you don't see is the type of failure most of us are used to, where a drive goes bad and just fails outright. The most frequent failure is a huge and unpredictable increase in latency, which can make your application unresponsive. With RAID 1 or RAID 10 sets, you can simply fail the problem drive out of the array and replace it with a new one with no downtime.
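For anyone who hasn't done it, failing a drive out looks roughly like this, assuming Linux software RAID managed with `mdadm`. The device names are placeholders and the helper function is made up for illustration; it only builds the commands so you can review them before running anything as root.

```python
import shlex


def replacement_commands(array, bad_device, new_device):
    """Build the mdadm commands that swap one array member for a fresh volume."""
    return [
        # Mark the slow or dead member as faulty so the array stops using it.
        ["mdadm", "--manage", array, "--fail", bad_device],
        # Remove the faulty member from the array.
        ["mdadm", "--manage", array, "--remove", bad_device],
        # Add the replacement volume; the array then resyncs onto it.
        ["mdadm", "--manage", array, "--add", new_device],
    ]


if __name__ == "__main__":
    # Placeholder device names; substitute your own array and volumes.
    for cmd in replacement_commands("/dev/md0", "/dev/xvdf", "/dev/xvdg"):
        print(shlex.join(cmd))
```

The replacement EBS volume has to be created and attached to the instance before the `--add` step, and the array runs degraded until the resync finishes.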

Recall the ec2pocolypse a couple of months ago, when a large percentage of EBS volumes became completely unresponsive. Those of us who had RAID 10 sets were able to recover easily by failing out a drive or force-detaching it with the API. Those who did not (I'm looking at you, reddit) had to suffer through just shy of a week of downtime.

If you actually care about your data, you should never, ever, under any circumstances RAID 0 it. By striping, you increase your probability of failure (the array is lost if any single volume fails) while reducing your ability to recover from that failure. Snapshotting is great, but unless you also stream your binary logs (for example), you cannot perform a point-in-time recovery. If you are in e-commerce, people get upset when they pay for something that doesn't end up getting shipped because there is no longer any record of it in the database.

I recently wrote about RAID 10 on EBS after experiencing yet another success with EBS RAID: http://blog.9minutesnooze.com/raid-10-ebs-data/

The question is: who do you trust more with your data, Amazon or yourself?