Software RAID10 for later growth

Amazon does not recommend RAID1 (which is part of RAID10). See "Amazon EBS Volume Durability" in http://aws.amazon.com/ebs/ where they state:

"Because Amazon EBS servers are replicated within a single Availability Zone, mirroring data across multiple Amazon EBS volumes in the same Availability Zone will not significantly improve volume durability."

Based on third-party benchmarks and statements made by Amazon, I believe that RAID0 can help performance. My impression is that folks seem to get the most benefit using up to 4 EBS volumes in RAID0, with decreasing benefits above that. Make sure you are using an EC2 instance type with high I/O bandwidth.

LVM can itself do striping across multiple EBS volumes, effectively implementing RAID0. If you're already going to use LVM for the ability to add volumes to grow the file system, this might be easier to manage than LVM on top of mdadm RAID0.
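For example, here is a minimal sketch of LVM-native striping, assuming four EBS volumes attached as /dev/sdf through /dev/sdi (the device names and the "data"/"store" names are placeholders):

# pvcreate /dev/sdf /dev/sdg /dev/sdh /dev/sdi
# vgcreate data /dev/sdf /dev/sdg /dev/sdh /dev/sdi
# lvcreate -i 4 -I 64 -l 100%FREE -n store data

The -i 4 spreads writes across all four physical volumes (RAID0-style striping) and -I 64 sets a 64 KiB stripe size. Keep in mind that extending a striped logical volume later generally requires free space on as many physical volumes as there are stripes.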


The short answer to your question is that, to my knowledge, you cannot grow a Linux software RAID partition, so RAID won't help you there. However, RAID10 is a good idea for a number of other reasons, and RAID0 is nearly always a bad idea if you care about your data or downtime. I see a lot of advice on the Internet about using RAID0 with EBS volumes, and it's an absolutely terrible idea in all but the most exceptional circumstances.

With such a small volume set (you said 8x1GB, so 4GB usable), I would just skip all this complexity and use a single volume, which you can grow up to 1TB by snapshotting the EBS volume, restoring the snapshot to a larger volume, and growing the XFS filesystem. With only a few gigs of data, you should be able to snapshot the volume frequently enough that data recovery becomes an easy problem, and you aren't going to be maxing out I/O. Alternatively, if you can afford more than your current $.80/month for your disk, just make it bigger now and don't worry about this headache for a long time. If you really meant 8x1TB instead of 8x1GB, keep reading.
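As a rough sketch of that grow path (quiesce writes first for a consistent snapshot; the IDs, the /dev/sdh device, the /data mount point, and the use of the aws CLI are all assumptions for illustration):

# aws ec2 create-snapshot --volume-id vol-11111111
# aws ec2 create-volume --snapshot-id snap-22222222 --size 100 --availability-zone us-east-1a
# aws ec2 attach-volume --volume-id vol-33333333 --instance-id i-44444444 --device /dev/sdh
# mount /dev/sdh /data
# xfs_growfs /data

Once you have cut over to the larger volume, the original one can be detached and deleted.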


I wrote an article about this a few weeks back http://blog.9minutesnooze.com/raid-10-ebs-data/ and briefly covered this subject at Percona Live back in May: http://www.percona.tv/percona-live/running-an-e-commerce-database-in-the-cloud

I will summarize here.

In the world of physical hardware, the ways that disks can fail are known and somewhat predictable. EBS volumes, on the other hand, fail in atypical ways. You don't see disk "crashes" - mdadm will never automatically mark a volume as failed. What you get are volumes experiencing severe and irrecoverable performance degradation. Sometimes the volumes are just slow, but sometimes they completely lock up at 100% utilization with no IOPS being performed, essentially becoming unavailable. Sometimes the volume comes back to life enough to get data off of it, but sometimes not. This is what happened in the great EC2pocalypse of April 2011.

If you run RAID0 in this scenario, you will have few options. The array will be locked up and the data stuck with it. Sometimes you can snapshot the volumes in the array and restore from the snapshots, but consistency is difficult to guarantee and you will have downtime - likely several hours, since writing snapshots is a very slow procedure and RAID arrays tend to be large.

However, if you use RAID10 and you end up with one of these poorly performing or severely degraded volumes, all you need to do is mark the degraded volume as failed, remove it from the array, and replace it. I have done this many, many times on our active master database servers, which have 10-20 volumes in a RAID10 set (don't use that many - it's overkill unless you need a 10TB array).
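As a sketch of that swap, assuming a native md RAID10 at /dev/md0, a misbehaving member at /dev/sdh2, and a fresh EBS volume already attached as /dev/sdj (all placeholder names):

# mdadm /dev/md0 --fail /dev/sdh2
# mdadm /dev/md0 --remove /dev/sdh2
# mdadm /dev/md0 --add /dev/sdj
# cat /proc/mdstat

The array keeps serving I/O from the surviving mirror while the new volume resyncs; /proc/mdstat shows the rebuild progress.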

My proof of this goes back to my experience with the EC2pocalypse (and multiple other, minor EBS outages). While some of the most popular sites on the Internet were down for four days, my employer experienced less than an hour of downtime in our production environment, because we were able to recover the RAID10 arrays by removing the failed disk(s). Had it been RAID0, it would have been an SOL situation.

The downside is weakest-link syndrome: performance of the array is tied to its worst-performing member. The more volumes, the greater the odds that one will degrade, but that's really a monitoring problem. One could even automate the recovery, if so inclined, though I have not done so. With RAID10, you increase your odds of having a problem in the array, but you also increase your odds of recovery. With RAID0, each additional drive is little more than an additional liability.

I hope this helps some.


I did that benchmark some time ago. The commands I used are here: http://wiki.linuxwall.info/doku.php/en:ressources:articles:benchmark_ebs

From what I saw, there is little advantage to splitting your storage across so many EBS volumes and then aggregating them with mdadm and LVM. There is, however, a clear advantage in using RAID 1 and LVM: it protects against the loss of a single volume while keeping the ability to add another RAID 1 pair later on.

But, to answer your question:

You cannot grow a RAID volume. If you create a RAID 10 (4x EBS) and use LVM with it, then you can add another RAID 10 and add it to your LVM volume group (see the sketch after the LVM commands below), but you won't grow the initial RAID 10.

You can create a RAID 10 (two RAID 1 mirrors joined by a RAID 0 stripe) using these commands:

# mdadm --create /dev/md1 --verbose --level=raid1 --raid-devices=2 /dev/sdh1 /dev/sdh2
mdadm: size set to 104857536K
mdadm: array /dev/md1 started.

# mdadm --create /dev/md2 --verbose --level=raid1 --raid-devices=2 /dev/sdh3 /dev/sdh4
mdadm: size set to 104857536K
mdadm: array /dev/md2 started.

# mdadm --create /dev/md3 --verbose --chunk=32 --level=raid0 --raid-devices=2 /dev/md1 /dev/md2
mdadm: array /dev/md3 started.

And you can create an LVM volume on top of this RAID 10 with the following commands:

# pvcreate /dev/md3
  Physical volume "/dev/md3" successfully created

# vgcreate RAID10 /dev/md3
  Volume group "RAID10" successfully created

# lvcreate -L 190G -n store RAID10
  Logical volume "store" created

This is not specific to EBS, but there is good news in the release announcement for mdadm 3.3:

This is a major new release so don't be too surprised if there are a few issues...

Some highlights are:

...

  • RAID10 arrays can be reshaped to change the number of devices, change the chunk size, or change the layout between 'near' and 'offset'. This will always change data_offset, and will fail if there is no room for data_offset to be moved.

...

According to this answer on U&L, you will need at least Linux 3.5 as well.
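With mdadm 3.3 and a 3.5+ kernel, the reshape itself would look roughly like this, assuming a native md RAID10 at /dev/md0 (not the nested RAID1+0 built above) and two new volumes attached as /dev/sdk and /dev/sdl, all placeholder names:

# mdadm /dev/md0 --add /dev/sdk /dev/sdl
# mdadm --grow /dev/md0 --raid-devices=6
# cat /proc/mdstat

Once the reshape finishes, the extra capacity still has to be propagated upward (pvresize if LVM sits on the array, then the filesystem grow).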