How do I configure RAID 5, especially stripe size, with 24 x 1.2 TB drives for CentOS 6?

For a Dell R920 with 24 x 1.2TB disks (and 1TB RAM), I'm looking to set up a RAID 5 configuration for fast IO. The server will be used to host KVM VMs that will be reading/writing files of all sizes, including very large files. I am not terribly interested in data safety because if the server fails for any reason, we'll just re-provision the server from bare metal after replacing the failed parts. So, performance is the main concern. We're considering RAID 5 because it allows us to distribute data over multiple spindles and therefore gives us better performance and, while not our main concern, also gives us some data protection. Our NIC is dual 10Gbps.

I'm limiting this question to RAID 5 only because we think this will give the best performance. Only if there is a compelling performance reason will we consider something else. But, I think I'd prefer answers that are related to RAID 5 configurations.

Okay, with the above stated, here are our present thoughts on the configuration:

  • 24 Hard Drives: RMCP3: 1.2TB, 10K, 2.5" 6Gbps
  • RAID Controller: H730P, 12Gbps SAS Support, 2GB NV Cache
  • 1 Hot Spare (just to give us some longer life if a drive does fail)
  • 23 Data Drives (with 1 drive's worth of capacity going to parity, leaving 22 for data)
  • Stripe Size: 1MB (1MB / 22 data drives = ~46.5KB per disk -- or do I misunderstand stripe size?)
  • Read Policy: Adaptive Read Ahead
  • Write Policy: Write Back
  • Disk Cache Policy: Enabled

If the stripe size is the TOTAL across the data drives, then I figured ~46.5KB per drive will give us very good throughput. If the stripe size is per spindle, then I've got this all wrong.
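
To make that concrete, here's the arithmetic I'm working from, assuming the stripe size setting is per disk (my reading of the PERC docs, where it's called the "stripe element size") rather than the total across the data drives:

    # My own sanity check -- assumptions, not measurements.
    DATA_DISKS=22        # 23-drive RAID 5 set = 22 disks' worth of data + 1 disk's worth of parity
    ELEMENT_KB=1024      # the proposed 1MB setting, interpreted as per-disk
    echo "full stripe width: $((ELEMENT_KB * DATA_DISKS)) KB"                          # 22528 KB, i.e. ~22MB per full stripe
    echo "per-disk element for a ~1MB full stripe: $((ELEMENT_KB / DATA_DISKS)) KB"    # ~46 KB

If the per-disk interpretation is right, then the 1MB setting actually means a ~22MB full stripe, and getting a ~1MB full stripe would need a ~46KB per-disk element, which doesn't look like a value I can even select.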

Is the stripe size also the size that a single file takes up? For example, if there is a 2KB file, would choosing a stripe size of 1MB mean that we're wasting nearly an entire megabyte? Or can multiple files live within a stripe?

Lastly, when we install CentOS 6.5 (or latest), will we need to do something special to ensure that the filesystem uses RAID optimally? For example, mkfs.ext4 has an option -E stride that I'm told should correspond to the RAID configuration. But, during a CentOS installation, is there any way to have this done?

Many thanks for your thoughts on configuring RAID 5 for fast IO.


Solution 1:

Please use RAID 1+0 with your controller and drive setup. If you need more capacity, a nested RAID level like RAID 50/60 could work. You can get away with RAID 5 on a small number of enterprise SAS disks (8 drives or fewer) because the rebuild times aren't bad. However, 24 drives is a terrible mistake. (Oh, and disable the individual disk caching feature... dangerous)

There are many facets to I/O and local storage performance. There are I/O operations/second, there's throughput, there's storage latency. RAID 1+0 is a good balance between these. Positive aspects here are that you're using enterprise disks, a capable hardware controller and a good number of disks. How much capacity do you require?

You may run into limits to the number of drives you can use within a virtual disk group. PERC/LSI controllers traditionally limited this to 16 drives for single RAID levels and RAID 1+0. The user guide confirms this. You wouldn't be able to use all 24 disks in a single RAID 5 or a single RAID 1+0 group.

Another aspect to consider, depending on your workload, is that you can leverage SSD caching via the LSI CacheCade functionality on certain PERC controllers. It may not be available on this controller, but understanding your I/O patterns will help tailor the storage solution.
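
If you're not sure what those patterns look like, a quick fio run against a scratch volume is a cheap way to find out; it reports the IOPS, throughput and latency figures mentioned above. This is only a sketch -- the device path, block size and read/write mix are placeholders you'd tune to resemble your VMs:

    # Hypothetical fio job to characterise a mixed VM-style workload.
    # /dev/vg_test/lv_scratch is a placeholder -- point it at scratch space, never at live data.
    fio --name=vm-mixed --filename=/dev/vg_test/lv_scratch --direct=1 \
        --ioengine=libaio --rw=randrw --rwmixread=70 --bs=64k \
        --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting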


As far as ext4 filesystem creation options go, much of this will be abstracted by your hardware RAID controller. You should be able to create a filesystem without any special options here. The parameters you're referring to have more of an impact on a software RAID solution.
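
If you still want to align ext4 to the array geometry, the usual rule of thumb is stride = (per-disk stripe size / filesystem block size) and stripe-width = stride x number of data disks. A sketch with placeholder numbers (64KB per-disk stripe, 22 data disks, 4KB blocks -- substitute your actual controller settings and device):

    # stride = 64KB / 4KB = 16; stripe-width = 16 * 22 data disks = 352
    # /dev/sdb1 is a placeholder for the virtual disk intended for VM storage.
    mkfs.ext4 -E stride=16,stripe-width=352 /dev/sdb1

The CentOS installer doesn't expose these options directly, so the simplest approach is usually to leave the large data volume unformatted during installation and create the filesystem afterwards.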

Solution 2:

Do NOT use a single RAID 5 array across 24 x 1.2TB disks! I don't much care what you'd prefer to limit the answers to; it's a bad idea and you should look at other options.

The odds of a disk failing go up with each disk you add, and so does the time it takes to rebuild. When a drive fails and you replace it, the rebuild will use as much I/O across all the remaining disks as possible to reconstruct the data for the new one. There's a very real risk that one of your 23 remaining disks will fail during this process, forcing you to restore the server from backups. Which you say you don't care about... but are you willing to accept doing that once a month? Once a week? As the disks age, it very well could get that bad.
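
To put a rough number on the rebuild window: the rebuild is more or less bounded by how fast the replacement disk can be written, so even under optimistic assumptions (say ~50MB/s of background rebuild rate, which is an assumption, not a spec) you're looking at hours -- and far longer while the array is also serving live VM traffic:

    DRIVE_BYTES=1200000000000   # 1.2TB drive
    REBUILD_RATE=50000000       # assumed ~50MB/s background rebuild rate
    echo "$((DRIVE_BYTES / REBUILD_RATE / 3600)) hours minimum"   # ~6-7 hours, before real-world load is factored in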

Besides, if you want performance, RAID5 is leading you in the wrong direction. In many cases, RAID5 has worse performance than other options, because it has to calculate parity for every write, and then write that to a drive as well. RAID5 wasn't designed for performance.
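
To make that concrete, here's a rough random-write ceiling, assuming ~150 random IOPS per 10K spindle and the usual small-write penalties (2 physical I/Os per write for RAID 10, 4 for RAID 5's read-modify-write); the per-spindle figure is an assumption, not a benchmark:

    SPINDLES=24; IOPS=150
    echo "RAID 10 random writes: ~$((SPINDLES * IOPS / 2)) IOPS"   # ~1800
    echo "RAID 5  random writes: ~$((SPINDLES * IOPS / 4)) IOPS"   # ~900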

If you REALLY don't care about your data, go with RAID 0. But even then, create a few separate arrays, not one giant 24 disk RAID 0.

If you want performance and some integrity, use RAID10. You'll lose some disk space, but get quite a performance boost.

Or you can look at things like ZFS that are designed from the ground up to work with huge amounts of data on disks.
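
If you did go down the ZFS route (e.g. ZFS on Linux), the rough equivalent of RAID 10 is a pool of striped mirrors. This is only a sketch -- the pool name and device names are placeholders, and in practice you'd use stable /dev/disk/by-id paths and continue the pattern across all your disk pairs:

    # Hypothetical striped-mirror pool (RAID 10 equivalent); extend with the remaining pairs.
    zpool create vmpool \
      mirror sdb sdc \
      mirror sdd sde \
      mirror sdf sdg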

Solution 3:

Your options:

  • RAID 0: This turns all your disks into a single unit with no redundancy. This has the highest read and write performance and the most usable space of any of the options, but the loss of a single disk means the loss of all data.

  • RAID 1+0: This turns all your disks into a single unit with all data present on two disks. The read speed is about the same as RAID 0, the write speed is halved (since you need to write each piece of data twice), and you only have half as much space available. The loss of a single disk has no impact on data availability and minimal impact on read/write speeds.

  • RAID 5: This turns all your disks into a single unit, with parity information spread across the drives. The read speed is slightly lower than RAID 0, the write speed is much slower, possibly slower than the write speed of a single non-RAID disk (each write requires a read-modify-write cycle on at least two disks), and you lose one disk's worth of space for parity information. The loss of a single disk can cause a major reduction in read speed (reconstructing the data that was stored on it requires reading data from all the other disks), but has no impact on data availability.

  • RAID 6: This has essentially all the advantages and drawbacks of RAID 5, except that it stores a fancier checksum in addition to a parity calculation, and can handle the loss of two disks without data loss.
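
To put rough numbers on the space trade-off for your 24 x 1.2TB drives (ignoring hot spares and formatting overhead):

    echo "RAID 0 : $(echo "24 * 1.2" | bc) TB"        # 28.8 TB usable
    echo "RAID 10: $(echo "24 / 2 * 1.2" | bc) TB"    # 14.4 TB usable
    echo "RAID 5 : $(echo "(24 - 1) * 1.2" | bc) TB"  # 27.6 TB usable
    echo "RAID 6 : $(echo "(24 - 2) * 1.2" | bc) TB"  # 26.4 TB usable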

If data safety is truly irrelevant (this includes the time spent restoring the data from the original source, which may take days, and time lost re-doing interrupted calculations), I recommend RAID 0. Otherwise, if you have a workload that is almost exclusively reads and you want some reliability, I recommend RAID 6 (but note that performance will suffer when recovering from a failed disk). If you have a read-write workload, I recommend RAID 1+0.

Depending on the precise nature of your workload (ie. if a given task accesses a well-defined subset of your disk space), you may be able to set up multiple independent RAID arrays, so that the failure of one will not impact the others.

RAID 5 provides no benefits in your situation. It has a performance penalty (especially for writing) compared to RAID 0, and with the number of disks you have, the risk of a second disk failing during recovery is high enough that the data safety benefit is questionable.

Solution 4:

Okay, just to address one clear question -- stripe size. A bigger stripe size is better, unless your RAID controller is dumb enough to always read/write a whole stripe of data as its minimum I/O chunk.

Why? A small stripe size means several disks get pulled into any sizeable I/O: the smaller the stripe, the greater the chance that a single logical I/O loads several disks at once. A big stripe means just one disk (or a few) handles each I/O. That might look like a deficiency, since a single request gets no boost from multiple spindles, but then your almost-random load kicks in and you realize the requests end up spread across all of the disks more or less evenly anyway.
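
As a concrete illustration of that trade-off (the 64KB figure is just an example value, not your controller's default):

    # How many spindles a single 1MB logical read ties up, for two per-disk stripe sizes
    IO_KB=1024
    echo "64KB stripe: ~$((IO_KB / 64)) disks busy serving one request"    # 16 disks tied up
    echo "1MB stripe : ~$((IO_KB / 1024)) disk busy, the rest left free"   # 1 disk; the others can serve other VMs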

More theory behind this can be found here: http://www.vinumvm.org/vinum/Performance-issues.html