How is RAID implemented at the *disk* level?

You will find all relevant details here.

Basically, all your assumptions are correct: RAID 50 is a striping (RAID 0) of RAID 5 arrays, while RAID 10 is a striping of RAID 1 arrays.
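
To make the nesting concrete, here is a toy sketch (in Python, with made-up stripe sizes and disk counts, ignoring controller metadata and real parity layouts) of how a logical byte offset could map onto disks in RAID 10 and RAID 50; an actual controller is free to do this differently:

```python
# Toy sketch of how nested RAID levels map a logical address to disks.
# Assumptions (for illustration only): fixed stripe unit, no controller
# metadata, simplistic parity rotation.

STRIPE_UNIT = 64 * 1024  # 64 KiB stripe unit (chunk), a common default

def raid10_target(byte_offset, mirrors=2, stripe_width=4):
    """RAID 10: stripe (RAID 0) across `stripe_width` RAID 1 mirror pairs."""
    chunk_no, offset_in_unit = divmod(byte_offset, STRIPE_UNIT)
    pair = chunk_no % stripe_width                        # which mirror pair gets this chunk
    disks = [pair * mirrors + m for m in range(mirrors)]  # both copies of the chunk
    return disks, offset_in_unit

def raid50_target(byte_offset, span_disks=3, spans=2):
    """RAID 50: stripe (RAID 0) across `spans` RAID 5 arrays of `span_disks` disks.
    Each RAID 5 span stores (span_disks - 1) data chunks + 1 parity chunk per row."""
    data_per_span = span_disks - 1
    chunk_no, offset_in_unit = divmod(byte_offset, STRIPE_UNIT)
    span = chunk_no % spans                    # outer RAID 0 picks the span
    chunk_in_span = chunk_no // spans          # inner RAID 5 chunk index
    row = chunk_in_span // data_per_span       # stripe row inside the span
    parity_disk = row % span_disks             # rotate parity across the span
    data_slot = chunk_in_span % data_per_span
    disk_in_span = data_slot if data_slot < parity_disk else data_slot + 1
    return span, disk_in_span, parity_disk, offset_in_unit

if __name__ == "__main__":
    print(raid10_target(5 * STRIPE_UNIT + 100))  # chunk 5 -> pair 1, disks [2, 3]
    print(raid50_target(5 * STRIPE_UNIT + 100))  # chunk 5 -> span 1, data on disk 0, parity on disk 1
```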

How this is physically implemented, however, depends strongly on the disk controller; sometimes additional space is used for internal information, so you can't know exactly how, when and where every single byte is used unless you ask the controller vendor.

About the stripe size: it's almost never relevant unless you are into heavy performance tuning; in that case it can have an impact, but it depends (again) on the controller and disks you are using, and also on the OS, the filesystem and the actual I/O load.

As a rule of thumb, it's good practice to have the stripe size of the RAID array match the cluster size of the filesystem the volume on that array will be formatted with; that size, in turn, should be chosen based on the I/O load the volume is expected to handle (many small files, or a few big ones?). This is only a general suggestion, though, and lots of other parameters can influence I/O performance.
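
Here is a rough back-of-the-envelope sketch (made-up sizes, not a recommendation) of why the alignment between cluster size and stripe unit matters: a misaligned or oversized cluster makes a single filesystem I/O touch more than one chunk, and therefore more than one disk operation:

```python
# Back-of-the-envelope check: does a filesystem cluster line up with the
# array's stripe unit? The numbers below are illustrative examples only.

def chunks_touched(io_offset, io_size, stripe_unit):
    """How many stripe units (and therefore, at minimum, disk ops) a single
    I/O of `io_size` bytes starting at `io_offset` will touch."""
    first = io_offset // stripe_unit
    last = (io_offset + io_size - 1) // stripe_unit
    return last - first + 1

KIB = 1024
stripe_unit = 64 * KIB

# A 4 KiB cluster aligned to the stripe unit always stays inside one chunk:
print(chunks_touched(0, 4 * KIB, stripe_unit))          # 1
# A 64 KiB cluster that is *not* aligned straddles two chunks:
print(chunks_touched(32 * KIB, 64 * KIB, stripe_unit))  # 2
# Matching (and aligning) cluster and stripe unit keeps it to one chunk:
print(chunks_touched(128 * KIB, 64 * KIB, stripe_unit)) # 1
```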

Also, keep in mind that you could have multiple volumes on the same RAID array (even more so if you are working with a SAN instead of local storage), each of them potentially using a different cluster size and handling a different I/O load.

If you really want to fine-tune your storage to such a level, not only will you need complete control of each and every element from the physical disks to the actual application storing data on them, but you will also have to analyze them carefully and tune a lot of parameters, of which stripe size is only one.


A simple case study: Exchange writes database transaction logs as 1-MB files, sequentially; they are mostly written and rarely read under normal operation; they can take up some space, but never too much if regular backups are performed, because they get truncated (i.e. the oldest ones are deleted) every time a full backup of the database is completed.

The best possible approach for storing this kind of data would be a RAID 1 array of two disks, with a 1-MB stripe size, a battery-backed write cache, and a single volume formatted with NTFS and a 1-MB cluster size; oh, and of course you'll have to store only the transaction logs for a single database on this volume; if you have more DBs, you will need separate volumes and disk arrays, or you'll lose all the benefits of sequential I/O. (BTW, the actual database data must go to a completely different place, and not only for performance but mostly for data safety; have a look at the Exchange documentation if you want more details; but the basic points are that the database and the logs have completely different I/O patterns, and you absolutely don't want to lose both of them at the same time.)
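
Just to put some numbers behind that reasoning, here is a small hypothetical sketch of how the cluster size changes what those sequential 1-MB log writes look like to the volume (the figures are only an illustration of the alignment argument, not measurements):

```python
# Hypothetical illustration of the Exchange-log example above: sequential
# 1 MiB log files written to a volume, and how the cluster size shapes the I/O.

MIB = 1024 * 1024

def describe_log_writes(n_logs, log_size=1 * MIB, cluster_size=1 * MIB):
    clusters_per_log = -(-log_size // cluster_size)      # ceiling division
    whole_cluster_writes = (log_size % cluster_size == 0)
    return {
        "clusters_per_log": clusters_per_log,
        "writes_are_whole_clusters": whole_cluster_writes,
        "total_clusters_written": clusters_per_log * n_logs,
    }

# 1 MiB clusters: every log file is exactly one cluster, written front to back.
print(describe_log_writes(100, cluster_size=1 * MIB))
# 4 KiB clusters: the same 1 MiB file becomes 256 small allocations per log.
print(describe_log_writes(100, cluster_size=4 * 1024))
```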

As you see, this kind of assessment is very strongly dependent on the expected I/O load, and it wouldn't be adequate for anything other than storing Exchange transaction logs in this very specific setup; it would probably hinder any other workload.

Storage fine-tuning is an art, and it requires a lot of analysis and experience to get right.


Massimo gave a pretty good summary, and as he says, a lot depends on the kind of workload you are running.

In addition, the controllers and their firmware play a big role. For instance, at home I have an LSI 8-port SAS/SATA HBA that can be flashed to run as a RAID controller. The same hardware is badged by Dell, but the Dell firmware sets up a different queue depth to support specific Dell disks. My OEM firmware outperforms the Dell firmware by about 30% when using 5x 4 TB WD consumer disks in my home machine. If I flash the Dell card with the OEM firmware, the performance is identical.

ConcernedOfTunbridgeWells notes that you have spinning disk...

If you are able to run this workload on Linux/Unix, you might consider one of the filesystems that allow SSD caching of the magnetic disks.

At home I run ZFS on Linux and it's extremely reliable due to its flexible parity and continuous hash-based consistency checking. It natively supports SSD caching and it's lightning fast with only a modest SSD cache drive. The ZFS array with the LSI in HBA mode is faster than using the LSI as a hardware RAID array. The workload is OpenStack virtualisation (it's my lab machine).
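
To give an idea of what such a cache buys you, here is a simplified sketch of a read cache in front of slow disks. This is *not* ZFS's actual ARC/L2ARC algorithm (which is adaptive, not plain LRU), and the latency figures are made up; it only illustrates why a modest SSD helps when the working set is small relative to the array:

```python
# Highly simplified idea of an SSD read cache in front of slow spinning disks.
# Conceptual sketch only: made-up latencies, plain LRU eviction.
from collections import OrderedDict
import random

DISK_MS, SSD_MS = 8.0, 0.1   # rough, made-up per-read latencies

class SsdReadCache:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.cache = OrderedDict()            # block_id -> data, kept in LRU order

    def read(self, block_id, read_from_disk):
        if block_id in self.cache:            # cache hit: serve from the SSD
            self.cache.move_to_end(block_id)
            return self.cache[block_id], SSD_MS
        data = read_from_disk(block_id)       # cache miss: go to the spindles
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:   # evict the least-recently-used block
            self.cache.popitem(last=False)
        return data, DISK_MS

if __name__ == "__main__":
    cache = SsdReadCache(capacity_blocks=1000)
    # Skewed workload: most reads hit a small hot set, as VM images tend to.
    reads = [random.randint(0, 500) if random.random() < 0.9 else random.randint(0, 100000)
             for _ in range(10000)]
    total_ms = sum(cache.read(b, lambda b: b"x")[1] for b in reads)
    print(f"average read latency: {total_ms / len(reads):.2f} ms")
```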

Better still, just use a proper SAN or even a NAS that knows how to adapt the controllers, caching, striping, etc. for specific workloads.