Should I use "Raid 5 + spare" or "Raid 6"?

What is "Raid 5 + Spare" (excerpt from User Manual, Sect 4.17.2, P.54):

RAID5+Spare: RAID 5+Spare is a RAID 5 array in which one disk is used as a spare to rebuild the system as soon as a disk fails (Fig. 79). At least four disks are required. If one physical disk fails, the data remains available because it is read from the parity blocks. Data from a failed disk is rebuilt onto the hot spare disk. When the failed disk is replaced, the replacement becomes the new hot spare. No data is lost in the case of a single disk failure, but if a second disk fails before the system can rebuild data to the hot spare, all data in the array will be lost.


What is "Raid 6" (excerpt from User Manual, Sect 4.17.2, P.54):

RAID6: In RAID 6, data is striped across all disks (minimum of four) and two parity blocks for each data block (p and q in Fig. 80) are written on the same stripe. If one physical disk fails, the data from the failed disk can be rebuilt onto a replacement disk. This RAID mode can support up to two disk failures with no data loss. RAID 6 provides for faster rebuilding of data from a failed disk.
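
To make the comparison concrete (my own rough sketch, not from the manual), here is a Python estimate of usable capacity with N equal-sized disks under each mode. With the same number of disks both leave N - 2 disks of usable space, which is part of why they look so alike:

    def raid5_plus_spare_usable(n_disks, disk_tb):
        """RAID 5 across n-1 disks plus one idle hot spare.

        One disk's worth of parity plus one whole disk held back as the
        spare, so usable space is (n - 2) disks.
        """
        assert n_disks >= 4  # the manual requires at least four disks
        return (n_disks - 2) * disk_tb

    def raid6_usable(n_disks, disk_tb):
        """RAID 6 across all n disks with two parity blocks per stripe.

        Two disks' worth of parity, so usable space is also (n - 2) disks.
        """
        assert n_disks >= 4
        return (n_disks - 2) * disk_tb

    # Example: six 2 TB disks -> 8 TB usable either way.
    print(raid5_plus_spare_usable(6, 2.0), raid6_usable(6, 2.0))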


Both "Raid 5 + spare" and "Raid 6" are SO similar ... I can't tell the difference.

When would "Raid 5 + Spare" be optimal?

And when would "Raid 6" be optimal?

The manual dumbs down the different RAID modes with 5-star ratings. "Raid 5 + Spare" only gets 4 stars, but "Raid 6" gets 5 stars. If I were to blindly trust the manual, I would conclude that "Raid 6" is always better. Is "Raid 6" always better?


Solution 1:

In short:

  • If safety is your main concern then go with RAID6 as it can survive any two drives failing at the same time. If a drive fails in an R5+spare arrangement you are not safe from another failure until the spare has been brought up to speed which could take quite some time with large drives (and it is not unheard of for a drive that has been powered down for ages, such as your spare, to fail to spin up when finally called upon).

  • If performance is king, go with 5+spare, as the write performance will be better when the array is not in a degraded state - though the performance difference between R5 and R6 is significantly smaller than the difference between R5 and other solutions if you have a good controller (i.e. one that makes a partial block write operation "two/three concurrent reads, then parity calc, then two/three concurrent writes" most of the time, rather than "read-then-read(-then-read)-then-parity-calc-then-write-then-write(-then-write)", which is what some very cheap controllers and software RAID may do). The sketch below illustrates the parity update itself.
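
To illustrate the parity arithmetic behind that write cost, here is a minimal toy sketch (mine, not anything controller-specific) of the read-modify-write a RAID 5 controller does for a partial-stripe write: new parity = old parity XOR old data XOR new data. RAID 6 repeats the exercise for its second (Q) parity block, which is where the extra write overhead comes from.

    def xor_blocks(a, b):
        """Byte-wise XOR of two equal-length blocks."""
        return bytes(x ^ y for x, y in zip(a, b))

    # A toy 3-data-disk stripe; P is the XOR of all data blocks.
    d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
    p = xor_blocks(xor_blocks(d0, d1), d2)

    # Partial-stripe write: only d1 changes.
    new_d1 = b"XXXX"
    # Read-modify-write: read old d1 and old P, then
    # new P = old P XOR old d1 XOR new d1 (no need to touch d0 or d2).
    new_p = xor_blocks(xor_blocks(p, d1), new_d1)

    # Sanity check: recomputing P from scratch gives the same result.
    assert new_p == xor_blocks(xor_blocks(d0, new_d1), d2)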

Edit: I missed a potentially important point first time around:

  • If power consumption is a concern, then R5+spare will have an extra advantage if your controller keeps the spare drive powered down until needed.

Solution 2:

RAID 5 + hot spare:

  • on equal controller hardware, better performance than RAID 6
  • you can't lose two disks at the same time. When you lose a disk, there is a rebuild time (onto the hot spare) during which you have no redundancy. Anything that fails in this window means a complete loss (short of sending everything to a good data rescue firm and paying really $$$$)

RAID 6:

  • worse performance than RAID 5 (depending on the controller, the gap can range from very noticeable to virtually no difference)
  • you can lose 2 disks at the same time

For any RAID 5 or 6 you have to be careful to use disks which are not from the same production run. It can happen (I've seen it!) that after a single failure the next disk(s) fail during the rebuild due to the increased stress. Disks from the same run have the exact same firmware and probably very similar physical properties.

Edit: What to choose

(This also depends on the performance requirements of the server and the tolerable risk.)

If the server's environment is pretty nice for hardware (colo, climate-controlled, etc.), you'll be OK with RAID 5 + hot spare.

If the environment makes it more likely that more than one disk fails within a short time (vibrations, humidity, dirt), then go for RAID 6.

Always also have an adequate backup and test recovery.

Edit 2: Decent RAID controllers have scrubbing, which periodically verifies all sectors.

Solution 3:

RAID5 uses one parity block per stripe; RAID6 must calculate the Reed-Solomon error correction and write two parity blocks per stripe versus one for RAID5. RAID5 is used for intense database applications with huge storage because of the cost of RAID10: RAID5 yields between 67% and 94% of the raw disk capacity, whereas RAID10 yields only 50% (a much higher storage cost). While RAID6 has slightly lower read latency due to rotational latency, RAID6 is between 25% and 31% slower on writes because of the error-correction calculation and the additional parity write.
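
A rough sketch (Python, disk counts are just illustrative) of where those percentages and the write penalty come from: RAID 5 keeps (n-1)/n of raw capacity, RAID 6 keeps (n-2)/n, RAID 10 keeps half, and a small random write typically costs four disk I/Os on RAID 5 versus six on RAID 6.

    def usable_fraction(level, n_disks):
        """Fraction of raw capacity that is usable in one array of n disks."""
        if level == "RAID5":
            return (n_disks - 1) / n_disks   # one disk of parity
        if level == "RAID6":
            return (n_disks - 2) / n_disks   # two disks of parity
        if level == "RAID10":
            return 0.5                       # everything is mirrored
        raise ValueError(level)

    # 67% with 3 disks, ~94% with 16 disks -- the range quoted above.
    print(usable_fraction("RAID5", 3), usable_fraction("RAID5", 16))

    # Small-write penalty: disk I/Os per random partial-stripe write.
    write_ios = {
        "RAID5": 4,   # read data, read P, write data, write P
        "RAID6": 6,   # read data, read P, read Q, write data, write P, write Q
        "RAID10": 2,  # write both mirror copies
    }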

Using the mean time between failures (MTBF) for the drives, the probability of two drives failing back to back or at the same time is about (0.1% x 0.1%) * 12, i.e. 0.001 x 0.001 * 12; if you have 1000 drives running you will lose on average ~1.2 drives per year, and two drives will fail back to back about every 8.3 years. Because drive failure is not a Poisson process (the surviving drives are under heavy load during the rebuild), a second failure is more likely to occur during this period, and the distribution is closer to a Gamma distribution, with somewhat higher failure probability after the first failure.
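
For a concrete version of that kind of estimate, here is a deliberately simplified sketch; the 3% annual failure rate, the 24-hour rebuild window, and the 10-drive array are assumptions of mine, and it treats failures as independent, which (as noted above) understates the real risk during a rebuild.

    import math

    AFR = 0.03                  # assumed annualised failure rate per drive
    REBUILD_HOURS = 24          # assumed time to rebuild onto the spare
    HOURS_PER_YEAR = 24 * 365

    def p_fail_within(hours):
        """Probability that one drive fails within the window (exponential model)."""
        rate_per_hour = AFR / HOURS_PER_YEAR
        return 1 - math.exp(-rate_per_hour * hours)

    n_remaining = 9             # e.g. a 10-drive RAID 5 after the first failure
    # Chance that at least one surviving drive dies during the rebuild,
    # i.e. the window in which RAID 5 + spare has no redundancy left.
    p_second = 1 - (1 - p_fail_within(REBUILD_HOURS)) ** n_remaining
    print(f"~{p_second:.2%} chance of a second failure during the rebuild")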

The bottom line is that RAID5's write performance is superior to RAID6's, and for DB applications it is far better. For a mostly-read application such as a web server, it makes no difference and you should use RAID6. The cost benefits of using RAID5 over RAID10 are huge for large storage. If you can afford the overhead, use RAID10 for highly disk-intensive applications; RAID10 will always perform better.

The biggest bottom line missed is RAID is NOT backup, but a way to limit downtime by providing redundancy. If the data is critical, you should be backing it up (and testing your recovery process).

If one RAID array of ten 2 TB SAS drives fails, recovery will cost thousands of dollars and take weeks, if it can be done at all.

All RAID arrays eventually fail!

Solution 4:

Speaking strictly from a data integrity viewpoint, yes. You can safely lose any two drives, although it is a rare occurrence to lose two together short of severe physical trauma to the system.

Financially, not quite as much. The hot spare can be powered down until needed, which means that it doesn't use power and incurs no wear.

And as always, RAID is not a replacement for a proper off-site backup plan.