What are the main points to avoid RAID5 with SSD?
My understanding is that an SSD can only endure a limited number of writes. RAID5 performs many extra writes because parity information is spread across the drives. So my reasoning is that RAID5 would wear out Solid State Drives, and degrade their performance, at a faster rate.
The following statement from this article makes me think I either don't fully understand the issue or my reasoning above is incorrect:
Another niche for high-endurance SSDs is in parity RAID arrays. SLC, due to its inherently superior write latency and endurance, is well suited for this type of application.
Your reasoning is correct, though you're missing the scale of the problem.
Enterprise SSDs are being made with higher-endurance MLC cells and can tolerate very high write rates. SLC still blows high-endurance MLC out of the water, but in most cases the lifetime write endurance of HE-MLC exceeds the expected operational lifetime of an SSD.
These days, endurance is listed as "Lifetime Writes" on spec sheets.
As an example, the Seagate 600 Pro SSD line lists roughly the following:
Model    Endurance
100GB    220TB
200GB    520TB
400GB    1080TB
Given a 5-year operational life, to reach the listed endurance for that 100GB drive you need to write 123GB to it every day (220TB spread over 1,825 days). That may be too little for you, which is why there are even higher-endurance drives on the market. Stec, an OEM provider for certain top-tier vendors, has drives listed for "10x full-drive writes for 5 years". These are all eMLC devices.
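To sanity-check that figure, here is a small Python sketch of the arithmetic. It assumes 1TB = 1024GB (that is what reproduces the ~123GB number; with 1TB = 1000GB you get about 120GB) and a 365-day year. The function name is mine, not something from a spec sheet.

```python
def daily_writes_gb(endurance_tb, years, gb_per_tb=1024):
    """GB you must write every day to use up the rated endurance in `years`."""
    days = years * 365  # assumption: ignore leap days
    return endurance_tb * gb_per_tb / days


# (capacity in GB, rated lifetime writes in TB), taken from the listing above
drives = [(100, 220), (200, 520), (400, 1080)]

for capacity_gb, endurance_tb in drives:
    per_day = daily_writes_gb(endurance_tb, years=5)
    full_drive_writes = per_day / capacity_gb
    print(f"{capacity_gb}GB: {per_day:.0f} GB/day "
          f"({full_drive_writes:.2f} full-drive writes per day)")
```

Dividing the daily figure by the capacity gives "full-drive writes per day", which makes it easier to compare against drives rated the way the Stec ones are.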
Yes, R5 does incur write amplification. However, it doesn't matter under most use cases.
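To make the amplification concrete, here is a minimal Python sketch of a RAID5 small write done as read-modify-write: one logical block update turns into two device writes (the data block and the parity block). This is a simplified model I wrote for illustration, not how any particular controller is implemented; real controllers can avoid part of the cost with full-stripe writes and write-back caching. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe, parity, index, new_data):
    """Update one data block in a stripe using read-modify-write.

    Returns (new_parity, device_writes). The stripe list is updated in place.
    """
    old_data = stripe[index]  # read the old data block
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(parity, old_data), new_data)
    stripe[index] = new_data  # device write 1: the data block
    device_writes = 2         # device write 2: the parity block
    return new_parity, device_writes


# Three data blocks plus one parity block, as in a 4-drive RAID5 stripe.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = reduce(xor_blocks, stripe)

parity, writes = small_write(stripe, parity, 1, b"\xff\x00")

# The incrementally updated parity matches a full recomputation.
assert parity == reduce(xor_blocks, stripe)
print(f"1 logical write -> {writes} device writes")
```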
There is another issue here as well. SSDs can take writes (and reads) so fast that the I/O bottleneck moves to the RAID controller. This was already the case with spinning metal drives, but it is thrown into stark relief when SSDs are involved. Parity computation is expensive, and you'll be hard pressed to get the full I/O performance of your SSDs out of an R5 LUN built from them.
I found 2 research papers about this topic:
- Parity update increases write workload and space utilization
Introduction
[...] The results from our analytical model show that RAID5 is less reliable than striping with a small number of devices because of write amplification.
Conclusion
[...] Different factors such as the number of devices and the amount of data are explored, and the results imply that RAID5 is not universally beneficial in improving the reliability of SSD based systems
Source: Don’t Let RAID Raid the Lifetime of Your SSD Array
(Published 02/2012)
- Equal aging of all SSDs imposes risk of simultaneous failure (RAID1 & RAID6 affected too!)
Abstract
[...] Redundancy solutions such as RAID can potentially be used to protect against the high Bit Error Rate (BER) of aging SSDs. Unfortunately, such solutions wear out redundant devices at similar rates, inducing correlated failures as arrays age in unison. [...]
5. Simulation Results
[...] Conventional RAID-5 causes all SSDs age in lock-step fashion, and conventional RAID-4 does so with the data devices; as a result, the probability of data loss on an SSD failure climbs to almost 1 for both solutions as the array ages, and periodically resets to almost zero whenever all SSDs are replaced simultaneously. [...]
Source: Differential RAID: Rethinking RAID for SSD Reliability
(Published 03/2012)
To protect against this, the paper proposes a new RAID level called Diff-RAID that automatically performs age-driven shuffling on device replacements.
You can also protect against this manually by checking the SSD wear-out indicator and proactively replacing drives with spares, so that at no time do multiple disks have the same critical age.
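As a rough illustration of that manual approach, here is a Python sketch that takes per-drive wear readings and suggests which drives to swap out early so that at most one drive sits in the critical wear range. Everything here is an assumption on my part: the wear values are made-up "percent of rated life used" numbers (how you read the wear indicator is vendor-specific and not shown), the 80% threshold is arbitrary, and the policy is not taken from either paper.

```python
def drives_to_replace_early(wear_pct, critical=80.0):
    """Suggest drives to replace so that at most one drive stays at critical wear.

    wear_pct: dict mapping a drive name to percent of rated life used.
    Returns the drive names to replace, most worn first.
    """
    worn = [name for name, wear in wear_pct.items() if wear >= critical]
    # Most worn first; ties are broken by name so the result is deterministic.
    worn.sort(key=lambda name: (-wear_pct[name], name))
    # Keep only the least worn of the critical drives in service.
    return worn[:-1]


# Hypothetical readings for a four-drive array.
wear = {"sda": 85.0, "sdb": 83.0, "sdc": 40.0, "sdd": 91.0}
print(drives_to_replace_early(wear))
```

With the hypothetical readings above, three drives are past the threshold, so the two most worn are flagged and only one drive is left in the critical range.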