What is the current state (2016) of SSDs in RAID?
There are plenty of resources available online that discuss using SSD drives in RAID configurations. However, these mostly date back a few years, and the SSD ecosystem is very fast-moving; we're expecting Intel's "Optane" product release later this year, which will change everything... again.
I'll preface my question by affirming there is a qualitative difference between consumer-grade SSDs (e.g. Intel 535) and datacenter-grade SSDs (e.g. Intel DC S3700).
My primary concern relates to TRIM support in RAID scenarios. To my understanding, despite it being over 6 years since SSDs were introduced in consumer-grade computers and 4 years since NVMe became commercially available, modern-day RAID controllers still do not support issuing TRIM commands to attached SSDs, with the exception of Intel's RAID controllers in RAID-0 mode.
I'm surprised that TRIM support is not present in RAID-1 mode; given the way the drives mirror each other, it seems straightforward. But I digress.
I note that if you want fault tolerance with disks (both HDD and SSD), you would use them in a RAID configuration. But as the SSDs would then be without TRIM, they would suffer write amplification, which causes extra wear, which in turn would make the SSDs fail prematurely. This is an unfortunate irony: a system designed to protect against drive failure might end up directly contributing to it.
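To make the wear argument concrete, here is a minimal sketch of how the write-amplification factor (NAND writes divided by host writes) eats into a drive's rated program/erase budget; every number below is hypothetical, not taken from any particular drive:

```python
# Illustrative only: how write amplification translates into flash wear.
# All figures are hypothetical, not a real drive's specification.

def endurance_used(host_writes_tb, waf, capacity_tb, pe_cycles):
    """Fraction of the drive's rated program/erase budget consumed.

    host_writes_tb -- data written by the host, in TB
    waf            -- write-amplification factor (NAND writes / host writes)
    capacity_tb    -- drive capacity in TB
    pe_cycles      -- rated P/E cycles per cell
    """
    nand_writes_tb = host_writes_tb * waf
    total_budget_tb = capacity_tb * pe_cycles
    return nand_writes_tb / total_budget_tb

# Same workload, once with low and once with high write amplification:
for waf in (1.2, 3.0):
    used = endurance_used(host_writes_tb=500, waf=waf, capacity_tb=0.5, pe_cycles=3000)
    print(f"WAF {waf}: {used:.1%} of rated endurance consumed")
```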
So:

1. Is TRIM support necessary for modern (2015-2016 era) SSDs?
   - Is there any difference in the need for TRIM support between SATA, SATA-Express, and NVMe-based SSDs?
2. Often drives are advertised as having improved built-in garbage-collection; does that obviate the need for TRIM? How does their GC process work in RAID environments?
   - For example, see this Q&A from 2010, which describes pretty bad performance degradation due to not TRIMming (https://superuser.com/questions/188985/how-badly-do-ssds-degrade-without-trim), and this article from 2015, which makes the case that using TRIM is strongly recommended (http://arstechnica.com/gadgets/2015/04/ask-ars-my-ssd-does-garbage-collection-so-i-dont-need-trim-right/). What is your response to these strong arguments for the necessity of TRIM?
3. A lot of articles and discussion from earlier years concern SLC vs MLC flash and argue that SLC is preferable due to its much longer lifespan; however, it seems all SSDs today (regardless of where they sit on the consumer-to-enterprise spectrum) are MLC these days. Is this distinction of any relevance anymore?
   - And what about TLC flash?
4. Enterprise SSDs tend to have much higher endurance / write limits (often measured in how many times you can completely overwrite the drive in a day, throughout the drive's expected 5-year lifespan; a worked example of this metric follows the list). If their write-cycle limit is very high (e.g. 100 complete overwrites per day), does this mean they don't need TRIM at all because those limits are so high, or, the opposite, are those limits only attainable by using TRIM?
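To make that last metric concrete, this is the kind of arithmetic I mean; the capacity and DWPD figures are made up purely for illustration, not taken from any specific drive's spec sheet:

```python
# Hypothetical example: converting a DWPD (drive writes per day) endurance
# rating into total rated host writes over the warranty period.
capacity_gb = 400        # assumed drive capacity
dwpd = 3                 # assumed rating: full drive overwrites per day
warranty_years = 5

total_writes_tb = capacity_gb * dwpd * 365 * warranty_years / 1000
print(f"{capacity_gb} GB at {dwpd} DWPD for {warranty_years} years "
      f"= about {total_writes_tb:,.0f} TB of rated host writes")
```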
Solution 1:
Let's try to reply one question at a time:
- Is TRIM support necessary for modern (2015-2016 era) SSDs?
Short answer: in most cases, no. Long answer: if you reserve sufficient spare space (~20%), even consumer-grade drives usually have quite good performance consistency (but you need to avoid the drives which, instead, choke on sustained writes). Enterprise-grade drives are even better, both because they have more spare space by default and because their controller/firmware combo is optimized for continuous use of the drive. For example, take a look at the S3700 drive you referenced: even without trimming, it has very good write consistency.
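To give a feel for what "reserving spare space" adds up to, here is a rough sketch; the 7% factory over-provisioning figure and the 20% unpartitioned share are only illustrative assumptions, not fixed constants:

```python
# Rough sketch: effective spare area when part of an SSD is left unpartitioned.
# The 7% factory over-provisioning figure is only a common ballpark assumption.

def effective_spare(user_capacity_gb, factory_op=0.07, unpartitioned=0.20):
    """Total spare NAND as a fraction of the space actually exposed to data."""
    raw_nand_gb = user_capacity_gb * (1 + factory_op)   # NAND behind the user capacity
    used_gb = user_capacity_gb * (1 - unpartitioned)    # space the OS actually writes to
    return (raw_nand_gb - used_gb) / used_gb

print(f"Effective spare area: {effective_spare(480):.0%}")
```

Note that the controller can only treat never-written (or trimmed) LBAs as spare, so leaving space unpartitioned works best on a new or secure-erased drive.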
- Often drives are advertised as having improved built-in garbage-collection; does that obviate the need for TRIM? How does their GC process work in RAID environments?
The drive's garbage collector does its magic inside the drive's own sandbox: it knows nothing about the outside environment. This means that it is (mostly) unaffected by the RAID level of the array. That said, some RAID levels (basically the parity-based ones) can sometimes, and in some specific implementations, increase the write amplification factor, which in turn means more work for the GC routines.
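As a back-of-the-envelope illustration (assuming the classic read-modify-write path for small, sub-stripe writes; real controllers may coalesce full-stripe writes or rely on write-back caching), this is roughly how many device writes one small host write generates across the array:

```python
# Illustrative count of device writes issued across the whole array for one
# small (sub-stripe) host write, assuming the classic read-modify-write parity
# update. Real controllers may behave differently (full-stripe writes, caching).

def device_writes_per_small_host_write(raid_level):
    writes = {
        "raid0": 1,  # one data block
        "raid1": 2,  # one data block per mirror
        "raid5": 2,  # data block + updated parity block
        "raid6": 3,  # data block + two updated parity blocks
    }
    return writes[raid_level]

for level in ("raid0", "raid1", "raid5", "raid6"):
    print(f"{level}: {device_writes_per_small_host_write(level)} device write(s)")
```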
- A lot of articles and discussion from earlier years concern SLC vs MLC flash and argue that SLC is preferable due to its much longer lifespan; however, it seems all SSDs (regardless of where they sit on the consumer-to-enterprise spectrum) are MLC these days. Is this distinction of relevance anymore?
SLC drives have basically disappeared from the enterprise, being relegated mainly to military and some industrial tasks. The enterprise market is now divided into three grades:
- eMLC/HET-MLC flash uses the better-binned MLC chips and is certified to sustain at least 25000-30000 rewrite cycles;
- 3D MLC chips are rated at about 5000-10000 rewrite cycles;
- normal planar MLC and 3D TLC chips are rated at about 3000 rewrite cycles.
In reality, any of the above flash types should provide you with plenty of total write capacity and, in fact, you can find enterprise drives with all of the above flash types.
The real differentiators between enterprise and consumer drives are:
- the controller/firmware combo, with enterprise drives much less likely to die due to an unexpected controller bug;
- the power-protected write cache, extremely important to prevent corruption of the Flash Translation Layer (FTL), which is stored on the flash itself.
Enterprise-grade drives are better mostly due to their controllers and power-loss-protection capacitors, rather than due to better flash.
- Enterprise SSDs tend to have much higher endurance / write limits (often measured in how many times you can completely overwrite the drive in a day, throughout a drive's expected 5-year lifespan); does this obviate any concerns over write amplification caused by not running TRIM?
As stated above, enterprise-grade drives have much higher default spare space (~20%), which in turn drastically lowers the need for regular TRIMs.
Anyway, as a side note, please consider that some software RAID implementations do support TRIM (someone said Linux MD RAID?).
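For example, on Linux you can check whether a block device (an MD array included) actually advertises discard/TRIM support by reading its sysfs queue attributes; a minimal sketch, assuming a Linux system and using "md0" purely as an example device name:

```python
# Minimal sketch: check whether a Linux block device advertises discard (TRIM)
# support via its sysfs queue attributes. Assumes Linux; "md0" is an example name.
from pathlib import Path

def supports_discard(device="md0"):
    queue = Path("/sys/block") / device / "queue"
    try:
        max_bytes = int((queue / "discard_max_bytes").read_text())
    except (FileNotFoundError, ValueError):
        return False
    return max_bytes > 0

print("md0 advertises discard:", supports_discard("md0"))
```

A non-zero discard_max_bytes only means the device accepts discard requests; whether they are actually issued still depends on the filesystem setup (e.g. periodic fstrim runs or the discard mount option).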
Solution 2:
TRIM isn't something I ever worry about when using SSDs on modern RAID controllers. The SSDs have improved, hardware RAID controller features have been optimized for these workloads, and endurance reporting is usually in place.
TRIM is for lower-end SATA drives. For SAS SSDs, we have SCSI UNMAP, and perhaps that's the reason I don't encounter TRIM needs...
But the other commenter is correct. Software-Defined Storage (SDS) is changing how we use SSDs. In SDS solutions, RAID controllers are irrelevant, and things like TRIM tend to be less important because SSDs fill well-defined roles. Think of Nimble Storage's read cache or the ZFS L2ARC and ZIL... They all meet specific needs, and the software leverages the resources more intelligently.