Can "enterprise" drives be safely replaced by near/midline in some situations?

When specifying servers, like (I would assume) many engineers who aren't experts in storage, I'll generally play it safe (and perhaps be a slave to marketing) by standardising on a minimum of 10k SAS drives (which are therefore "enterprise"-grade with a 24x7 duty cycle, etc.) for "system" data (usually OS and sometimes apps), and reserve the use of 7.2k mid/nearline drives for non-system data where performance isn't a significant factor. This is all assuming 2.5" (SFF) disks, as 3.5" (LFF) disks are only really relevant for high-capacity, low-IOPS requirements.

In situations where there isn't a massive amount of non-system data, I'll generally place it on the same disks/array as the system data, meaning the server only has 10k SAS drives (generally a "One Big RAID10" type of setup these days). Only if the size of the non-system data is significant do I usually consider putting it on a separate array of 7.2k mid/nearline disks to keep the cost/GB down.

This has led me to wonder: in some situations, could those 10k disks in the RAID10 array have been replaced with 7.2k disks without any significant negative consequences? In other words, am I sometimes over-spec'ing (and keeping the hardware vendors happy) by sticking to a minimum of 10k "enterprise" grade disks, or is there a good reason to always stick to that as a minimum?

For example, take a server that acts as a hypervisor with a couple of VMs for a typical small company (say 50 users). The company has average I/O patterns with no special requirements. Typical 9-5, Mon-Fri office, with backups running for a couple of hours a night. The VMs could perhaps be a DC and a file/print/app server. The server has a RAID10 array with 6 disks to store all the data (system and non-system data). To my non-expert eye, it looks as though mid/nearline disks may do just fine. Taking HP disks as an example:

  • Workload: Midline disks are rated for <40% workload. With the office only open for 9 hours a day and average I/O during that period unlikely to be anywhere near the maximum, it seems unlikely the workload would exceed 40%. Even with a couple of hours of intense I/O at night for backups, my guess is it would still be below 40%.
  • Speed: Although the disks are only 7.2k, performance is improved by spreading the I/O across six disks (a rough estimate is sketched below)
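As a rough sanity check, here's the back-of-the-envelope maths. This is a sketch only - the per-disk IOPS figures, read/write mix and write penalty are typical assumed values, not measurements from a real workload:

    # Back-of-the-envelope estimate for a 6-disk RAID 10 array. All figures
    # are assumptions for illustration (commonly quoted ballpark numbers),
    # not measurements from a real workload.

    DISKS = 6                  # 6-disk RAID 10 = 3 mirrored pairs
    WRITE_PENALTY = 2          # each logical write lands on both disks of a mirror
    READ_FRACTION = 0.7        # assumed 70/30 read/write mix

    PER_DISK_IOPS = {
        "7.2k midline": 75,    # ballpark random IOPS for a 7.2k spindle
        "10k SAS": 140,        # ballpark random IOPS for a 10k spindle
    }

    def raid10_effective_iops(disk_iops, disks, read_fraction, write_penalty):
        """Random IOPS the array can sustain for the given read/write mix."""
        raw = disk_iops * disks
        return raw / (read_fraction + (1 - read_fraction) * write_penalty)

    for name, iops in PER_DISK_IOPS.items():
        effective = raid10_effective_iops(iops, DISKS, READ_FRACTION, WRITE_PENALTY)
        print(f"{name:12}: ~{effective:.0f} effective IOPS across {DISKS} disks")

    # Crude duty-cycle check: ~9 office hours plus ~2 hours of backups per day.
    # (Note: the vendor's "workload" rating may be defined in terms of data
    # transferred per year rather than time active - check the spec sheet.)
    print(f"Active window: ~{(9 + 2) / 24:.0%} of the day")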

So, my question: is it sensible to stick to a minimum of 10k SAS drives, or are 7.2k midline/nearline disks actually more than adequate in many situations? If so, how do I gauge where the line is, rather than playing it safe out of ignorance?

My experience is mostly with HP servers, so the above may have a bit of an HP slant to it, but I would assume the principles are fairly vendor independent.


There's an interesting intersection of server design, disk technology and economics here:

Also see: Why are Large Form Factor (LFF) disks still fairly prevalent?

  • The move toward dense rackmount and small form-factor servers. E.g. you don't see many tower offerings anymore from the major manufacturers, whereas the denser product lines enjoy more frequent revisions and have more options/availability.
  • Stagnation in 3.5" enterprise (15k) disk development - 600GB 15k 3.5" is about as large as you can go.
  • Slow advancement in 2.5" near line (7.2k) disk capacities - 2TB is the largest you'll find there.
  • Increased availability and lower pricing of high capacity SSDs.
  • Storage consolidation onto shared storage. Single-server workloads that require high capacity can sometimes be serviced via SAN.
  • The maturation of all-flash and hybrid storage arrays, plus the influx of storage startups.

The above are why you generally find manufacturers focusing on 1U/2U servers with 8-24 2.5" disk drive bays.

3.5" disks are for low-IOPs high-capacity use cases (2TB+). They're best for external storage enclosures or SAN storage fronted by some form of caching. In enterprise 15k RPM speeds, they are only available up to 600GB.

2.5" 10k RPM spinning disks are for higher IOPS needs and are generally available up to 1.8TB capacity.

2.5" 7.2k RPM spinning disks are a bad call because they offer neither capacity, performance, longevity nor price advantages. E.g. The cost of a 900GB SAS 10k drive is very close to that of a 1TB 7.2k RPM SAS. Given the small price difference, the 900GB drive is the better buy. In the example of 1.8TB 10k SAS versus 2.0TB 7.2k SAS, the prices are also very close. The warranties are 3-year and 1-year, respectively.

So for servers and 2.5" internal storage, use SSD or 10k. If you have capacity needs and 3.5" drive bays available internally or externally, use 7.2k RPM.

For the use cases you've described, you're not over-configuring the servers. If they have 2.5" drive bays, you should really just be using 10k SAS or SSD. The midline disks lose on performance and capacity, carry a significantly shorter warranty, and won't save much on cost.


There are at least a few things that could cause problems with SOME drive types:

  • Drives that are not meant to deal with the vibration load of a chassis having many drives (unlikely problem with any drive specified as RAID/NAS-capable)

  • Firmware that does not allow TLER, or needs time-consuming manual reconfiguration of the drive to enable it (ditto; one way to check this is sketched after this list)

  • Drives that have never been tested with the RAID controller used, and might have unrecognized bugs that surface in such a setup

  • Internal drive write caches that behave in a way (physical writing is out of order or very delayed) that causes a lot of confusion after a hard shutdown (the RAID controller should be configured to force these OFF; it becomes a potential problem if the firmware ever ignores that - see untested drives :)

  • Drives might occasionally run internal maintenance routines that make them behave slowly, or respond with enough delay to make the RAID controller think they have failed (related to TLER)

  • SATA in general, as it is usually implemented, has fewer safeguards than SAS against a drive with completely shot or hung electronics hanging everything on the controller (not a theoretical risk; certain disk+controller brand combinations love that failure mode).
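On the TLER point, one way to check whether a given drive exposes a configurable error-recovery timeout is to query its SCT Error Recovery Control setting with smartmontools. A minimal sketch - it assumes smartctl is installed, that it runs with sufficient privileges, and the device names are examples only:

    # Sketch: query SCT Error Recovery Control (the setting behind TLER-style
    # behaviour) on a list of drives using smartmontools' smartctl.
    # Assumes smartctl is installed and sufficient privileges; the device
    # names below are examples only - adjust to your system.
    import subprocess

    DEVICES = ["/dev/sda", "/dev/sdb"]

    for dev in DEVICES:
        result = subprocess.run(
            ["smartctl", "-l", "scterc", dev],
            capture_output=True, text=True
        )
        print(f"--- {dev} ---")
        # Drives with a configurable timeout report read/write values here;
        # drives without it report that the command is not supported, which is
        # exactly the case that can get a disk dropped from the array.
        print(result.stdout.strip() or result.stderr.strip())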


HUGE issue:

(May be a teeny bit off-topic - but it's important!)

When you are dealing with SSDs - as is often the case, or at least the temptation - a lot of them have a nasty problem where they cannot always recover from spontaneous power outages!

This is only a tiny problem with HDDs. HDDs usually have enough capacitance to power their logic, and enough angular momentum in the platters, to finish writing a 512-byte block if power is lost mid-write. On the rare occasion this doesn't work, the result is something called a "torn write", where a single block is partially written. The partial write (albeit rare) causes a checksum failure on that block - i.e. that individual block is bad. This can usually be detected as bad by the disk circuitry itself and corrected by the upstream RAID controller.
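To illustrate the torn-write case - this is a toy simulation of the failure/detection model, not how real drive firmware works - the stored checksum no longer matches the half-written sector, so the block is reported as bad rather than silently returning garbage, and the RAID controller can repair it from the mirror copy:

    # Toy simulation of a "torn write": a 512-byte sector stored with a CRC.
    # This illustrates the detection model only, not real drive firmware.
    import os
    import zlib

    SECTOR_SIZE = 512

    def write_sector(data):
        """Store a sector together with its checksum."""
        assert len(data) == SECTOR_SIZE
        return data, zlib.crc32(data)

    def read_sector(stored, crc):
        """Return the sector, or raise if the checksum no longer matches."""
        if zlib.crc32(stored) != crc:
            raise IOError("checksum mismatch: unreadable (torn) sector")
        return stored

    old = os.urandom(SECTOR_SIZE)
    new = os.urandom(SECTOR_SIZE)

    _, crc = write_sector(new)
    # Power is lost halfway through the write: the platter ends up holding
    # the first half of the new data and the second half of the old data.
    torn = new[:256] + old[256:]

    try:
        read_sector(torn, crc)
    except IOError as err:
        print("Drive reports the block as bad:", err)
        print("The RAID controller can rebuild it from the mirror copy.")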

SSDs are a different animal. They usually implement something called "wear leveling" - they don't just write "block X" to a fixed physical location the way an HDD does. Instead, they try to write to different places on the flash media, and they try to aggregate or combine writes (using a bit of buffering). Writing to different places involves keeping a "map" of where things are written, which is also buffered and written out in a manner meant to reduce wear. Wear leveling can even involve moving data that's already on the device and hasn't been recently written.

The problem is that when the SSD loses power, it has a lot of unflushed data in memory, it has some data that has been written out to different/changed locations, and it has these maps in its own memory which need to be flushed out to make any sense of the structure of all the data on the device.

MANY SSDs do not have the logic or circuitry to keep their controllers up and alive long enough after a spontaneous power-out to safely flush all this data to flash before they die. This doesn't just mean that the one block you wrote could now be in jeopardy - other blocks, even all the blocks on the device, could be in trouble. Many devices also have problems where they not only lose all the data on the device, but the device itself becomes bricked and unusable.
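As a toy model of why that map matters - deliberately simplified, real flash translation layers are far more complex - the data can still be physically present in flash while the un-flushed map that says where it lives is gone, taking far more than the in-flight write with it:

    # Toy model of an SSD's logical-to-physical map ("flash translation layer").
    # Deliberately simplified: real FTLs are far more complex. The point is only
    # that the data can be intact while the map describing where it lives is
    # lost, which makes far more than the in-flight write unrecoverable.

    class ToySSD:
        def __init__(self, physical_blocks):
            self.flash = [None] * physical_blocks  # persistent (survives power loss)
            self.ram_map = {}                      # volatile: logical -> physical
            self.next_free = 0

        def write(self, logical_block, data):
            # Wear leveling: always write to a fresh physical location,
            # then update the in-RAM map to point at it.
            phys = self.next_free
            self.next_free += 1
            self.flash[phys] = data
            self.ram_map[logical_block] = phys

        def read(self, logical_block):
            phys = self.ram_map[logical_block]     # no map entry -> unreachable
            return self.flash[phys]

        def power_loss_without_flush(self):
            # The flash contents survive, but the un-flushed map does not.
            self.ram_map = {}

    ssd = ToySSD(physical_blocks=100)
    for block in range(10):
        ssd.write(block, b"important data %d" % block)

    ssd.power_loss_without_flush()

    print(ssd.flash[3])        # the bytes are still physically present...
    try:
        ssd.read(3)            # ...but without the map, block 3 cannot be found
    except KeyError:
        print("block 3 is lost: the drive no longer knows where it put it")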

This isn't just theory - working in the storage industry, I/we have seen this happen way too many times on way too many devices, including some of our own personal laptops!

Many vendors have discussed making "enterprise grade" SSDs where they specifically add components ("super-caps") and other circuitry to allow a clean "flush" - but it's very, very hard to find any device whose datasheet specifically states that it has sufficient, explicit, tested protection against such events.

Obviously if you buy a "high end storage array" from a top-tier vendor which utilizes flash technology, either the drives or the system as a whole has been designed with all of this taken into account. Make sure it has!

The problem with respect to your question is: if you have a RAID array and several of the disks are "bad" SSDs without this protection, a spontaneous power outage could lose ALL the data on MULTIPLE disks, rendering RAID reconstruction impossible.

"But I use a UPS"

It is also generally important to note that "spontaneous power outage" can include situations like BSODs and kernel locks/crashes/panics, where your only way to recover is to pull the plug on the system.