Issue

I have read many discussions about storage and whether SSDs or classic HDDs are better, and I am quite confused. HDDs still seem to be quite widely preferred, but why?

Which is better for active storage? For example, for databases, where the disk is active all the time?

About SSDs.

Pros.

  • They are quiet.
  • Not mechanical.
  • Faster.

Cons.

  • More expensive.

Questions.

  • When the life cycle of one SSD cell is used up, what happens then? Is the disk's capacity reduced by only that cell, and does it otherwise work normally?
  • What is the best filesystem for write workloads? Is ext4 good because it saves to cells consecutively?

About HDDs.

Pros.

  • Cheaper.

Cons.

  • In case of a mechanical fault, I believe there is usually no way to repair it. (Please confirm.)
  • Slower, although I think HDD speed is usually sufficient for servers.

Is it just about price? Why are HDDs preferred? And are SSDs really useful for servers?


One aspect of my job is designing and building large-scale storage systems (often known as "SANs", or "Storage Area Networks"). Typically, we use a tiered approach that combines SSDs and HDDs.

That said, each one has specific benefits.

  1. SSDs almost always have a higher cost per byte. I can get 10k SAS 4kn HDDs at a cost of $0.068/GB USD, which means that for roughly $280 I can get a 4TB drive. SSDs, on the other hand, typically cost tens of cents per gigabyte, and sometimes even dollars per gigabyte. (See the cost sketch after this list.)

  2. When dealing with RAID, speed becomes less important; size and reliability matter much more. I can build a 12TB N+2 RAID system with HDDs far more cheaply than with SSDs. This is mostly due to point 1; the sketch after this list puts numbers on it.

  3. When managed properly, HDDs are extremely cheap to replace and maintain. Because the cost per byte is lower, replacing a failed HDD with another is cheaper. And because HDD failures typically correlate with age rather than with data written, rebuilding the RAID array onto a replacement drive doesn't eat into a write-endurance (TBW) budget the way it would on an SSD. (Granted, the TBW percentage used by a rebuild is tiny overall, but the point stands.)

  4. The SSD market is relatively complex. There are four major types of SSDs at the time of this writing, rated here from the highest number of total writes supported to the lowest: SLC, MLC, TLC, QLC. SLC typically supports the largest number of total writes (the major limiting factor on SSD lifetime), whereas QLC typically supports the lowest.
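
To put rough numbers on points 1 and 2, here is a back-of-envelope sketch in Python. The HDD price is the figure quoted above; the SSD price, and the 3-data + 2-parity layout of 4TB drives, are illustrative assumptions rather than quotes:

    # Back-of-envelope cost for a 12TB-usable N+2 array built from 4TB drives.
    # The HDD price comes from the text above; the SSD price is an assumed
    # mid-range enterprise figure, not a quote.
    HDD_COST_PER_GB = 0.068
    SSD_COST_PER_GB = 0.20

    DRIVE_SIZE_GB = 4000
    USABLE_TB = 12

    n_data = (USABLE_TB * 1000) // DRIVE_SIZE_GB  # 3 data drives
    total_drives = n_data + 2                     # plus 2 parity drives = 5

    for name, cost_per_gb in [("HDD", HDD_COST_PER_GB), ("SSD", SSD_COST_PER_GB)]:
        per_drive = DRIVE_SIZE_GB * cost_per_gb
        print(f"{name}: {total_drives} x 4TB @ ${per_drive:.0f} each = "
              f"${total_drives * per_drive:.0f} total")

Even with the conservative $0.20/GB assumption, the same usable capacity costs roughly three times as much in SSDs.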

That said, the most successful storage systems I've seen are tiered, with both drive types in use. Personally, the storage systems I recommend to clients generally follow these tiers (a rough sketch of the age-out logic follows the list):

  1. Tier 1 is typically one (or several) RAID 10 SSD-only tier. Data is always written to Tier 1.
  2. Tier 2 is typically one (or several) RAID 50 or RAID 5 SSD-only tier. Data is aged out of Tier 1 to Tier 2.
  3. Tier 3 is typically one (or several) RAID 10 HDD-only tier. Data is aged out of Tier 2 to Tier 3.
  4. Tier 4 is typically several groups of RAID 6 HDD-only tiers. Data is aged out of Tier 3 to Tier 4. We make the RAID 6 groups as small as possible, so that the system tolerates as many concurrent drive failures as possible.
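
As a rough illustration of that age-out behaviour, here is a toy model; it is not any vendor's actual implementation, the thresholds are made-up numbers, and real systems usually weigh access frequency rather than just age:

    from datetime import datetime, timedelta

    # Hypothetical maximum data age per tier; None = final resting tier.
    TIER_MAX_AGE = {
        1: timedelta(days=1),   # RAID 10 SSD  - all new writes land here
        2: timedelta(days=7),   # RAID 50/5 SSD
        3: timedelta(days=30),  # RAID 10 HDD
        4: None,                # RAID 6 HDD groups
    }

    def tier_for(last_access: datetime, now: datetime) -> int:
        """Return the fastest tier whose age limit still covers this data."""
        age = now - last_access
        for tier, max_age in TIER_MAX_AGE.items():
            if max_age is None or age <= max_age:
                return tier
        return max(TIER_MAX_AGE)

    now = datetime.now()
    print(tier_for(now - timedelta(hours=2), now))  # 1: hot data stays on SSD
    print(tier_for(now - timedelta(days=90), now))  # 4: cold data on RAID 6 HDD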

Read/write performance drops as you go down the tiers, and data propagates down until it reaches the tier where most of the data shares the same access/modification frequency. (That is, the more frequently data is read or written, the higher the tier it resides on.)

Sprinkle some well-designed Fibre Channel in there, and you can actually build a SAN with higher throughput than on-board drives would provide.

Now, to some specific items you mention:

Your SSD Questions

When the life cycle of one SSD cell is used up, what happens then? Is the disk's capacity reduced by only that cell, and does it otherwise work normally?

  • Both drive types are typically designed with a number of "spare" cells. That is, they have "extra" space on them that you cannot access, which the drive fails over to when a cell dies. (IIRC it's on the order of 7-10%.) This means that if a single "cell" (a sector, on an HDD) dies, a "spare" is used in its place. You can check the status of this via S.M.A.R.T. diagnostics on both drive types, as shown below.
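
If you want to watch this happen, smartmontools exposes the relevant counters. A minimal sketch, assuming `smartctl` is installed and your drive is `/dev/sda` (attribute names vary by vendor and interface):

    import subprocess

    # SMART attribute 5 (Reallocated_Sector_Ct) on HDDs counts sectors
    # remapped to the spare pool; NVMe SSDs report an "Available Spare"
    # percentage instead. Typically requires root.
    out = subprocess.run(
        ["smartctl", "-a", "/dev/sda"],
        capture_output=True, text=True,
    ).stdout

    for line in out.splitlines():
        if "Reallocated_Sector_Ct" in line or "Available Spare" in line:
            print(line)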

What is the best filesystem for write workloads? I think ext4 is good, because it saves to cells consecutively?

  • For SSDs this is largely irrelevant. Cell positioning does not matter, because access time is effectively uniform regardless of where the data sits.

Your HDD Questions

In case of a mechanical fault, there is usually no way to repair it. (Is that right?)

  • Partially incorrect. HDDs are actually easier to recover data from in most failure situations. (Note: I said easier, not easy.) Specialized equipment is required, but success rates seem pretty high. The platters can often be read outside the HDD itself by special equipment, which allows data recovery even if the drive is dead.

Slower, but I think speed is not so important, because the speed of an HDD is sufficient for server use?

  • Typically, when using RAID, single-drive speed becomes less of a factor, because you can use striped RAID layouts that increase the overall speed. (RAID 0, 5, and 6 are frequently used, often in tandem.) For a database with high IOPS, however, HDDs are typically not sufficient unless the system is designed very deliberately; you would want write-intensive (SLC-grade) SSDs for database-grade IO. The sketch below shows why striping only goes so far.
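
For a feel of the gap, here is an idealized aggregate random-read comparison. The per-drive IOPS figures are ballpark assumptions, not measurements, and real arrays lose some of this to controller overhead:

    # Idealized aggregate random-read IOPS for a striped array:
    # reads can be served from every member drive in parallel.
    HDD_10K_IOPS = 140       # ballpark for a 10k SAS HDD, 4k random reads
    SSD_SATA_IOPS = 75_000   # ballpark for a SATA SSD, 4k random reads

    def array_read_iops(per_drive: int, drives: int) -> int:
        return per_drive * drives

    print(array_read_iops(HDD_10K_IOPS, 8))   # 8 HDDs: ~1,120 IOPS
    print(array_read_iops(SSD_SATA_IOPS, 2))  # 2 SSDs: ~150,000 IOPS

Even a wide HDD stripe sits orders of magnitude below a small SSD mirror for random IO, which is why busy databases usually end up on flash.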

HDDs are still quite preferred

Is it? I'm not sure it is to be honest.

HDDs come in large sizes for a decent price right now, that's undeniable, and I think people trust them for longer data retention than SSDs too. Also, when SSDs die they tend to die completely, all in one go, whereas HDDs tend to fail in a more predictable way, which may give you more time to get the data off first if needed.

But otherwise SSDs are the way forward for most uses. You'll want a boot pair; a couple of 500GB SATA SSDs in RAID 1 won't cost the earth. For DB use you can't really beat SSDs (so long as your logs are on high-endurance models, anyway). For backups, yes, you might use big 7.2k HDDs, and the same goes for very large datasets (in fact I bought over 4,000 10TB HDDs early last year for just this requirement), but otherwise SSDs are the way forward.


Use solid state for everything hot: interactive use, databases, anything online. Use spindles as cheap warm storage, for not-quite-cold archives or infrequently accessed data. In particular, HDDs work well in a staging area before backups are archived to tape.

Using different media types for hot versus cold storage also adds some diversity. A data-loss flaw in a particular brand of SSD controller would be much worse if it took out both online and backup data. That's unlikely, but spindles and tape are cheap anyway, so why take the risk?

The failure mode of any particular device is not important, as long as the arrays stay redundant and backed up. The usual procedure is to replace a drive at the first symptoms of failure. Experiment with repairing drives on your test systems, where a catastrophic failure does not impact production services.

The file system is a matter of personal preference. While there are SSD-optimized file systems, something you know and can repair may be more important.


The big advantages of an SSD are speed and reliability. However, one of the dirty little secrets is the limited number of write cycles an SSD has. If you are building a server with a lot of drive-write activity, such as a database or email server, you will need a more expensive SSD with higher endurance.

NAND flash comes in three main types:

  • TLC
  • MLC
  • SLC

TLC is mainly designed for web servers or archive servers that see few write cycles. MLC is for servers with a mix of read and write cycles, like a low-volume database server. SLC is designed for servers with a lot of read/write cycles, like a high-volume database server.
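
Endurance is usually quoted as TBW (terabytes written). To see how the grades differ in practice, here is a quick calculation; the TBW figures and the 2 TB/day write load are plausible made-up numbers, so check the vendor datasheet for real drives:

    # Years of life in an SSD's rated endurance at a steady write load.
    # TBW figures below are illustrative, not from any datasheet.
    def lifetime_years(tbw: float, daily_writes_tb: float) -> float:
        return tbw / daily_writes_tb / 365

    DAILY_WRITES_TB = 2.0  # assumed busy database server

    for grade, tbw in [("TLC (read-intensive)", 1_400),
                       ("MLC (mixed-use)", 3_500),
                       ("SLC/write-intensive", 10_000)]:
        print(f"{grade}: ~{lifetime_years(tbw, DAILY_WRITES_TB):.1f} years")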

The main deciding factors between SSD and HDD are application and budget. In a perfect world, SLC SSDs would make the standard HDD obsolete, but we are just not there yet.