Extreme drops in hard disk performance

Solution 1:

The Seagate ST4000DM004 uses SMR (shingled magnetic recording) to write data to the disk surface. This means that in order to write a single byte, it might have to rewrite multiple gigabytes.

In "normal usage patterns" (as defined by HDD vendors, not by users!) this is not much of a problem - the data is first written to a CMR cache on the outer rim of the disk. Later, when disk activity goes down, the firmware moves the data to its final place in an SMR band.

When writing larger quantities of data at once, this CMR cache is exhausted and writes have to go to the SMR bands directly - this is slower by orders of magnitude.

Nota bene: This is not a RAM cache - it is a small part of the disk surface that is written in CMR (conventional magnetic recording, i.e., without overlapping tracks) to make the SMR horror less visible to users.
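To make the cliff concrete, here is a toy model of a sustained write. It is only a sketch: the cache size and both throughput figures are made-up assumptions, not measurements of the ST4000DM004, and the function name `simulate_writes` is purely illustrative. It also assumes the drive gets no idle time to flush the cache during the write.

```python
def simulate_writes(total_gb, cache_gb=20.0, fast_mbps=180.0,
                    slow_mbps=10.0, step_gb=1.0):
    """Return (gb_written, throughput_mbps) samples for one long write.

    All default numbers are illustrative assumptions, not drive specs.
    """
    samples = []
    written = 0.0
    cache_free = cache_gb  # space left in the on-disk CMR cache
    while written < total_gb:
        chunk = min(step_gb, total_gb - written)
        if cache_free >= chunk:
            # Chunk still fits into the CMR cache: fast path.
            cache_free -= chunk
            rate = fast_mbps
        else:
            # Cache exhausted: the write goes to an SMR band: slow path.
            rate = slow_mbps
        written += chunk
        samples.append((written, rate))
    return samples


if __name__ == "__main__":
    for gb, rate in simulate_writes(30.0):
        print(f"{gb:5.1f} GB written, {rate:6.1f} MB/s")
```

With these assumed numbers the throughput stays flat for the first 20 GB and then collapses, which is the same shape of curve you see when copying a large amount of data to such a drive.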

Solution 2:

Hard drives write data in sectors on tracks; however, there is a limit to how close together tracks can be placed without interfering with each other.

Hard drive vendors realized that the problem of adjacent tracks interfering with each other could be mitigated if they gave up on the traditional random write access model and wrote large areas of data sequentially. Each track written would overlap slightly with the last. That means more data per platter, which means higher capacity and/or lower cost. This is known as "Shingled Magnetic Recording" (SMR), by analogy to the way roofing shingles overlap.
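The cost of giving up random writes can be sketched with a toy model. This is a simplification under stated assumptions, not how any particular firmware works: `ShingledBand` is a hypothetical class, a band is modelled as a plain list of tracks, and writing one track is assumed to clobber the next one, so every following track in the band has to be rewritten as well.

```python
class ShingledBand:
    """Toy model of one SMR band (hypothetical, for illustration only)."""

    def __init__(self, num_tracks):
        self.tracks = [b""] * num_tracks
        self.physical_writes = 0  # tracks actually written to the platter

    def write(self, index, data):
        """Update one track; return how many tracks had to be written."""
        # Keep the contents of all later tracks: writing track `index`
        # damages track index + 1, rewriting that damages index + 2, ...
        tail = self.tracks[index + 1:]
        self.tracks[index] = data
        self.physical_writes += 1
        for position, old_data in enumerate(tail, start=index + 1):
            self.tracks[position] = old_data  # rewrite to restore it
            self.physical_writes += 1
        return 1 + len(tail)


if __name__ == "__main__":
    band = ShingledBand(8)
    print(band.write(7, b"last"))   # last track: nothing after it
    print(band.write(0, b"first"))  # first track: whole band rewritten
```

In this model, appending at the end of a band costs one track write, while changing the first track costs a rewrite of the entire band. That asymmetry is why sequential writes are cheap and small random writes are expensive.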

Of course, a hard drive that required major changes in the OS wouldn't sell very well. So they added translation firmware and a CMR cache area, so that the SMR drive would look like a regular drive to the OS. It is not terribly dissimilar to what SSD vendors already do.

The difference, though, is that flash is fast, so even with the translation layer, SSDs are still much faster than HDDs. SMR HDDs, on the other hand, have performance that drops off a cliff when the CMR cache area runs out and the drive must block new write operations while it goes through the slow process of rewriting shingles.

Unfortunately, all three of the remaining HDD vendors decided to release this technology by slipping it into their product lineups without telling people about it. So rather than being able to make a conscious choice whether or not to accept a performance cliff in exchange for a slightly lower cost per unit of storage, people unknowingly received these drives. Under pressure from the media, the vendors did eventually release the information on which drive models were SMR, but it's still not made obvious to customers.

Since it was all three of the major HDD vendors who did this, you can't just boycott the culprits, so it seems the only option is to carefully check every hard drive you buy from now on.

Curiously, despite the original motivation behind SMR being capacity, it seems the largest drives were often still CMR, with SMR being mostly seen on drives in the low single-digit terabytes.