Why is my HDD so slow on the "4K" speed tests?

What is wrong with my speed at 4K? Why is it so slow? Or is it supposed to be like that?

Screenshot of benchmark

Is that speed okay? Why do I have such low speed at 4K?


What you are running into is typical of mechanical HDDs, and one of the major benefits of SSDs: HDDs have terrible random access performance.

In CrystalDiskMark, "Seq" means sequential access while "4K" means random access (in chunks of 4kB at a time, because single bytes would be far too slow and unrealistic1).


Definitions

There are, broadly, two different ways you might access a file.

Sequential access

Sequential access means you read or write the file more or less one byte after another. For example, if you're watching a video, you would load the video from beginning to end. If you're downloading a file, it gets downloaded and written to disk from beginning to end.

From the disk's perspective, it's seeing commands like "read block #1, read block #2, read block #3, read block #4"1.

Random access

Random access means there's no obvious pattern to the reads or writes. This doesn't have to mean truly random; it really means "not sequential". For example, if you're starting lots of programs at once they'll need to read lots of files scattered around your drive.

From the drive's perspective, it's seeing commands like "read block #56, read block #5463, read block #14, read block #5".

Blocks

I've mentioned blocks a couple of times. Because computers deal with such large sizes (1 MB ~= 1000000 B), even sequential access is inefficient if you have to ask the drive for each individual byte - there's too much chatter. In practice, the operating system requests blocks of data from the disk at a time.

A block is just a range of bytes; for example, block #1 might be bytes #1-#512, block #2 might be bytes #513-#1024, etc. These blocks are either 512 bytes or 4096 bytes in size, depending on the drive. But even after dealing with blocks rather than individual bytes, sequential block access is faster than random block access.
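The byte-to-block mapping is just integer arithmetic. A minimal Python sketch, using the same 1-based numbering and 512 B block size as the example above (real drives number blocks from 0, and may use 4096 B blocks):

```python
BLOCK_SIZE = 512  # bytes per block; 4096 on "4Kn" drives

def block_for_byte(byte_no):
    """Return the 1-based block number containing the given 1-based byte."""
    return (byte_no - 1) // BLOCK_SIZE + 1

def byte_range_of_block(block_no):
    """Return the first and last 1-based byte numbers covered by a block."""
    first = (block_no - 1) * BLOCK_SIZE + 1
    return first, first + BLOCK_SIZE - 1

print(block_for_byte(512))     # byte #512 is still in block #1
print(block_for_byte(513))     # byte #513 starts block #2
print(byte_range_of_block(2))  # block #2 covers bytes #513-#1024
```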


Performance

Sequential

Sequential access is generally faster than random access. This is because sequential access lets the operating system and the drive predict what will be needed next, and load up a large chunk in advance. If you've requested blocks "1, 2, 3, 4", the OS can guess you'll want "5, 6, 7, 8" next, so it tells the drive to read "1, 2, 3, 4, 5, 6, 7, 8" in one go. Similarly, the drive can read off the physical storage in one go, rather than "seek to 1, read 1,2,3,4, seek to 5, read 5,6,7,8".
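The difference in seek counts can be shown with a toy simulation: given a list of requested block numbers, count how many separate seeks the drive needs if it can keep streaming whenever the next block is adjacent. (This is a simplified model for illustration, not any real OS's read-ahead logic.)

```python
def seeks_needed(blocks):
    """Count seeks for a toy drive that can keep reading adjacent blocks."""
    seeks = 0
    prev = None
    for b in blocks:
        if prev is None or b != prev + 1:
            seeks += 1  # not adjacent to the last block: the head must move
        prev = b
    return seeks

print(seeks_needed([1, 2, 3, 4, 5, 6, 7, 8]))      # sequential: 1 seek
print(seeks_needed([56, 5463, 14, 5, 8, 34, 76]))  # random: 7 seeks
```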

Oh, I mentioned seeking to something. Mechanical HDDs have a very slow seek time because of how they're physically laid out: they consist of a number of heavy metalised disks spinning around, with physical arms moving back and forth to read the disk. Here is a video of an open HDD where you can see the spinning disks and moving arms.

Diagram of HDD internals
Image from http://www.realtechs.net/data%20recovery/process2.html

This means that at any one time, only the bit of data under the head at the end of the arm can be read. The drive needs to wait for two things: it needs to wait for the arm to move to the right ring ("track") of the disk, and also needs to wait for the disk to spin around so the needed data is under the reading head. This is known as seeking2. Both the spinning and the moving arms take physical time to move, and they can't be sped up by much without risking damage.

This typically takes a very, very long time, far longer than the actual reading. We're talking >5ms just to get to where the requested byte lives, while the actual reading averages out to about 0.00000625ms per sequential byte read (or 0.003125ms per 512 B block).

Random

Random access, on the other hand, doesn't have that benefit of predictability. So if you want to read 8 random bytes, maybe from blocks "8, 34, 76, 996, 112, 644, 888, 341", the drive needs to go "seek to 8, read 8, seek to 34, read 34, seek to 76, read 76, ...". Notice how it needs to seek again for every single block? Instead of an average of 0.003125ms per sequential 512 B block, it's now an average of (5ms seek + 0.003125ms read) = 5.003125ms per block. That's many, many times slower - about 1600 times slower, in fact.
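The arithmetic can be checked directly. A small Python sketch using the same illustrative figures as above (5ms seek, 0.003125ms per 512 B block - assumptions from this answer, not measurements of any particular drive):

```python
SEEK_MS = 5.0                  # illustrative seek time per random access
READ_MS_PER_BLOCK = 0.003125   # illustrative time to read one 512 B block

blocks = 8  # e.g. blocks 8, 34, 76, 996, 112, 644, 888, 341

sequential_ms = SEEK_MS + blocks * READ_MS_PER_BLOCK  # one seek, then stream
random_ms = blocks * (SEEK_MS + READ_MS_PER_BLOCK)    # a seek for every block

print(f"sequential: {sequential_ms:.3f} ms")  # 5.025 ms
print(f"random:     {random_ms:.3f} ms")      # 40.025 ms
print(f"per-block slowdown: {(SEEK_MS + READ_MS_PER_BLOCK) / READ_MS_PER_BLOCK:.0f}x")
```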

SSDs

Fortunately, we have a solution now: SSDs.

An SSD, a solid state drive, is, as its name implies, solid state. That means it has no moving parts. Moreover, the way an SSD is laid out means there is (effectively3) no physical seek needed to reach a byte; the drive can address it directly. That's why an SSD has a much smaller performance gap between sequential and random access.

There still is a gap, but that can be largely attributed to not being able to predict what comes next and preloading that data before it's asked for.


1 More accurately, with LBA, drives are addressed in blocks of 512 bytes (512n/512e) or 4kB (4Kn) for efficiency reasons. Also, real programs almost never need just a single byte at a time.

2 Technically, seek only refers to the arm travel. The waiting for the data to rotate under the head is rotational latency on top of the seek time.

3 Technically, they do have lookup tables and remap for other reasons, e.g. wear levelling, but the overhead is completely negligible compared to an HDD's seek time...


As already pointed out by other answers, "4K" almost certainly refers to random access in blocks of size 4 KiB.

Every time a hard disk (not a SSD) is asked to read or write data, there are two significant delays involved:

  • Seek latency, for the read/write head to "seek" to the correct circular track (or "cylinder") on the platter, including any time required for the head to stabilize over the track and synchronize against the data stored on the platter
  • Rotational latency, for the spinning platter underneath the read/write head to rotate such that the desired portion of the track (the "sector") passes under the head

Both of these are relatively constant amounts of time for any given drive. Seek latency is a function of how fast the head can move and how far it needs to travel, and rotational latency is a function of how fast the platter spins. What's more, neither has changed much over the last few decades. Manufacturers actually used to advertise average seek times; they pretty much stopped doing that once there was little or no further development in the area. No manufacturer, especially in a high-competition environment, wants their products to look no better than those of their competitors.

A typical desktop hard disk spins at 7200 rpm, whereas a typical laptop drive might spin at around 5000 rpm. This means that each second, it goes through a total of 120 revolutions (desktop drive) or about 83 revolutions (laptop drive). Since on average the disk will need to spin half a revolution before the desired sector passes under the head, this means that we can expect the disk to be able to service approximately twice that many I/O requests per second, assuming that

  • either the seek is done while the disk is rotating (this is probably a safe bet for hard disks today where I/O involves seeking), and the seek latency is no longer than the rotational latency for the particular I/O
  • or the head happens to be over the correct cylinder already, causing the drive to not need to seek (which is a special case of the above, with a seek latency of zero)

So we should expect to be able to perform on the order of 200 I/Os per second if the data being accessed (for reading or writing) is physically localized, making rotational latency the limiting factor. In the general case, we would expect the drive to perform on the order of 100 I/Os per second if the data is spread out across the platter or platters, requiring considerable seeking and making seek latency the limiting factor. In storage terms, this is the "IOPS performance" of the hard disk; this, not sequential I/O performance, is typically the limiting factor in real-world storage systems. (This is a big reason why SSDs are so much faster to use: they eliminate the rotational latency and vastly reduce the seek latency, as the physical movement of the read/write head becomes a lookup in the flash mapping layer tables, which are stored electronically.)

Writes are typically slower when there is a cache flush involved. Normally operating systems and hard disks try to reorder random writes to turn random I/O into sequential I/O where possible, to improve performance. If there is an explicit cache flush or write barrier, this optimization is eliminated for the purpose of ensuring that the state of the data in persistent storage is consistent with what software expects. Basically the same reasoning applies during reading when there is no disk cache involved, either because none exists (uncommon today on desktop-style systems) or because the software deliberately bypasses it (which is often done when measuring I/O performance). Both of those reduce the maximum potential IOPS performance to that of the more pessimistic case, or 120 IOPS for a 7200 rpm drive.

  • At 100 IOPS at 4 KiB per I/O, we get a performance of about 400 KB/s.
  • At 200 IOPS at 4 KiB per I/O, we get a performance of about 800 KB/s.
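Those figures fall straight out of the rotation speed. A Python sketch of the arithmetic (7200 rpm is the desktop-drive assumption from above; the bullets round the exact results of 410 and 819 KB/s down to 400 and 800):

```python
RPM = 7200
revs_per_sec = RPM / 60                         # 120 revolutions per second
avg_rot_latency_ms = 0.5 / revs_per_sec * 1000  # half a revolution on average

print(f"{revs_per_sec:.0f} rev/s, avg rotational latency {avg_rot_latency_ms:.2f} ms")

BLOCK_BYTES = 4096  # 4 KiB per I/O, as in the "4K" test
for iops in (100, 200):  # the order-of-magnitude figures from the answer
    print(f"{iops} IOPS x 4 KiB = ~{iops * BLOCK_BYTES / 1000:.0f} KB/s")
```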

Which just so happen to match your numbers almost exactly. Random I/O with small block sizes is an absolute performance killer for rotational hard disks, which is also why it's a relevant metric.

As for purely sequential I/O, throughput in the range of 150 MB/s isn't at all unreasonable for modern rotational hard disks. But very little real-world I/O is strictly sequential, so in most situations, purely sequential I/O performance becomes more of an academic exercise than an indication of real-world performance.


4K refers to random I/O. This means the disk is being asked to access small blocks (4 KB in size) at random points within the test file. This is a weakness of hard drives; the ability to access data across different regions of the disk is limited by the speed at which the disk is rotating and how quickly the read-write heads can move. Sequential I/O, where consecutive blocks are accessed, is much easier because the drive can simply read or write the blocks as the disk is spinning.

A solid-state drive (SSD) has no such problem with random I/O as all it needs to do is look up where the data is stored in the underlying memory (typically NAND flash, can be 3D XPoint or even DRAM) and read or write the data at the appropriate location. SSDs are entirely electronic and do not need to wait on a rotating disk or a moving read-write head to access data, which makes them much faster than hard drives in this regard. It is for this reason that upgrading to an SSD dramatically increases system performance.

Side note: sequential I/O performance on an SSD is often much higher than on a hard drive as well. A typical SSD has several NAND chips connected in parallel to the flash memory controller, and can access them simultaneously. By spreading data across these chips, a drive layout akin to RAID 0 is achieved, which greatly increases performance. (Note that many newer drives, especially cheaper ones, use a type of NAND called TLC NAND which tends to be slow when writing data. Drives with TLC NAND often use a small buffer of faster NAND to provide higher performance for smaller write operations but can slow down dramatically once that buffer is full.)