Why are 4k reads in hdd/ssd benchmarks slower than writes?
There are several benchmark tools available to test the speed of a pc drive.
Here is a benchmark example of a SATA SSD:
- Sequential Read : 718.498 MB/s
- Sequential Write : 777.414 MB/s
- Random Read 512KB : 160.541 MB/s
- Random Write 512KB : 838.930 MB/s
- Random Read 4KB (QD=1) : 26.985 MB/s [ 6588.1 IOPS]
- Random Write 4KB (QD=1) : 135.603 MB/s [ 33106.2 IOPS]
- Random Read 4KB (QD=32) : 177.003 MB/s [ 43213.6 IOPS]
- Random Write 4KB (QD=32) : 178.397 MB/s [ 43554.0 IOPS]
m.2 SSD:
- Sequential Read (Q= 32,T= 1) : 829.119 MB/s
- Sequential Write (Q= 32,T= 1) : 677.645 MB/s
- Random Read 4KiB (Q= 32,T= 1) : 744.328 MB/s [181720.7 IOPS]
- Random Write 4KiB (Q= 32,T= 1) : 144.876 MB/s [ 35370.1 IOPS]
- Sequential Read (T= 1) : 785.600 MB/s
- Sequential Write (T= 1) : 789.973 MB/s
- Random Read 4KiB (Q= 1,T= 1) : 56.585 MB/s [ 13814.7 IOPS]
- Random Write 4KiB (Q= 1,T= 1) : 170.449 MB/s [ 41613.5 IOPS]
HDD:
- Sequential Read : 114.988 MB/s
- Sequential Write : 111.043 MB/s
- Random Read 512KB : 39.260 MB/s
- Random Write 512KB : 57.409 MB/s
- Random Read 4KB (QD=1) : 0.546 MB/s [ 133.4 IOPS]
- Random Write 4KB (QD=1) : 0.757 MB/s [ 184.9 IOPS]
- Random Read 4KB (QD=32) : 1.582 MB/s [ 386.3 IOPS]
- Random Write 4KB (QD=32) : 0.700 MB/s [ 171.0 IOPS]
In every case "Random Read 4KB" at QD1 is slower than the corresponding write, and in most cases it's the opposite at QD32.
In some forums people say it's a limitation of the SSD chip structure, but since ordinary hard drives show the same behaviour, there seems to be another reason?
Solution 1:
TL;DR: It is because the SSD is lying to you and saying the write is done before it is. It can't get away with the same thing for reads.
The longer version of the answer is write caching.
Let's start with the QD1 case. The SSD reports the write as finished to the OS once it has received the data and saved it in a cache on the drive itself, but before it has actually written it to the NAND. This makes a big difference because actually writing data to NAND is quite slow. For reads, it actually has to read the data from NAND before it can send it back (unless it has read it earlier and still has it in cache, but that is very unlikely with random reads).
The downside of this is that on sudden power loss, data that was written to the SSD but hasn't made it to the NAND yet can be lost. Some enterprise SSDs include a supercapacitor which stores enough power to finish writing the cached data to NAND in case of sudden power loss.
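As a rough illustration of the two paragraphs above, here is a toy model of a drive with a write-back cache. The class name `ToyDrive` and all three latency constants are made up for the example and are not measurements of any real SSD; the only point is that a write is acknowledged at cache speed, a random read pays the NAND cost, and anything still in the cache is gone after a power cut.

```python
# All three numbers are assumptions for illustration, not real drive specs.
CACHE_US = 20          # time to land data in the drive's cache
NAND_READ_US = 80      # time to read a page from NAND
NAND_PROGRAM_US = 500  # time to program a page to NAND


class ToyDrive:
    """Hypothetical drive with a write-back cache in front of NAND."""

    def __init__(self):
        self.cache = {}  # acknowledged to the OS, but not yet durable
        self.nand = {}   # durable contents

    def write(self, lba, data):
        # The drive says "done" as soon as the data is in its cache.
        self.cache[lba] = data
        return CACHE_US

    def read(self, lba):
        # A random read rarely hits the cache, so it usually pays NAND latency.
        if lba in self.cache:
            return self.cache[lba], CACHE_US
        return self.nand.get(lba), NAND_READ_US

    def flush(self):
        # Background work the benchmark never waits for at QD1.
        cost = len(self.cache) * NAND_PROGRAM_US
        self.nand.update(self.cache)
        self.cache.clear()
        return cost

    def power_loss(self):
        # Without a supercapacitor, unflushed writes are simply lost.
        lost = list(self.cache)
        self.cache.clear()
        return lost


drive = ToyDrive()
print("write acked after", drive.write(1, b"a"), "us")
print("flush took", drive.flush(), "us in the background")
drive.write(2, b"b")
print("lost on power cut:", drive.power_loss())
print("read of flushed block:", drive.read(1))
```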
You see the same thing for hard drives because they are also doing write caching. They are just not nearly as aggressive about it. Why is the SSD so aggressive? To answer that we need to move on to the QD32 case, which is both more complicated and more interesting.
It is not true, as you say, that random reads are generally faster than random writes at QD32. It depends a lot on which particular SSD you look at.
If you look at 4k QD1 random reads on many SATA SSDs, they all seem to perform in the 20-30 MB/s range. Why is that? It is because 4k QD1 random reads are mostly about latency, not throughput. The latency comes from three parts:
1. The interface latency of SATA/AHCI which involves telling the drive what to do and sending the data.
2. The controller itself has to figure out what to do with the data and instructions it has received.
3. The time it takes to actually read or write the data to a NAND die.
Neither 1 nor 3 has changed much in a long time, and that is why 4k QD1 random reads haven't changed much either.
The recent move in SSDs from SATA/AHCI to PCIe/NVMe has greatly cut down the latency of 1, which is why certain M.2 and PCIe SSDs have recently shown great improvements here.
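To make the "latency, not throughput" point concrete: at QD1 only one request is in flight at a time, so the time per request is simply 1 / IOPS. The small sketch below applies that to the QD1 numbers from the question. The function names are mine and purely illustrative; the MB/s conversion assumes the benchmark uses 4096-byte blocks and decimal megabytes, which matches the figures it printed.

```python
def qd1_latency_us(iops: float) -> float:
    """With one request in flight, each request takes 1/IOPS seconds."""
    return 1_000_000 / iops


def throughput_mb_s(iops: float, block_bytes: int = 4096) -> float:
    """Convert IOPS to MB/s (decimal megabytes, 4096-byte blocks assumed)."""
    return iops * block_bytes / 1_000_000


# QD1 4k results copied from the question: (label, IOPS)
results = [
    ("SATA SSD read", 6588.1),
    ("SATA SSD write", 33106.2),
    ("M.2 SSD read", 13814.7),
    ("M.2 SSD write", 41613.5),
    ("HDD read", 133.4),
    ("HDD write", 184.9),
]

for label, iops in results:
    print(f"{label:15s} {qd1_latency_us(iops):9.1f} us/request "
          f"{throughput_mb_s(iops):8.3f} MB/s")
```

For the SATA SSD this works out to roughly 152 us per read but only about 30 us per acknowledged write, and about 7.5 ms per read for the HDD, which is in the range you would expect from seek time plus rotational delay.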
One thing an SSD controller can do to greatly help with the latency is read or write to multiple NAND dies in parallel and that way mask most of the latency of 3. If you are doing QD32 4k random reads with NCQ the SSD can service the read requests out of order and make sure it is reading from as many NAND dies in parallel as possible.
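A back-of-the-envelope way to see the effect of that parallelism is below. `modeled_iops` is a deliberately crude model of my own (every request costs the same latency, and at most one request per die is in flight); the die count and the 100 us latency are invented numbers. `effective_parallelism` just divides the QD32 read IOPS by the QD1 read IOPS from the question, which estimates how many requests the drive overlaps if you assume the per-request latency stays the same at QD32.

```python
def modeled_iops(queue_depth: int, dies: int, latency_us: float) -> float:
    """Crude model: requests in flight are limited by queue depth and die count."""
    in_flight = min(queue_depth, dies)
    return in_flight * 1_000_000 / latency_us


def effective_parallelism(iops_deep: float, iops_qd1: float) -> float:
    """How many requests appear to overlap, assuming unchanged per-request latency."""
    return iops_deep / iops_qd1


# Hypothetical drive: 8 dies, 100 us per read (both numbers are assumptions).
print("model QD1 :", modeled_iops(1, 8, 100.0), "IOPS")
print("model QD32:", modeled_iops(32, 8, 100.0), "IOPS")

# 4k random read IOPS from the question: (label, QD1, QD32)
measured = [
    ("SATA SSD", 6588.1, 43213.6),
    ("M.2 SSD", 13814.7, 181720.7),
    ("HDD", 133.4, 386.3),
]
for label, qd1, qd32 in measured:
    print(f"{label:9s} overlaps about {effective_parallelism(qd32, qd1):.1f} reads")
```

The HDD gets some benefit too, since NCQ lets it reorder requests to shorten head movement, but it has nothing like the independent dies of an SSD to spread the work over.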
For QD32 4k random writes the SSD does something called write combining. When a lot of small write requests come in, the SSD controller caches them locally, and when a big enough buffer of writes has built up the controller splits it into nicely sized chunks and writes the chunks to multiple NAND dies in parallel, again to help mask the NAND latency. Another advantage of write combining is that most SSDs nowadays have a page size (the smallest amount that can be read or written) bigger than 4k, and combining writes until you reach the page size helps avoid a lot of write amplification. It is in order to do these things that SSDs are so aggressive about write caching.
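Here is a minimal sketch of the write amplification argument. It assumes a 16 KiB NAND page, which is only an example value since real page sizes vary by drive, and it ignores garbage collection, mapping tables, and everything else a real controller does. `CombiningBuffer` is a hypothetical name for illustration, not how any particular firmware is structured.

```python
from dataclasses import dataclass

HOST_WRITE = 4096   # size of each incoming 4k write
PAGE_SIZE = 16384   # assumed NAND page size; real drives differ


@dataclass
class CombiningBuffer:
    """Collects small writes and programs NAND one full page at a time."""
    page_size: int = PAGE_SIZE
    pending: int = 0           # bytes buffered but not yet programmed
    pages_programmed: int = 0

    def write(self, nbytes: int) -> None:
        self.pending += nbytes
        # Program a page each time a full page worth of data has built up.
        while self.pending >= self.page_size:
            self.pending -= self.page_size
            self.pages_programmed += 1

    def flush(self) -> None:
        # A partially filled page still costs a whole page program.
        if self.pending:
            self.pages_programmed += 1
            self.pending = 0


def write_amplification(host_bytes: int, pages: int,
                        page_size: int = PAGE_SIZE) -> float:
    """Bytes programmed to NAND divided by bytes the host asked to write."""
    return pages * page_size / host_bytes


n_writes = 1000
host_bytes = n_writes * HOST_WRITE

buf = CombiningBuffer()
for _ in range(n_writes):
    buf.write(HOST_WRITE)
buf.flush()

# Without combining, every 4k write would program its own page.
naive_wa = write_amplification(host_bytes, n_writes)
combined_wa = write_amplification(host_bytes, buf.pages_programmed)
print("no combining  :", naive_wa)
print("with combining:", combined_wa)
```

With these assumed sizes, uncombined 4k writes program four times as much NAND as the host actually wrote, while the combined case programs exactly what was written.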