Does hard drive do read verification at the same time it writes data?

hard-drive

As we all know, modern hard disk drives do internal bad sector management inside their firmware, that is, when the drive detects a physically damanged or unreliable physical sector, it replaces the bad one with a good one from the reserved sector store. This mechanism eliminates the need for OS to do bad sector managment itself. Users can know the development of sector replacement count from S.M.A.R.T item called "reallocated sector count".

A sample HDD drive with too many reallocated sectors(the index 133 drops below threshold 140):

HDD SMART err1 (CrystalDiskInfo-6.5.2) HDD SMART err2

Now my question is: When does the HDD firmware determine a sector should be replaced. I imagine such ideal situation:

When HDD writes a sector, it does read verification immediately(verify CRC checksum and employ ECC to recover the soft error bits). In case the verification fails or the sector is deemed unreliable by the firmware, the firmware can immediately replace the sector so no user data is lost(at least not lost at the moment it is written to disk surface). This is a nice moment to do sector replacement because the to-be-written data now surely remains in hard drive's internal cache. The drive can try as many "hot spare" sectors as he will until he finds a good one.

If my idea is not the fact, then the HDD can only verify the written sector the next time user fetches it; but in case the next fetch fails CRC and ECC, and its corresponding cache has been discarded, we can do nothing but to know the previously written data has been permanently lost.

Which behavior is the truth? Is there any web articles clarifying this?

SSD disk should face the same situation, then what about SSD? SSD's case seems easier to understand, because I can easily imagine that SSD can do a quick read verification after write, very quickly; however, is such scheme practical for a spindle hard drive? It seems that the HHD head can do either read or write at one given time, so in order to verify the write, it has to wait for another spinning round until the written sector is positioned again under the head -- which would drastically slow down the data throughput. -- I'm so baffled.

Drives to perform a form of write-verification, and have been doing so for quite some time. Back in the olden days of loud drives, you could audibly tell when a drive was grinding on a bad block from the sound of it. I can't explain the exact mechanism, and I'm fairly sure the exact way has shifted a bit as we've moved through the various magnetic technologies. But the failure can be detected on write, and the sector will be pulled out of the reallocation pool.

It can also be detected on read, with the failure case you've already identified. Some of the higher-end RAID arrays (and of course ZFS) have background scanning routines to read storage during idle times specifically to locate these errors. In parity or mirrored RAID, you theoretically have a good copy elsewhere which makes recovery easy.

An SSD cell that fails to program will return a similar fail-state, and a fresh block will be pulled out of the reallocation pool. Worn-out SSD cells tend to go read-only, so the data is recoverable. Actual-broken cells is another story, same as a bit of grit landing on a drive-platter.

Maybe. The drive may or may not support the Write-Read-Verify feature set. If it is an ATA-like device, and hdparm is suitable for the drive, you can attempt to enable the Write-Read-Verify feature using the following command:

sudo hdparm -R1 /dev/sdc

assuming the drive is sdc. You can look at lsblk output to figure out which drive it is. -R0 to disable it.

If it tells you write-read-verify = 2 then the feature is enabled.

On one of my seagate 7200 rpm enterprise sata drives, it reduced sequential write performance to about 12MB/sec. You can expect drives with this feature enabled to possibly have severely degraded write performance when saturating the drive with writes.

A drive has complete freedom in the way this feature is implemented, the spec doesn't even insist a full byte-for-byte comparison, the drive is free to do as little as a CRC check if it likes, or even nothing, the check is implementation specific.

A drive may or may not enforce that a verify failure causes a write error. It is likely that a drive will implement it so a verify failure causes your write to error, since it seems more in line with the intent of the user that enabled this feature. If the drive wants your write successes to be iron clad, it will really verify before succeeding your write. This is what my Seagate drive appeared to be doing.

To verify a write, it has to wait for the disc to rotate again, and waits for the written data to go by the head again. Enforcing a wait-for-rotation each write severely degrades performance. At 7200 rpm, the platter is only rotating 120 times per second. You could be capped at 120 I/O per second, periodically at least.

The verification is made on every read, because the firmware reads the sector and its CRC/ECC and if it does not fit to the sector it will try to re-read (repair) it. The Wikipedia article Error detection and correction contains only one sentence:

Modern hard drives use CRC codes to detect and Reed–Solomon codes to correct minor errors in sector reads, and to recover data from sectors that have "gone bad" and store that data in the spare sectors.

But the reference contains a detailed explanation:

ECC structured redundancy up to 200 bits of 256/512 in a sector-CRC–Scrambled Bits- RLL adds bits to cause pulses and Parity

When data is written to the drive it is encoded. The actual data itself is never written, only the interpretation of the data. If you are thinking that a drive contains 0’s and 1’s then you are thinking about it incorrectly. The data is more like a wave form being written to the drive. It has to be interpreted back on its way out before it becomes a 0 or a 1. Before the data is written the data is randomized. This eliminates patterns that might be the same so that ECC is not confused. It is difficult to do pattern detection on a pattern that appears over and over. EMI can be reduced and have less effect on the bit storage and the timing controls.

The drive tries several different ways to re-read the data before giving up, most of them using ECC. It is possible for ECC to improperly correct data under certain circumstances if the data occurs in a certain order. ECC read commands use ODD numbering of at least 3 so as not to cause a 50/50 chance in the selection of 2.

Read ignoring ECC is an LBA 28 command “Read Long” and it was disabled in 48 bits as it was determined to be obsolete in drives over 137 gigs. No Read Ignore ECC is available after 137 gigs. Standard attempts are tried and usually are 10 tries in most hard drives. Reading a drive ignoring ECC can cause possible corruption in the data, but sometimes it is the only way to get the data in those sectors if there is a problem with the PCB or the ECC cannot read the data correctly.

If the Sector is determined to be unreadable by the ECC encoder then the sector is retried again. Reed Solomon in conjunction with sector rereads is expecting to fix data errors for the ECC. Parity bits are stripped off.

and:

After the data is written, a 4 byte block of ECC data is written. After the 512 bytes are read the drive will calculate the ECC Info and reads the ECC blocks of data and compares them. If they are not equal then the drive re-reads the data until timeout occurs causing the ECC data error. If it is not able to re-read and correct the error it will cause the UNC flag to state that the data in error is uncorrectable. It is possible to do a data recovery ignoring ECC but you will have no way to verify that the data read was correct. This should be done as the last phase to capture the data that could not be read any other way.

Therefor I conclude, no, the drive does not verify the data on write operations by doing an additional read. But it's not a full disaster as it is possible to repair the data through the ECC while reading (sometimes or most of the time?).

But finally we don't know what a "write retry" could be. Maybe only a misaligned head while writing, caused through vibrations? Or a physical damaged sector which can't be targeted at all?

Does hard drive do read verification at the same time it writes data?

Related

Recent Posts