What exactly is a URE?

Solution 1:

A URE is an Unrecoverable Read Error: a read of a sector has failed in a way that the drive cannot fix. The drive electronics are sophisticated and will only pass the data up if they have been able to read it correctly from the disk. They will try multiple times to read a bad sector before declaring it damaged.

What causes the read error? I'm not an expert here (arm waving ensues), but as a drive ages, manufacturing tolerances can become relevant, magnetic domains can weaken, cosmic rays can cause damage, and so on. Essentially it is a random failure.

How does this affect RAID 5?

A RAID 5 consists of block-level striping with distributed parity. The parity blocks are calculated by XORing the bits from the data blocks together. The XOR of two bits is 0 if they are the same and 1 if they differ. When calculating parity you XOR the first two bits, then XOR the result with the next bit, and so on, so the parity bit ends up as 1 when an odd number of the data bits are 1 and as 0 otherwise, e.g.

1010   data      or    1010 data
1100   data            1100 data
0110   parity          0011 data
                       0101 parity

The nature of the XOR function is such that if any disk dies and is replaced, the data that should be on it can be reconstructed from the remaining disks.

1010  data       or    1010 data
      damaged               damaged
0110  parity           0011 data
                       0101 parity

As you can see the damaged data can be reconstructed by XORing the remaining data and parity.
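
The three-disk-plus-parity example can be checked with a few lines of Python. This is only a sketch of the XOR arithmetic on 4-bit values; the helper name xor_parity is made up for illustration, and a real RAID implementation works on whole blocks rather than single integers.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR all blocks together, bit by bit."""
    return reduce(lambda a, b: a ^ b, blocks)

# The three data blocks from the right-hand example.
data = [0b1010, 0b1100, 0b0011]
parity = xor_parity(data)

# Pretend the second disk died: rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_parity(survivors)

print(format(parity, "04b"))   # 0101
print(format(rebuilt, "04b"))  # 1100
```

The same function does both jobs because XORing a value in twice cancels it out, which is why the missing block drops out of the survivors.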

How does a URE affect this?

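A URE matters most during a RAID 5 rebuild. While the array is healthy, a sector that one disk cannot read can be recomputed from the other disks; once a disk has failed, that redundancy is gone.

When you rebuild a RAID 5 there is a large amount of reading to be done. Every block on every surviving disk needs to be read in order to reconstruct the data on the new disk. If a URE occurs, the relevant block cannot be reconstructed, so your data is inconsistent. For sufficiently large disks in a sufficiently large RAID 5, the number of bits read to reconstruct the replaced disk approaches or exceeds the quoted URE interval of, for example, 1 error per 10^14 bits read, so hitting at least one URE during the rebuild becomes likely.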
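
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes every bit read fails independently with exactly the quoted probability (1 in 10^14 by default), which is a simplification of my own rather than anything the drive vendors promise; the function name and parameters are likewise just for illustration.

```python
import math

def p_ure_during_rebuild(surviving_disks, disk_bytes, ure_rate=1e-14):
    """Chance of at least one URE while reading every surviving disk in full.

    Assumes independent bit errors at exactly ure_rate per bit read.
    """
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, written with log1p/expm1 so the tiny p is not lost
    # to floating-point rounding.
    return -math.expm1(bits_read * math.log1p(-ure_rate))

# Four 4 TB disks in RAID 5, one failed: three disks must be read in full.
p_14 = p_ure_during_rebuild(3, 4e12)                 # roughly 0.62
p_15 = p_ure_during_rebuild(3, 4e12, ure_rate=1e-15) # roughly 0.09
print(round(p_14, 2), round(p_15, 2))
```

Under those assumptions the rebuild is more likely than not to hit a URE at 10^14, while a drive rated at 10^15 brings the risk down to under one in ten.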

Solution 2:

So what exactly is an URE, I mean concretely?

Hard disks do not simply store the data that you ask them to. Because of the ever-decreasing magnetic domain sizes, and the fact that hard disks store data in an analog rather than binary fashion (the hard disk firmware gets an analog signal from the platter, which is translated into a binary signal, and this translation is part of the manufacturer's secret sauce), there is virtually always some degree of error in a read, which must be compensated for.

To ensure that data can be read back, the hard disk also stores forward error correction data along with the data you asked it to store.

Under normal operations, the FEC data is sufficient to correct the errors in the signal that is read back from the platter. The firmware can then reconstruct the original data, and all is well. This is a recoverable read error which is exposed in SMART as the read error rate attribute (SMART attribute 0x01) and/or Hardware ECC Recovered (SMART attribute 0xc3).

If for some reason the signal degrades below a certain point, the FEC data is no longer sufficient to reconstruct the original data. At that point, the theory goes, the firmware will still be able to detect that the data could not be read back reliably, but it can't do anything about it. If multiple such reads fail, the disk has to somehow inform the rest of the computer that the read couldn't be performed successfully. It does so by signalling an unrecoverable read error. This also increases the Reported Uncorrectable Errors (SMART attribute 0xbb) counter.

An unrecoverable read error, or URE, is simply a report that for whatever reason, the payload data plus the FEC data was insufficient to reconstruct the originally stored data.

Keep in mind that URE rates are statistical. You won't encounter any hard disk where you can read exactly 10^14 (or 10^15) - 1 bits successfully and then the next bit fails. Rather, it's a statement by the manufacturer that on average, if you read (say) 10^14 bits, then at some point during that process you will encounter one unreadable sector.

Also, following on the last few words above, keep in mind that URE rates are given in terms of sectors per bits read. Because of how data is stored on the platters, the disk cannot tell which part of a sector is bad, so if a sector fails the FEC check, then the entire sector is considered to be bad.

Solution 3:

The sector dies: totally unrecoverable as well. But here I do not understand why the 4TB disk is rated at 10^14 for the URE and the 8TB disk is also rated at 10^14 for the URE. That would mean the sectors on the 8TB disk (most likely newer tech) are half as reliable as the ones on the 4TB, which does not make sense.

The specification is usually "on average 1 error is detected while reading n bits", so the drive size does not matter. It matters if you calculate your risk that an error will happen on your drive and workload, but the manufacturer only states that it takes n bits read to find an error (on average, not guaranteed).

Example: If you buy a 1TB drive, you would have to read it in full about 12 times to find an error, while an 8TB drive might experience it during the second full read. The number of bits read is the same both times, so the quality of the magnetic media is roughly the same.
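
The arithmetic behind that example, as a small sketch. It assumes decimal terabytes (10^12 bytes) and a rating of one error per 10^14 bits read; the function name is hypothetical.

```python
def full_reads_per_expected_error(capacity_bytes, bits_per_error=1e14):
    """How many complete reads of the drive add up to bits_per_error bits."""
    return bits_per_error / (capacity_bytes * 8)

reads = {tb: full_reads_per_expected_error(tb * 1e12) for tb in (1, 4, 8)}
for tb, n in reads.items():
    print(f"{tb} TB: {n} full reads")
# 1 TB: 12.5 full reads
# 4 TB: 3.125 full reads
# 8 TB: 1.5625 full reads
```

The rating is the same in every row; only the number of passes needed to accumulate 10^14 bits changes with capacity.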

What you pay for in the increased price are other factors: the ability to cram 8TB into the physical space of 1TB, greatly reduced energy consumption, fewer head crashes while moving the drive, etc.
