Should I be concerned about a high SMART Hardware_ECC_Recovered value?

According to Steve Gibson of SpinRite fame, SMART values have to be taken over time, not as instantaneous readings. That means a value of 47 isn't necessarily bad if the value has been 47 for months. However, if the value was 42 an hour ago and it's climbing rapidly, the drive is experiencing difficulty accessing part of the data and may soon be unable to read the sector at all. Depending on the value of the data on that drive, you may wish to replace it.
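To make "taken over time" concrete, here is a minimal Python sketch that turns a series of timestamped readings into a rate of change per hour. The function names and the 1.0-per-hour alarm threshold are made up for illustration; neither SMART nor smartctl defines them. It uses the absolute change because, as discussed further down the thread, whether a rising or falling number is bad depends on which column you are reading.

```python
from datetime import datetime, timedelta


def change_per_hour(readings):
    """readings: list of (datetime, value) tuples, oldest first."""
    if len(readings) < 2:
        return 0.0
    (t0, v0), (t1, v1) = readings[0], readings[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        return 0.0
    return (v1 - v0) / hours


def is_changing_fast(readings, max_change_per_hour=1.0):
    # Hypothetical threshold: flag any reading that moves faster than this.
    return abs(change_per_hour(readings)) > max_change_per_hour


now = datetime(2024, 1, 1, 12, 0)
stable = [(now - timedelta(days=90), 47), (now, 47)]  # 47 for months
sudden = [(now - timedelta(hours=1), 42), (now, 47)]  # 42 an hour ago
```

With these two series, `stable` yields a rate of 0 and is not flagged, while `sudden` yields 5 per hour and is.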

A high value for this attribute is actually pretty good:

Hardware ECC Recovered S.M.A.R.T. parameter indicates time between ECC-corrected errors.

https://kb.acronis.com/content/9131

First, lower values are worse for SMART, not higher values (notice that the threshold column is always lower than the current value). So an increasing value is no cause for worry. (This rule does not apply to the raw values, however.)

SMART values tend to oscillate a bit (yours might be on the edge between 46 and 47, for instance, so even small changes could flip it to the other value).

Your smartctl -a output shows that the worst this value has ever been is 45, so seeing it oscillate slightly above that is normal.

For more information, take a look at Wikipedia: ATA S.M.A.R.T. attributes.

Please note that "lower is worse" only applies to the values in the three columns labeled "Value", "Thresh" and "Worst". It does not necessarily apply to the "Raw Value" column, since raw values are not normalised by that metric.
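A rough sketch of that distinction, assuming the default `smartctl -A` attribute table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). It applies "lower is worse" only to VALUE, WORST and THRESH and keeps RAW_VALUE as an uninterpreted string. The helper names are hypothetical and the sample line is invented, not taken from the asker's drive.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    id: int
    name: str
    value: int   # normalised, lower is worse
    worst: int   # normalised, lowest VALUE seen so far
    thresh: int  # normalised, failure threshold
    raw: str     # vendor-specific, NOT normalised


def parse_attribute_line(line: str) -> SmartAttribute:
    # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    # maxsplit=9 keeps RAW_VALUE intact even when it contains spaces.
    fields = line.split(maxsplit=9)
    return SmartAttribute(
        id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=fields[9],
    )


def normalized_status(attr: SmartAttribute) -> str:
    # Simplified: compares only the normalised columns and ignores special
    # cases such as a threshold of 000. The raw value is never looked at.
    if attr.value <= attr.thresh:
        return "FAILING_NOW"
    if attr.worst <= attr.thresh:
        return "FAILED_IN_PAST"
    return "OK"


sample = "195 Hardware_ECC_Recovered 0x001a 047 045 000 Old_age Always - 123456789"
```

Here `parse_attribute_line(sample)` gives VALUE 47, WORST 45, THRESH 0, so the attribute is "OK" no matter how large the raw number is.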

Keep in mind that even the extensive study Google conducted found that a large number of drive failures were not predicted by SMART errors. What you see may be perfectly normal, but since each manufacturer has different metrics for converting the raw values into the reported values, it is hard to say for sure whether your drive is experiencing a lot of errors. However, a raw number that large does strike me as odd.
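On the "different metrics" point: some vendors pack more than one counter into the 48-bit raw field, which can make the decimal number smartctl prints look enormous. The sketch below splits a raw value into a high and a low part. The 16/32-bit split is purely an assumption for illustration; whether a given model packs this attribute at all, and how, is vendor-specific, so treat the output as a guess unless the vendor documents the layout.

```python
def split_raw48(raw: int, low_bits: int = 32) -> tuple[int, int]:
    """Split a 48-bit SMART raw value into (high_part, low_part).

    ASSUMPTION: the default 16/32-bit split is illustrative only; the real
    layout (if any) depends on the drive vendor and model.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("SMART raw values are 48-bit")
    low_mask = (1 << low_bits) - 1
    return raw >> low_bits, raw & low_mask
```

For example, a raw value built as `(3 << 32) | 1000` splits into `(3, 1000)`, while any value below 2**32 has a high part of 0.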

I would recommend reading the entire drive (with dd, or by rsync'ing to a new drive) and checking the SMART values as it goes along. If you see that raw number, or the reported values, change a lot, I'd start looking to replace the drive.
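A minimal sketch of "read everything and check as you go", assuming you are happy to do it from Python instead of dd. It reads a binary stream sequentially and calls a user-supplied `check` function every `check_every` bytes and once more at the end; the function name and default sizes are hypothetical choices.

```python
from typing import BinaryIO, Callable


def read_all(f: BinaryIO, check: Callable[[int], None],
             chunk_size: int = 1 << 20, check_every: int = 1 << 30) -> int:
    """Read f to EOF, calling check(bytes_read) roughly every check_every bytes."""
    total = 0
    next_check = check_every
    while True:
        # A bad sector surfaces here as an OSError; this sketch does not
        # catch it (unlike dd with conv=noerror).
        chunk = f.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total >= next_check:
            check(total)
            # Skip ahead past any intervals a large chunk jumped over.
            while next_check <= total:
                next_check += check_every
    check(total)  # final check once the whole stream has been read
    return total
```

In practice you would open the device with `open("/dev/sdX", "rb")` (`/dev/sdX` is a placeholder, and this usually needs root) and have `check` run something like `subprocess.run(["smartctl", "-A", "/dev/sdX"])` so you can compare the attribute table between checkpoints.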

IIRC, Hardware ECC Recovered refers to error correction on disk reads, which isn't unusual for a disk; drives encode the data with error-correction codes for precisely this reason. Some controllers also support redundant information in disk sectors and add another layer of error correction.
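To illustrate what "recovered" means, here is a toy Hamming(7,4) code in Python. Real drives use far stronger codes than this, so it is only a stand-in for the principle: extra parity bits let the reader detect a flipped bit and silently repair it, and that repaired-on-read case is the kind of event an ECC-recovered counter tallies. The function names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Return (data bits, 1-based position of the corrected bit, or 0 if none)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome points at the flipped bit
    if pos:
        c[pos - 1] ^= 1  # the "recovered" error: fixed without a retry
    return [c[2], c[4], c[5], c[6]], pos
```

Encoding `[1, 0, 1, 1]` gives `[0, 1, 1, 0, 0, 1, 1]`; flipping any single one of those seven bits still decodes back to `[1, 0, 1, 1]`.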

As Dave Cheney states, the figures should be monitored over time. Radical changes in these statistics are an indication of a failing drive. Also, keep an eye on grown defect lists: if the grown defect list starts to grow or the SMART statistics start to change significantly, then you should prophylactically replace the drive.
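One way to act on that advice is to save periodic snapshots of raw values and diff them. In this hypothetical sketch, any increase in a watched counter (the key `grown_defects` is a made-up name standing in for whatever defect count your drive reports) is flagged immediately, while other attributes are flagged only when they change by at least an arbitrary 10% relative to the earlier snapshot.

```python
def significant_changes(before: dict[str, int], after: dict[str, int],
                        min_rel: float = 0.10,
                        watch_any_growth: tuple[str, ...] = ("grown_defects",),
                        ) -> dict[str, tuple[int, int]]:
    """Return {name: (old, new)} for attributes that changed significantly."""
    changed = {}
    for name, old in before.items():
        if name not in after:
            continue
        new = after[name]
        if name in watch_any_growth:
            # Any growth at all in a watched counter is worth attention.
            if new > old:
                changed[name] = (old, new)
            continue
        delta = abs(new - old)
        if old:
            rel = delta / old
        else:
            rel = float("inf") if delta else 0.0
        if rel >= min_rel:
            changed[name] = (old, new)
    return changed
```

The thresholds are arbitrary; the point is to compare snapshots taken weeks apart instead of judging a single reading.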
