Hard drive issues - SpinRite vs. S.M.A.R.T

Solution 1:

If SpinRite is not reading the SMART parameters itself, then potentially lots.

The SMART (Self-Monitoring, Analysis, and Reporting Technology) values are a set of variables tracked by the drive itself. They range from general age-related counts (time powered up, number of power-ups in its lifetime, ...) through basic health monitoring (number of recoverable errors, spin-up time, number of sectors remapped due to repeated recoverable errors, number of reserved blocks remaining for such remapping, current temperature, historic maximum temperature, ...) to explicit failure indicators (number of unrecoverable errors encountered, number of failed self-tests, ...). Most of these counters/flags have an associated threshold above/below which the drive starts to consider itself on the way out.
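To make the threshold idea concrete, here is a minimal sketch in Python. It assumes the common ATA convention in which each attribute carries a normalized value that counts down toward a vendor-set threshold. The class name, field names and sample numbers are invented for illustration and are not read from any real drive.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    # Hypothetical container; the field names are illustrative only.
    name: str
    value: int      # current normalized value (higher is healthier)
    worst: int      # lowest normalized value recorded so far
    threshold: int  # vendor-set limit for this attribute


def is_failing(attr: SmartAttribute) -> bool:
    # Assumed convention: an attribute counts as failed once its
    # normalized value has dropped to or below its threshold.
    return attr.value <= attr.threshold


# Made-up sample readings, not taken from a real drive.
attrs = [
    SmartAttribute("Reallocated_Sector_Ct", value=30, worst=30, threshold=36),
    SmartAttribute("Power_On_Hours", value=92, worst=92, threshold=0),
    SmartAttribute("Spin_Up_Time", value=97, worst=95, threshold=21),
]

failing = [a.name for a in attrs if is_failing(a)]
print(failing)
```

With these sample numbers only the reallocated-sector attribute has crossed its threshold, which is the kind of condition that would make a drive report itself as failing.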

While SpinRite has moved the data off those dodgy sectors and marked them in the filesystem so they do not get used again, the drive does not know about this. All it knows is that it has more unrecoverable errors in its saved state than it is happy with, and presumably other less serious indicators that its condition is bad and/or declining, and when the BIOS reads this it warns you.

The drive knows its own condition better than SpinRite does. I suggest you follow its warning and replace it ASAP, in case the issue that caused the bad sectors worsens. It might not get any worse (there may have been a very small imperfection in the surface of one of the platters and everything else is fine), but if you have any data on there that you care about, can you afford to take the risk?

One caveat: your BIOS may not be reading the SMART indicators correctly. As you have actually seen bad sectors reported at the OS/application level I doubt this is the case, but it might still be worth grabbing some software to look at them yourself. There are many utilities available to scan and display SMART parameters from your drives. You might even find one specific to your manufacturer on their site, which may include better descriptions of metrics that aren't common/standard (SMART allows manufacturer/model-specific metrics to be stored and read).

Solution 2:

A SMART drive has lots of status indicators, some of which indicate imminent failure of the drive. Any drive that indicates SMART failure status should be replaced ASAP. You can of course continue to use the drive until it fails (possibly days or months in the future) but don't say you weren't warned.

The SMART system is not foolproof... I've only had advance SMART warnings on two drives (out of about 10), but both failed within two weeks of the warning.

Solution 3:

S.M.A.R.T. knows about temperatures.
SpinRite knows about sectors.

So the disk is heating above the threshold that your S.M.A.R.T. software is set to treat as an error. When I persistently had this problem during a particularly hot summer, my solution was to reset that threshold to a higher temperature that was still well within the manufacturer's temperature range.

If this solution doesn't seem correct to you, or the disk temperature is dangerously close to the manufacturer's upper limit (I take that as within 10 degrees), then your disk is failing.
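The rule of thumb above can be written down as a small check. This is only a sketch of the reasoning in this answer: the function name, the default 10-degree margin and the sample temperatures are assumptions for illustration, and the real upper limit has to come from your drive's datasheet.

```python
def classify_temperature(current_c, warn_threshold_c, manufacturer_max_c, margin_c=10):
    """Classify a drive temperature (all values in degrees Celsius).

    warn_threshold_c   -- the limit configured in the monitoring software
    manufacturer_max_c -- the upper limit from the drive's datasheet
    margin_c           -- how close to that limit counts as dangerous
    """
    if current_c >= manufacturer_max_c - margin_c:
        return "dangerous"  # within the margin of the manufacturer's limit
    if current_c >= warn_threshold_c:
        return "warning"    # above the software threshold, still inside the margin
    return "ok"


# Hypothetical numbers: software warns at 45, datasheet maximum is 60.
print(classify_temperature(40, 45, 60))
print(classify_temperature(48, 45, 60))
print(classify_temperature(52, 45, 60))
```

A reading that lands in "warning" but not "dangerous" is the case where raising the software threshold is reasonable; a reading in "dangerous" is not something to tune away.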

But I repeat, this is more likely a too-sensitive setting for your S.M.A.R.T. software. In any case, according to SpinRite, your disk hasn't started failing yet.

But don't skimp on your backups!

Solution 4:

I would trust SpinRite to some degree if it has checked and re-written the whole disk surface. But you should really use something like Smartmontools to find out which S.M.A.R.T. parameter is triggering the alert.

It may be that the number of relocated bad blocks is too high, or that one of the other "pre-fail" or "old-age" indicators has crossed its limit. SpinRite cannot reset these indicators, so the overall S.M.A.R.T. state will keep complaining.
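As a rough sketch of how you might find the offending attribute, the Python below parses the attribute table that smartmontools prints with `smartctl -A`. The sample text is made up for illustration (it is not output from a real drive), the column layout is assumed to be the usual ATA one, and the function names are my own. On a real system you would feed it the actual output instead, for example from `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout`, where `/dev/sda` is a placeholder and root privileges are typically needed.

```python
# Made-up sample in the usual `smartctl -A` table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 1542
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       41213
194 Temperature_Celsius     0x0022   064   045   000    Old_age   Always       -       36
"""


def parse_attributes(text):
    """Return one dict per attribute row of a `smartctl -A` style table."""
    rows = []
    for line in text.splitlines():
        # Split into at most 10 columns; the raw value may contain spaces.
        parts = line.split(None, 9)
        # Attribute rows start with a numeric ID; skip headers and blanks.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "when_failed": parts[8],
            "raw": parts[9].strip(),
        })
    return rows


def at_or_below_threshold(rows):
    """Attributes whose normalized value has reached their threshold."""
    return [r for r in rows if r["value"] <= r["thresh"]]


for r in at_or_below_threshold(parse_attributes(SAMPLE)):
    print(f'{r["name"]} ({r["type"]}): value {r["value"]} <= '
          f'threshold {r["thresh"]}, raw {r["raw"]}')
```

In this invented sample the only attribute reported is the reallocated sector count, a "pre-fail" type, which matches the scenario described above.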

The disk may stay in its current state for some time if SpinRite's thorough pass did not find more errors, but you should keep running SpinRite, because it also refreshes blocks with bad-but-correctable ECC. Or just get a new disk ;-)