How can I evaluate SSD & HDD time to failure (remaining lifespan), and can their health be restored?

I'd like to estimate the remaining lifespan of my SSD and mechanical hard drives as of right now.

The calculated remaining lifespan would help me to take timely measures such as increasing backup frequency. I'm mainly interested in the remaining lifespan of backup hard drives that are not in use or only in use when the backup is refreshed.

So far, I found four indicators of remaining time to failure (or ways to estimate it):

  • The product's warranty period in years, or the "about 5 years" figure given for HDDs in general by some (or even most?) articles on the subject
  • The S.M.A.R.T. disk self-test (via the GNOME Disk Utility on GNU/Linux/Debian 10)
  • The TBW (terabytes written) of the disks so far (like this for Linux), compared with the product's rated maximum TBW to failure or the maximum TBW covered by the warranty (see the sketch after this list)
  • The product's "Mean Time Between Failures" (MTBF) in hours.
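
For reference, reading the data-written figure on Debian can look roughly like the sketch below. It assumes a SATA SSD that exposes SMART attribute 241 (Total_LBAs_Written) with 512-byte logical sectors; the device path and the 300 TB rating are placeholders, and attribute IDs and names vary by vendor, so it may need adjusting:

    #!/usr/bin/env python3
    # Rough TBW estimate via smartmontools' JSON output (smartctl 7+).
    # Assumes a SATA SSD reporting attribute 241 "Total_LBAs_Written"
    # and 512-byte logical sectors; needs root privileges.
    import json
    import subprocess

    DEVICE = "/dev/sda"   # placeholder: adjust to the drive being checked
    RATED_TBW = 300       # placeholder: the vendor's endurance rating in TB

    out = subprocess.run(["smartctl", "--json", "-A", DEVICE],
                         capture_output=True, text=True).stdout
    attrs = json.loads(out).get("ata_smart_attributes", {}).get("table", [])

    written_tb = None
    for attr in attrs:
        if attr["id"] == 241:                      # Total_LBAs_Written
            written_tb = attr["raw"]["value"] * 512 / 1e12
            break

    if written_tb is None:
        print("No attribute 241 found; inspect 'smartctl -A' output manually.")
    else:
        pct = 100 * written_tb / RATED_TBW
        print(f"~{written_tb:.1f} TB written, ~{pct:.0f}% of the assumed rating")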

However, I'd like to take into account that the backup drives are not in use, or mostly not in use, and none of these methods really accounts for that. Furthermore, the S.M.A.R.T. health check is said to be very unreliable for common HDDs. I also couldn't find information about the rated maximum TBW for my HDD (only for the SSD). MTBF is likewise said to be unreliable; I'm not sure whether that only applies to disks running 24/7 or also to mostly inactive ones, and it seems to be far longer than anything close to the "about 5 years", so I'm unsure how relevant it is. Lastly, I don't know how to use these methods in combination, or which one would be the most reliable in which case (e.g. an HDD not in use vs. an SSD in use).


→ Are there more ways to calculate the remaining lifespan of both active and inactive HDDs and SSDs? Could you address my concerns with these 4 ways to check (e.g. how reliable is MTBF)?


I'm asking because the sooner drives fail, the more e-waste there is, which causes pollution and wastes minerals. I'm also asking because I'd like to verify that my backups are safe – e.g. that the files stored on the backup drives won't be lost while they sit unused – and to know when my drives that are in use can be expected to fail. Some of the indicators, for which there may be standards, could also be compared before purchase so that only long-lasting drives end up being bought and manufactured. There may even be standards for combining multiple methods to estimate longevity.

This might be a separate question, but it may also be part of this Q/A, which is why I'm attaching it here: is there a way to restore the health of a hard drive after failure, such as removing bad sectors, "rewriting" some data, or even some physical operation on the drive? (Maybe even software that continuously checks which sectors might fail soon and moves the data before they do?)


Solution 1:

"Are there more ways to calculate the remaining lifespan of both active and inactive HDDs and SSDs? Could you address my concerns with these 4 ways to check (e.g. how reliable is MTBF)?" - No - at least not reliably. Hard drive failures are not very predictable - and SMART is about as good as it gets, which is to say not very good at predicting failures.

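On MTBF specifically: it is a fleet statistic derived under a constant-failure-rate assumption, so it converts to a small annualized failure rate rather than a lifespan for any single drive – which is why the number looks absurdly larger than "about 5 years". A rough sketch of that conversion (the 1,000,000-hour figure is just a placeholder, not a value from any particular datasheet):

    # Back-of-the-envelope: convert a datasheet MTBF to an annualized failure
    # rate (AFR), assuming the usual constant-failure-rate (exponential) model.
    import math

    mtbf_hours = 1_000_000            # placeholder datasheet value
    hours_per_year = 24 * 365
    afr = 1 - math.exp(-hours_per_year / mtbf_hours)
    print(f"AFR ≈ {afr * 100:.2f}% per year of continuous operation")
    # Prints roughly 0.87% – a population rate, not an implied ~114-year lifespan.

Note that this assumes the drive runs all year; it says even less about drives that are powered off most of the time.
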
The SMART values used for wear-levelling indication on SSDs are a lot more reliable and predictable.
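
One rough way to turn that indicator into a time estimate is a naive linear extrapolation of wear consumed against power-on hours. The numbers below are placeholders; read the real values from the drive's SMART data (e.g. NVMe "Percentage Used" and "Power On Hours"), and keep in mind that write patterns rarely stay constant:

    # Naive linear projection from an SSD wear indicator: if X% of rated wear
    # was consumed over H power-on hours, extrapolate the time to 100%.
    percentage_used = 6          # placeholder: % of rated endurance consumed
    power_on_hours = 8_000       # placeholder

    if percentage_used > 0:
        hours_per_percent = power_on_hours / percentage_used
        remaining_hours = (100 - percentage_used) * hours_per_percent
        print(f"~{remaining_hours / (24 * 365):.1f} years of similar use left")
    else:
        print("Too little wear recorded to extrapolate yet.")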

Depending on your drive model, Backblaze publishes reliability statistics for the drives it uses, which might give you a somewhat-better-than-meaningless data point if your models happen to be among them – but that says nothing about disks that are mostly offline.
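
If your models do appear in those stats, one rough use of the published annualized failure rate (AFR) is to turn it into a multi-year survival estimate, assuming the rate stays constant (in practice it tends to climb with age). A minimal sketch with a placeholder 1.4% AFR:

    # Rough survival estimate from a published annualized failure rate (AFR),
    # assuming the AFR is constant and applies to your usage pattern.
    afr = 0.014                      # placeholder: 1.4% per year
    for years in (1, 3, 5):
        survival = (1 - afr) ** years
        print(f"{years} year(s): ~{survival * 100:.1f}% chance of no failure")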