Mac Hard Drive Testing Tools and SMART Analysis

How good is SMART (Self-Monitoring, Analysis and Reporting Technology) at actually finding and predicting problems in regular hard drives, and now in SSDs?

From what I understand, different testing tools on the market report SMART data at different levels of detail. For example, Disk Utility, Scannerz, and SMARTReporter all seem to report at what I would call a catastrophic failure level: they don't report details, just, for lack of better words, "good", "maybe", and "failing" (please forgive my super high-tech terminology :-) ).

Others, like smartmontools, TechTool Pro, and I think Drive Genius (not sure about that one), can report much more detail across many different parameters.

The reason I ask is this. I had a drive in my system and was preparing to do a clean install of Mountain Lion on it. I have TechTool Pro and thought, just for kicks, I'd check the drive's SMART status to make sure everything was OK before reformatting the drive and doing the install. It reported nothing wrong. In fact, it looked great.

I went ahead and started the install, and about 20 minutes into it the thing started tapping. If you've ever seen a drummer do one of those rhythms where they tap on the rim instead of the actual drum head, that's about what it sounded like. After maybe 30 seconds of this, it started screeching like crazy. The install terminated. I put an old Snow Leopard install disc into the system and booted off the DVD, and now it says the drive is unusable and that its SMART status has failed.

If I checked this drive with a full-featured SMART tool only about 45 minutes earlier and it reported no problems, what good is SMART testing? Since I'm considering replacing the drive with an SSD, is SMART any better there?

Thanks.


Solution 1:

SMART testing is of limited value in many cases and often cannot predict the failure of a hard drive. This is because different hard drive manufacturers implement different parts of the spec, and (as you mentioned) different software and OS vendors look at different parts of the spec when interpreting SMART status.

Beyond that there are certain mechanical issues that can be difficult or impossible to predict. If your hard drive is tapping now, there are many things that could've caused the mechanical failure and it is hard to say if SMART had a way of knowing about it beforehand.

SSDs can and do support SMART. Rather than explain it myself, I'll point you to this Server Fault thread, which covers it quite well.

Solution 2:

SMART is a REPORTING technology, not a testing technology. Tools like Scannerz, TechTool Pro, etc. are TESTING tools. Some SMART implementations can run tests on a drive, but this is vendor specific. SMART implementations also vary from vendor to vendor and aren't consistent. The attribute table in the following article shows that some SMART parameters are vendor specific:

http://en.wikipedia.org/wiki/S.M.A.R.T.
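For a sense of what the detailed tools are actually reading: smartmontools' `smartctl -A` prints a per-attribute table with a normalized VALUE, the WORST value seen, the vendor's THRESH, and a vendor-specific RAW_VALUE. Here is a minimal Python sketch that parses that table, assuming the column layout shown in the sample. The sample lines are made up for illustration, not taken from a real drive; on a Mac you would feed it the output of something like `smartctl -A /dev/disk0` (the device name is an assumption and depends on your system).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: Optional[int]   # normalized current value; higher is better
    worst: Optional[int]   # lowest normalized value recorded so far
    thresh: Optional[int]  # vendor-set failure threshold
    attr_type: str         # "Pre-fail" or "Old_age"
    when_failed: str       # "-" unless the attribute has tripped its threshold
    raw: str               # raw value; its meaning is vendor specific


def _to_int(field: str) -> Optional[int]:
    # Defensive: treat any non-numeric field as "unavailable".
    return int(field) if field.isdigit() else None


def parse_smartctl_attributes(text: str) -> list[SmartAttribute]:
    """Parse the attribute table printed by `smartctl -A`."""
    attrs: list[SmartAttribute] = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ID#":  # the header row starts the table
            in_table = True
            continue
        if in_table and fields[0].isdigit() and len(fields) >= 10:
            attrs.append(
                SmartAttribute(
                    attr_id=int(fields[0]),
                    name=fields[1],
                    value=_to_int(fields[3]),
                    worst=_to_int(fields[4]),
                    thresh=_to_int(fields[5]),
                    attr_type=fields[6],
                    when_failed=fields[8],
                    raw=" ".join(fields[9:]),  # raw values may contain spaces
                )
            )
    return attrs


# Made-up sample in the layout smartctl uses; not from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       123456
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   065   055   000    Old_age   Always       -       35 (Min/Max 20/45)
"""

if __name__ == "__main__":
    for a in parse_smartctl_attributes(SAMPLE):
        print(f"{a.attr_id:>3} {a.name:<24} value={a.value} thresh={a.thresh} raw={a.raw}")
```

Note that the RAW_VALUE column is kept as a string: because its meaning varies by vendor, any numeric interpretation is left to whoever reads it.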

The problem that I see with SMART testing is that it can report a problem which isn't lethal, but some SMART tools will portray the test results in a manner that, for lack of better words, "scares" a user into trashing what might be a drive with correctable problems.

For example, about 4 years ago I had an old 100 GB drive in an Aluminum PowerBook that suffered a head crash. TechTool Pro and another graphical SMART presentation tool (whose name eludes me right now) both showed graphs with some indicators pegged in the failure range. Anyone looking at these would react exactly as I did and replace the drive. I ran a scan on the drive with TechTool Pro and, later, with Scannerz (after it came out), and both indicated the same thing: numerous sectors in the 98 GB to 99 GB range had been damaged by a severe head crash.

What I did was split the drive into two volumes: one ending just before the bad sectors began, and another covering the rest of the drive, including the bad sectors. After partitioning it this way, I deleted the volume containing the bad sectors, leaving a single usable volume. I installed the drive in an old 867 MHz Titanium PowerBook and have been using it ever since. It doesn't handle anything critical: I use it to read e-mail and check the weather in the morning, and when I go on a trip I take it along to watch DVDs in a motel room. I've been using it like this for years. I wouldn't trust the drive with anything critical, but it's still useful.

If I run TechTool Pro's SMART analysis on it, it still reports "imminent doom," but Scannerz, which (as you said) only reports at a threshold level, doesn't indicate that anything is wrong with the drive. Which one is right? The drive is still in use. This is really a matter of interpreting SMART data, and the way it's presented to users is often "scary" when it needn't be.
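To illustrate how two tools can disagree about the same drive, here is a minimal Python sketch contrasting two ways of reading the same attributes. The threshold check follows the standard SMART comparison (an attribute trips only when its normalized value falls to or below a nonzero vendor threshold; a threshold of 0 means the attribute is informational). The raw-count rule is a hypothetical stricter heuristic of the kind a graphical tool might apply. How TechTool Pro or Scannerz actually compute their verdicts isn't known here, and the attribute numbers are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Attr:
    name: str
    value: int   # normalized value reported by the drive
    thresh: int  # vendor failure threshold (0 = informational only)
    raw: int     # raw counter, e.g. number of remapped sectors


def threshold_verdict(attrs: list[Attr]) -> str:
    """SMART-style pass/fail: an attribute fails only when its normalized
    value is at or below a nonzero vendor threshold."""
    failing = [a.name for a in attrs if a.thresh > 0 and a.value <= a.thresh]
    return "FAILING: " + ", ".join(failing) if failing else "PASSED"


# Hypothetical stricter rule; not known to be what any named tool uses.
BAD_SECTOR_ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def raw_count_verdict(attrs: list[Attr]) -> str:
    """Flag any nonzero raw count on bad-sector attributes,
    regardless of the vendor threshold."""
    flagged = [a.name for a in attrs if a.name in BAD_SECTOR_ATTRS and a.raw > 0]
    return "WARNING: " + ", ".join(flagged) if flagged else "OK"


# Made-up numbers for a drive that remapped sectors after a head crash.
crashed = [
    Attr("Reallocated_Sector_Ct", value=90, thresh=36, raw=400),
    Attr("Current_Pending_Sector", value=100, thresh=0, raw=0),
]

if __name__ == "__main__":
    print(threshold_verdict(crashed))  # PASSED
    print(raw_count_verdict(crashed))  # WARNING: Reallocated_Sector_Ct
```

A drive that has remapped a few hundred sectors can pass the threshold check while tripping the stricter rule, which is exactly the kind of split between "fine" and "imminent doom" described above.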

With all that said, your drive likely suffered a controller failure, and no application in the world could have predicted it. What caused it? Who knows. A chip or other component on the drive's controller could have failed, or a voltage transient could have burned something out. Predicting that is like expecting a weatherman to predict the exact time and place a bolt of lightning will strike.

In your case, the failure caused the heads to move to an extreme position on the drive and start pounding against something else inside it, producing the "drumbeat" sound, and then break, producing the screeching. I think if you check some data recovery companies' websites, you may find audio recordings that closely match what you heard.

Testing software can only do so much. It can't predict the unpredictable.