Confirm disk is broken when it passes all diagnostics

Solution 1:

You can't reliably.

Or rather, you have already done it with the options at your disposal.

As a study at Google found, failing disks do not necessarily show abnormal SMART values. (The converse is more reliable: when a disk does show them, failure becomes much more likely.)

Setting this aside for a moment, bear in mind that even though a lot is standardized in computing, in reality there are bugs in both hardware and software, error margins that can accumulate, and so on. The real world isn't perfect, and it's not unheard of for hard disks not to play nice with particular controllers, and vice versa. Sometimes it's a question of faulty firmware, sometimes of completely different system components misbehaving, for example a sub-par PSU that barfs at particular load spikes. Or even temperature changes, age... the list could be expanded almost at will.

So the standard procedure here is to put the disk into a significantly different system configuration and re-run the tests. Since you have already done so by changing your complete system, you have correctly concluded that the disk must be at fault. (Unless you did not change everything else, as you told us; the cable or HBA comes to mind, in which case the assumption would not hold.)

Edit: I just realized that there is one option left: you can check whether a newer firmware revision is available for this drive than the one it currently runs. If so, have a look at the change log, which may point out problems relevant to your case.
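To find out which revision the drive currently runs, something like the following might help. It is only a sketch: it assumes smartmontools is installed and that `smartctl -i` prints a "Firmware Version:" line for your drive (it does for typical ATA disks; other drive types may label it differently). The function name `firmware_version` is made up for this example.

```shell
# firmware_version: hypothetical helper that reads `smartctl -i` output on
# stdin and prints only the value of the "Firmware Version:" line.
firmware_version() {
    awk -F': *' '/^Firmware Version:/ { print $2 }'
}
```

Usage would be along the lines of `smartctl -i /dev/sdX | firmware_version`, where `/dev/sdX` is a placeholder for your device. Compare the result against the revisions listed on the manufacturer's support page.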

In conclusion, to establish with complete confidence (in this particular situation!) that the drive is misbehaving, you'll need to send it back to the manufacturer.

Solution 2:

I'm thinking this is a bad controller. You can do a few more things to check out the disk as well as the controller...


Run 'badblocks' on the drive. This is similar to the 'dd' that you ran. Then take another drive that has good SMART status and put it into the computer. If that disk shows similar behavior, you know that hardware other than the disk is causing the problems, and in that case I would suspect the controller. You did mention that you changed systems and still had problems, so after all is said and done I would still think there has to be one common component causing the instability. You can also look at:

  1. bad cable (was the cable moved to the second machine along with the drive?)
  2. bad configuration on the systems (are you setting up the systems the same way despite the different hardware?)
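
As a rough sketch of the checks above, the same two commands can be run against the suspect drive and then against the known-good drive in the same slot. The helper names `run` and `disk_checks` are made up for this example, and `/dev/sdX` is a placeholder. By default the function only prints what it would do; set RUN=1 (as root) to actually execute. badblocks is used in its default read-only mode here, so it should not modify data, but double-check the device name anyway.

```shell
# Print each command; execute it only when RUN=1 (default is a dry run).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# disk_checks DEVICE: hypothetical helper, DEVICE is e.g. /dev/sdX.
disk_checks() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: disk_checks DEVICE" >&2
        return 1
    fi
    run smartctl -a "$dev"      # dump SMART attributes and the drive's error log
    run badblocks -sv "$dev"    # read-only surface scan, showing progress
}
```

For example, `RUN=1 disk_checks /dev/sdX` on the suspect drive, then the same call with the known-good drive on the same cable and port. If both drives misbehave, the common hardware is the more likely culprit.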