What's a worthwhile test for a new HD?
I work for a company that uses standard 2.5" SATA HDs in our product. We presently test them by running the Linux 'badblocks -w' command on them when we receive them - but they are 160 gig drives, so that takes about 5 hours (we boot Parted Magic on a PC to do the scan). We don't actually build that many systems at a time, so this is doable, but seriously annoying.
Is there any research or anecdotal evidence on what a good incoming test for a hard drive should be? I'm thinking that we should just wipe them with all zeros, write out our image, and do a full drive read back. That would end up being only about 1 hour 45 minutes total.
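Roughly, something like this is what I have in mind (the device and image names below are just placeholders):

```
#!/bin/sh
# Proposed incoming test: wipe with zeros, write our image, read the whole drive back.
DRIVE=/dev/sdX              # placeholder - the drive under test
IMAGE=our-product.img       # placeholder - our production image

dd if=/dev/zero of="$DRIVE" bs=1M    # wipe with all zeros
                                     # (dd stops with "no space left" at the end - expected)
dd if="$IMAGE"  of="$DRIVE" bs=1M    # write out our image
dd if="$DRIVE"  of=/dev/null bs=1M   # full read-back of the entire drive
```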
Given that drives do block remapping on their own, would what I've proposed show up any infant mortality just as well as running badblocks?
I think badblocks may still do what you want; you are just not passing enough options.
By default, with the -w option it will run four passes over your hard drive, writing the patterns 0xaa (10101010), 0x55 (01010101), 0xff (11111111), and 0x00 (00000000). You should probably pass the -t option and run a single pass with just one of those patterns; running all four passes is probably more than you need.
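For example, a single destructive pass with the all-zeros pattern would look something like this (/dev/sdX is a placeholder - double-check the device name, since -w erases everything on it):

```
# One write/read pass with the all-zeros pattern instead of the default four passes.
# -w = destructive write-mode test, -s = show progress, -v = verbose,
# -t 0 = use pattern 0 (00000000) only.
badblocks -wsv -t 0 /dev/sdX
```

That should cut the run down to roughly a quarter of the four-pass time.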
I'd recommend filling the drive with 0s, then 1s, and checking the SMART values before, during, and after. Anything beyond that is overkill, if this isn't overkill already.
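A minimal sketch of that flow, assuming smartmontools is installed; /dev/sdX is a placeholder for the drive under test, and the mid-test SMART check is left out for brevity:

```
#!/bin/sh
# Rough incoming-test sketch: SMART snapshot, zeros pass, ones pass, SMART snapshot.
DRIVE=/dev/sdX                            # placeholder - set to the drive under test

smartctl -A "$DRIVE" > smart_before.txt   # SMART attributes before the test
badblocks -wsv -t 0   "$DRIVE"            # destructive fill with 00000000 + read-back
badblocks -wsv -t 255 "$DRIVE"            # destructive fill with 11111111 + read-back
smartctl -A "$DRIVE" > smart_after.txt    # SMART attributes after the test

diff smart_before.txt smart_after.txt     # look for new reallocated/pending sectors
```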
You still don't talk numbers. For example:
- what percentage of drives fail this test;
- what do you do with the drives that fail;
- do you keep detailed statistics by vendor/series.
And the most important one:
- what result do you want to accomplish.
Without numbers I can only say that you are wasting 1 hour 45 minutes per drive, because:
- it is a synthetic load;
- it runs in a generic environment, not the one the drive will live in;
- any "stress test" has a chance of damaging the drive.
You would be surprised: if you just check SMART one month after deployment, you will get much more useful statistics, because:
- each hard drive will have worked under its real load;
- each hard drive will have worked in its real environment.
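As a rough illustration, something like the script below could run on each deployed unit a month in and be collected centrally; the device path and the attribute names are assumptions and vary by vendor:

```
#!/bin/sh
# Rough post-deployment check: snapshot the error-related SMART counters so they
# can be collected and compared per vendor/series.
DRIVE=/dev/sda    # placeholder - the drive in the deployed unit

smartctl -A "$DRIVE" \
  | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable' \
  > "/var/log/smart-$(hostname)-$(date +%F).txt"
```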