RAID-6: better to replace two dead drives at the same time, or one at a time?

We have a 16-drive RAID-6 that has three problem drives. Two are already dead, and the third is giving SMART warnings. (Never mind how it got into such a bad state.)

Obviously we want to replace the dead drives before the one that is still working, but is it better to:

  1. replace one dead drive, let the RAID rebuild, then replace the other, and let it rebuild again; or

  2. replace both drives at once and let it rebuild both in parallel?

To put it another way, will we get back to a state of redundancy faster by reintroducing one drive or two? Does rebuilding two drives in parallel slow the rebuild process?

In case it matters, the controller is a 3ware 9650SE-16ML.


Solution 1:

!!!!! ONE !!!!!

Do one at a time. Seriously, don't think of doing this ANY other way.

Anything else will test your full system restoration skills.

Solution 2:

Do you have good, recent backups? If not, do you think you can get them in a reasonable time?

I'd honestly be more concerned about tripping the bad drive offline during a rebuild than anything else: if that drive is already throwing SMART errors, it's more than halfway to failing.

My suggestion would be to confirm your backups, then rebuild one drive at a time until you are back in a state where you can replace the one throwing SMART errors (dead drives first, soft errors last).

If you have no backups, it's a crapshoot: taking a backup may generate enough soft errors to get the marginal drive marked as failed, and so may a rebuild.
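
The order suggested in this answer (confirm backups, then dead drives one at a time, then the drive with SMART warnings) can be written down as a small planning sketch. This is only an illustration: the port labels, the status strings, and the function names are all made up for the example, and nothing here talks to the 3ware controller or reads real SMART data.

```python
# Hypothetical sketch: port labels and status strings are illustrative only.
# Lower number = replace sooner. Dead drives go before drives with SMART warnings.
PRIORITY = {"dead": 0, "smart_warning": 1}


def replacement_order(drives):
    """Return the ports that need replacing, dead drives first.

    `drives` maps a port label to "ok", "dead" or "smart_warning".
    sorted() is stable, so drives with the same status keep their input order.
    """
    to_swap = [port for port, status in drives.items() if status in PRIORITY]
    return sorted(to_swap, key=lambda port: PRIORITY[drives[port]])


def plan(drives, backups_confirmed):
    """Build the list of steps: backups first, then one drive per rebuild."""
    steps = []
    if not backups_confirmed:
        steps.append("confirm backups")
    for port in replacement_order(drives):
        steps.append(f"replace {port} and wait for the rebuild to finish")
    return steps


# Example array state (assumed, not taken from the question).
drives = {"p0": "ok", "p3": "dead", "p7": "smart_warning", "p11": "dead"}

for step in plan(drives, backups_confirmed=False):
    print(step)
```

Each replacement is its own step, so a new drive is never introduced while another rebuild is still running.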