Is there a reason to change a server's hard drive before it faults?

Solution 1:

A great reason to change it is if you want to add another task to your list of things to do while increasing the chances of something going wrong.

All joking aside, I haven't heard of any real reason to change a drive ahead of time. If you have RAID in place, you already have protection (assuming you also have decent backups). By leaving the drive alone you don't generate waste in the form of a discarded drive, you don't have to wipe sensitive data from it needlessly, and you don't spend extra money on new drives. A proactive swap also does nothing to protect you against other things that can still go wrong, such as a faulty drive controller, which is an uncommon source of drive faults but does happen.

On the other hand, this might help you discover unrecoverable drive errors that aren't triggering alarms on the RAID unit, as happened to us with RAID 5. We were bitten by this and ended up rebuilding from bare metal using our backups (so even in that case, a proper backup will help you recover). A RAID level that accounts for today's larger drive capacities and unrecoverable-error rates would have helped us; failing that, backups save the day.
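To put a rough number on why drive capacity matters here, this is a minimal sketch of the usual back-of-the-envelope estimate. It assumes an unrecoverable read error rate of 1 per 10^14 bits read (a figure often quoted on consumer drive spec sheets, not something measured on our hardware) and that errors are independent. The function name and the drive sizes are made up for illustration.

```python
import math

def rebuild_ure_probability(surviving_drives: int, drive_tb: float,
                            ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error (URE) while reading
    every surviving drive in full, as a RAID 5 rebuild has to do.

    ure_per_bit is an ASSUMED spec-sheet style rate, not a measurement.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # decimal TB -> bits
    # 1 - (1 - p)^n, written with log1p/expm1 so it stays accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Hypothetical 4-drive RAID 5 (3 survivors) at a few illustrative capacities
for size_tb in (0.25, 1.0, 2.0, 4.0):
    chance = rebuild_ure_probability(surviving_drives=3, drive_tb=size_tb)
    print(f"{size_tb:>5} TB drives: {chance:.0%} chance of a URE during rebuild")
```

Real drives often do better than their spec sheet, so treat the output as a worst-case illustration of the trend rather than a prediction.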

Most administrators have a decent RAID and backup plan so there's no real need to generate extra waste by replacing the drives needlessly.

Solution 2:

The only time I might consider this is if I had a bunch of disks from the same batch and others in that batch had started failing.

If I were tight on space, then sure, I'd do it. But for no other reason than that it's getting old? No, because on average the failure rate in the first year is similar to the failure rate in any other year. (Note that the graph breaks the first year out into 3-month, 6-month, and 1-year buckets, so you have to add them together to get the chance of failure within the first year.) And under high disk utilization, a drive is more likely to fail in the first year than in the next three years combined.

The only factor that correlated with late drive failure was a hotter room, and we keep our server rooms cool.

Solution 3:

I'm all for being proactive, but I've never done it and have never heard of anyone doing it. Presumably you have some type of RAID setup and have regularly occurring, valid backups for the system(s) in question.

Solution 4:

Yes: performance and capacity. If the old hard drive does 70 MB/sec sustained reads and 100 IOPS, and the potential replacement does 200 MB/sec sustained reads and 175 IOPS and also has 3 times the capacity, you might be justified in buying new drives and swapping out old for new purely for performance and capacity reasons. (Those numbers are totally made up; the point is that newer drives can be significantly faster.)
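To make the comparison concrete, here is a small sketch that turns those same made-up figures into wall-clock time. The 1 TB dataset and the one million random operations are arbitrary workloads I picked for illustration, and the helper names are hypothetical.

```python
def sequential_read_hours(data_gb: float, mb_per_s: float) -> float:
    """Hours to stream data_gb (decimal GB) at a sustained read rate."""
    return data_gb * 1000 / mb_per_s / 3600

def random_io_hours(operations: int, iops: float) -> float:
    """Hours to finish a number of random I/O operations at a given IOPS."""
    return operations / iops / 3600

# Made-up drive figures from the text above
old_drive = {"mb_per_s": 70, "iops": 100}
new_drive = {"mb_per_s": 200, "iops": 175}

for label, drive in (("old", old_drive), ("new", new_drive)):
    scan = sequential_read_hours(1000, drive["mb_per_s"])   # 1 TB full scan
    rand = random_io_hours(1_000_000, drive["iops"])        # 1M random I/Os
    print(f"{label}: full scan {scan:.1f} h, random workload {rand:.1f} h")
```

Whether the gain is worth the swap depends on which of the two patterns your server actually spends its time on.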

Now, what do you do with the old drives? You might use them in a test server, add them to a backup-to-disk array, or hold on to them as emergency spares. Or you might just wipe them and send them away for disposal.

Your average server nowadays is I/O bound more than it is processor bound (or at least all of mine are). So if you have a really old server that has no issues with CPU time or memory shortages, you likely have room to improve performance significantly by replacing hard drives that are several generations behind what you can easily buy today.

Solution 5:

It depends on the impact of a hard drive fault.

If you don't have RAID:
If you don't care about the server's availability, because the service can be stopped or because it's set up for high availability, and you have a working backup of the data, I would say OK: let the drive die, then replace it and restore the data when it fails.
If you care about availability, I would say use RAID ;)

If you have RAID (1, 5, 6, ...):
I would ask, why change the hard drive before it faults? RAID (and backup) is there for that. Changing a hard drive just in case it might fail risks breaking something (a RAID rebuild is always risky).

But that's only my point of view! If you think your drive may be too old, you may want to replace your server too.
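For what it's worth, the cases above boil down to a few yes/no questions. This is just a sketch of my own reasoning: the function and parameter names are made up, and the final branch (no RAID, no availability requirement, no backup) is not covered above, so that answer is my assumption.

```python
def drive_replacement_advice(has_raid: bool, needs_availability: bool,
                             has_backup: bool) -> str:
    """Hypothetical helper mirroring the cases described in this answer."""
    if has_raid:
        # RAID (and backup) is there for exactly this; a rebuild is risky
        return "keep the drive until it faults; RAID and backups cover it"
    if needs_availability:
        return "set up RAID rather than swapping the drive early"
    if has_backup:
        return "let the drive die, then replace it and restore from backup"
    # ASSUMPTION: this case is not discussed in the answer
    return "get a working backup in place first"

print(drive_replacement_advice(has_raid=False, needs_availability=False,
                               has_backup=True))
```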