Should old servers be retired?

I have servers that are over 5 years old but still running well. They do the job perfectly, and upgrading would bring no advantage. Should I just let them run forever, or should I schedule maintenance to replace the servers, or parts thereof, with new hardware? I fear that a server failure might cause data loss and more downtime than scheduled maintenance would. These servers are used for online point-of-sale, accounting, CRM and management information.

Preventive maintenance, such as replacing fans and vacuuming out dust, is not possible due to the remote location of the servers.

Also keep in mind the "bathtub curve" of failure rate with time. New hardware is more likely to fail than hardware that has been burned in for a while.
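To make the bathtub curve concrete, here is a minimal sketch that models the failure rate as the sum of an early-failure term, a constant random-failure term and a wear-out term. The Weibull shapes, scales and the constant rate are made-up illustrative values, not measurements from any real hardware, and the function names are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at age t (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Illustrative bathtub-shaped failure rate, in failures per year."""
    # All parameters below are assumptions chosen only to produce the shape.
    infant_mortality = weibull_hazard(t_years, shape=0.5, scale=10.0)  # falls with age
    random_failures = 0.02                                             # constant
    wear_out = weibull_hazard(t_years, shape=5.0, scale=8.0)           # rises with age
    return infant_mortality + random_failures + wear_out


# Print the modelled rate at a few ages to see the high-low-high pattern.
for age in (0.1, 1, 3, 5, 7):
    print(f"{age:>4} years: {bathtub_hazard(age):.3f} failures/year")
```

With these assumed parameters the rate is high for brand-new hardware, lowest in the middle years, and climbs again as the wear-out term takes over. Where the curve actually turns upward for a given server is exactly what I do not know.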

How do you tell a client who is very happy with a long-time trouble-free server that he now has to spend money to replace it because it is too old?

Finally, are there any monitoring tools for hardware problems such as voltage, temperature and fan speed that can be run remotely?


Solution 1:

Here's a previous question and answers:

Do you continue to use your end-of-life server/network equipment?

And another one:

How often does your company replace all its servers?

At 5 years, for what sounds like mission-critical functions, I'd start looking at replacement even if they're working fine. But since they are working fine, I'd plan out a slow, careful replacement. Make sure you know how to build up the OS and apps on the replacement box, how you're going to move the data over, and how you'll switch from the old to the new.

As was stated in one of the answers linked above, I'd tell the client honestly why you need to replace the hardware. Possible factors include the increasing cost of maintenance and support contracts, difficulty in getting replacement parts, and the application vendor's preference for supporting newer hardware. You'd have to make the case based on the level of support your hardware and software vendors offer.

Solution 2:

Probably - but with caution and attention to detail.

Things to keep in mind:

  • Can you still buy parts to repair hardware faults?
  • Are the OS and software still recent enough to be receiving supported patches?
  • Can you still readily rebuild the system after a large failure?
  • Can you lower the power usage, running cost, or physical size of the server?
  • Can you improve the performance profile, or utilise the spare capacity (e.g. by virtualising other services nearby)?

Solution 3:

Explain it to the client in terms he or she is likely to understand. Explain that the servers are designed for a 4-5 year lifespan on average. While some will run longer than that (we've kept a server limping along for 7 years before... not proud of it, but that was in the days before virtualization), a server becomes more prone to breaking down as it approaches and exceeds that age.

Put it in terms of a car. After a certain point parts of the car break down or wear out, like the brakes, and need replacing. However, unlike a car, you can't just run down to the local repair place and get the server fixed. The vendor declares replacement parts end-of-life, meaning they simply aren't available except from someone who has hoarded them and knows you now must pay a premium for them. And while you're searching for those parts and haggling over the purchase, the server will stay down.

Also, most folks look at replacing their cars as soon as their car loan is paid off. Given that it is easier to repair and maintain the car than it is those servers, especially given their remote location, point out that the customer is taking a risk with their line of business that they wouldn't take in their own personal life.

Solution 4:

Personally, I'm happy to run old hardware, but only when the risks have been properly considered. As an example, I have one rather old IBM server which is way out of warranty and for which I can no longer obtain parts. However, the software that runs on it can be transferred to another machine in a matter of minutes. Should the machine fail, I can replace it temporarily with a spare PC while I decide on the best long-term solution. All the steps required to do this are well documented, so even if I'm unavailable the task can be completed by someone else.