Relative failure rates for hardware components

Solution 1:

Many people misunderstand hard disk MTBF and think a drive with an MTBF of 100,000 hours will last, on average, about 11.4 years. What the manufacturer means is that in a large population of N drives, all within their service lifetime, one drive will fail every 100,000/N hours on average. If you have 100,000 drives that each have an MTBF of 100,000 hours, then you should expect a drive to fail, on average, every hour.
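To make the arithmetic concrete, here is a minimal sketch of the fleet calculation described above. The function names are my own, and it assumes a constant failure rate for drives within their service life, with failed drives replaced so the population stays the same size.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def hours_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Average time between failures somewhere in a fleet of identical drives."""
    return mtbf_hours / drive_count


def expected_failures_per_drive_year(mtbf_hours: float) -> float:
    """Average number of failures per drive per year of continuous operation."""
    return HOURS_PER_YEAR / mtbf_hours


# 100,000 drives rated at 100,000 hours MTBF: one failure per hour on average.
print(hours_between_failures(100_000, 100_000))  # 1.0

# A smaller shop with 100 such drives: one failure roughly every 1,000 hours.
print(hours_between_failures(100_000, 100))  # 1000.0

# The same rating viewed per drive: about 0.0876 failures per drive-year.
print(round(expected_failures_per_drive_year(100_000), 4))  # 0.0876
```

Seen this way, a 100,000-hour MTBF means that in a fleet running around the clock you should plan for roughly 9 replacements per 100 drives each year, not for 11 trouble-free years per drive.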

Hard drives fail more often than people expect. Back up, back up, back up.

Anything with moving parts can fail, including tape drives, floppy drives, fans, and so on. I've had the fan on a graphics card die, causing the death of the graphics card. I've had the power supply fan die, causing most of the parts of the computer to die. (Since then I've never built a system without extra fans.) Tape drives require extra care, or their lifetimes will be significantly shortened. Not only do they have moving parts, but in many kinds of tape drives the head makes physical contact with the tape media. Cleaning the drive too often with ordinary tape cleaning media will wear away the tape heads.

I've had the built-in chipset fans die, but so far without any effect. So far I've never had a CPU fan die, but I tend to upgrade often enough that I probably avoid this via upgrades. (grin)

I replace my disk drives every few years (mostly because the capacity available increases so rapidly), so I have experienced relatively few hard drive failures. I've had many power supplies fail -- many more than I would have naively expected for a component with no moving parts other than the fan. I assume that power irregularities are the cause of many power supply failures.

So far, in a few decades of computing, I have never had a CPU or RAM or motherboard fail unless there was a reasonable cause, such as overheating (fans dying). However, a few brands of motherboards over the years have had much shorter lifetimes than expected due to sub-par parts, often incorrectly manufactured capacitors where power enters the motherboard.

Anywhere that you have a plugged-in connection is a point of failure. I've had computers fail (mostly long ago) due to cheap tin-plated connectors. The tin oxidized, and over time the connection became less and less reliable. Eventually I unplugged everything, took an eraser to the tin connectors to remove the oxidation, plugged everything back in, and was up and going for a while longer. Gold connectors are the connector of choice for a reason.

From what I've seen in a corporate environment, with my home experience mixed in, components seem to fail in this order, from most to least frequently:

  1. Hard drives and tape drives
  2. Power supplies
  3. Fans
  4. Distantly, everything else

Not mentioned above, but you should expect all flash memory sticks/cards to eventually die, depending on frequency of use. But it will take a long time given the average use of most such cards. Flash memory "wears out" with use and memory cells will eventually fail.

Solution 2:

Anecdotally, batteries.

I have no hard data, but I have replaced more failed or under-performing batteries in my life than any other component. This includes uninterruptible power supplies, laptops/notebooks, controller batteries, mobile phone batteries, and probably a lot of others.

This has led me to always stock an extra battery pack for a server room's UPS.