What are the common causes of CPU failure?

What are the most common causes of CPU failure?

Are there intermediate states between a perfectly functioning CPU and a dead one?


A single failed transistor can be enough to stop a CPU from functioning, and since a modern CPU contains hundreds of millions to billions of transistors, you might ask why it doesn't happen more often.
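To put rough numbers on that question, here is a minimal sketch. It assumes every transistor fails independently with the same probability and that any single failure kills the chip; both are simplifications. The transistor count and the per-transistor probabilities are made-up illustrative values, not measured data.

```python
import math

def chip_survival(p_transistor_fail, n_transistors):
    """Probability that no transistor has failed, i.e. (1 - p) ** n.

    Computed via log1p so that very small p values do not lose precision.
    """
    return math.exp(n_transistors * math.log1p(-p_transistor_fail))

N = 1_000_000_000  # hypothetical transistor count

# Hypothetical per-transistor failure probabilities over the chip's service life
for p in (1e-9, 1e-10, 1e-12):
    print(f"p = {p:g}: chip survives with probability {chip_survival(p, N):.4f}")
```

The point is only that, under these assumptions, each individual transistor has to be extraordinarily reliable for a whole chip to last.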

The effect depends on where the failed transistor sits in the CPU, but I don't think we can expect a graded decline in performance. A failure in the ALU, for example, may go unnoticed until a particular instruction is executed, and some instructions are executed far less frequently than others.

So CPUs tend to die suddenly when a transistor fails. The failure might start from a manufacturing defect in the chip that gives way after being stressed for long enough, so time may be a factor.

Excessive heat can cause the minute impurities (dopants) in the silicon that form the transistors to diffuse and change their operating parameters. Heat is an unavoidable consequence of simply operating the transistors, so a lack of cooling may eventually cause failures.

Other reasons might include failure of interconnections within the package of the CPU chip, but manufacturers are always looking for improved packaging methods with more reliable interconnections and better heat dissipation.


Honestly, there are no common causes of CPU failure... at least relative to the other parts of your computer. The CPU is generally the most reliable part of a computer. CPUs just don't fail that often.

Instead, the things you should expect to fail are those with moving parts: traditional hard drives, optical drives, and fans. More recently, we need to add SSDs to this list as well, even though they don't have moving parts. Capacitors also have a limited lifespan, so power supplies and motherboards, which both use capacitors, can be suspect. Sometimes you'll have a bad stick of RAM, too, but I'm never really sure why they go bad.

And now, at last, only after looking at most everything else in a computer, we come to the CPU. Even when a failure does occur, it's usually because the cooling fan (moving parts again) went bad first, and the CPU overheated as a result.


In my experience, heat. How/why? Too much thermal paste! Many (most?) people know they need some thermal paste, but they may not realize how little they should use.

The rule of thumb is to use about as much as an uncooked grain of rice, believe it or not.

Although the paste conducts heat far better than air (well over 10x), the copper of the heatsink conducts far better again than the paste (again well over 10x), so you want the copper as close to the CPU as possible. The paste is truly only there to fill in VERY TINY gaps so that no air is trapped in them.
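To see why a thick layer hurts, here is a rough sketch using the standard flat-layer conduction formula R = t / (k * A). The contact area, paste conductivity, layer thicknesses, and CPU power are assumed illustrative values (pastes vary widely); the conductivity of air is approximate.

```python
def layer_resistance(thickness_m, conductivity_w_per_mk, area_m2):
    """Conductive thermal resistance of a flat layer in K/W: R = t / (k * A)."""
    return thickness_m / (conductivity_w_per_mk * area_m2)

AREA_M2 = 0.03 * 0.03  # assumed 30 mm x 30 mm contact patch
K_PASTE = 5.0          # assumed paste conductivity, W/(m*K)
K_AIR = 0.026          # approximate conductivity of still air, W/(m*K)
POWER_W = 100.0        # assumed CPU heat output

# Assumed layer thicknesses, from a thin film up to an over-applied blob
for thickness_um in (25, 100, 400):
    r = layer_resistance(thickness_um * 1e-6, K_PASTE, AREA_M2)
    print(f"{thickness_um} um of paste: {r:.4f} K/W, "
          f"about {r * POWER_W:.1f} K extra at {POWER_W:.0f} W")

# The same 25 um gap filled with air instead of paste
r_air = layer_resistance(25e-6, K_AIR, AREA_M2)
print(f"25 um of air: {r_air:.3f} K/W")
```

Under these assumptions the resistance grows in direct proportion to thickness, which is the whole argument for keeping the layer thin. In a real mount, clamping pressure squeezes out much of the excess, so treat the thick cases as an upper bound.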


Among the other causes stated here, there can also be a broken internal connection. Several different techniques are used to tie the internal "chip" leads to the external package leads, and all of these are subject to possible failure.

This sort of failure could possibly be the result of overheating, and the likelihood of the failure increases with "thermal cycles", even in the absence of overheating. The failure may start out intermittent (though usually resulting in a hard crash when it happens) but get more and more persistent as the system is cycled.

This sort of failure mimics the failures seen from poor package/socket connections, etc.

[Added:] And I notice that "whiskers" have not been mentioned. A big problem with ICs and very tiny printed circuits is "whiskers" of metal that grow out of the plated wiring and short between adjacent "wires". This is especially a problem when you take out all the lead (see "RoHS"), as lead was commonly added to tin-based platings and solders to suppress whiskering. This problem grows worse with increased temperature, of course.