Statistics on RAM malfunction

In a population of 36 server-class machines, I see a correctable failure detected by the ECC circuitry about once every 3 months.

If you suspect memory failure, you should run memtest86, which is included with just about every popular Linux distro these days.


From Robin Harris' "DRAM error rates: Nightmare on DIMM street":

A two-and-a-half year study of DRAM on 10s of thousands Google servers found DIMM error rates are hundreds to thousands of times higher than thought — a mean of 3,751 correctable errors per DIMM per year.

Harris quotes a study performed over 2.5 years on Google's fleet of servers. Note that servers usually use ECC RAM, which detects and corrects some memory errors. Consumer-level computers usually don't have this.

Lambda Diode's Berke Durak calculates:

First, let's assume you have a system with no error-correction nor parity. The probability that you'll experience a bit error during the time T will be 1-(1-p)^m.

For T=1 hour, p = 1.3e-12 and m = 4*2^30*8 that gives 0.044 or 4.4%. That is quite a high probability. Indeed, in one day, that leads to a probability of 66% and in 72 hours to a probability of 96%.

So the probability of having at least one bit error in 4 gigabytes of memory at sea level on planet Earth in 72 hours is over 95%.

I won't laugh the next time a colleague says "cosmic ray" when we fail to identify the cause of a crash...
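
The arithmetic in Durak's calculation can be reproduced with a short Python sketch. It rests on assumptions the quote does not spell out: p is read as the probability of an error per bit per hour, m as the number of bits in 4 GiB, and errors are treated as independent across bits and hours, so that over T hours the exponent becomes m*T. The function name is made up for illustration.

```python
import math


def prob_at_least_one_error(p_per_bit_hour, bits, hours):
    """Return 1 - (1 - p)^(bits * hours).

    Assumes independent errors across bits and hours (an assumption,
    not something stated in the quoted text). log1p/expm1 are used
    because p is tiny and (1 - p) would otherwise lose precision.
    """
    return -math.expm1(bits * hours * math.log1p(-p_per_bit_hour))


P = 1.3e-12          # assumed: error probability per bit per hour
BITS = 4 * 2**30 * 8  # number of bits in 4 GiB of memory

if __name__ == "__main__":
    for hours in (1, 24, 72):
        prob = prob_at_least_one_error(P, BITS, hours)
        print(f"{hours:>3} h: {prob:.1%}")
```

Under these assumptions the results should land close to the quoted 4.4%, 66% and 96% for 1, 24 and 72 hours.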


You could boot the computer with memtest86+ and run a check overnight. That's how I find problems.

Yes, I have seen sticks of memory go bad where they would only fail with one particular pattern of memory writes. The BIOS of the computer did not detect the problem, but memtest86 found it on an overnight run.

I've seen two sticks of RAM go bad out of about fifty computers that I've used over the past ten years. It happens, but not often.


You might want to have a look at this Google study:

On average, about one in three Google servers experienced a correctable memory error each year and one in a hundred an uncorrectable error

But they are talking about ECC RAM, not your everyday user RAM.