Is there a certain or measurable advantage to using ECC RAM in a desktop PC?
I fuss a lot over building stable machines -- in that I absolutely hate crashes, reboots, funny behaviour, etc. -- and so error-correcting a.k.a. ECC RAM would seem to solve a big problem: memory errors.
But does it really work? Is there a measurable advantage, e.g. fewer crashes or less odd behaviour?
Aside from the cost, why not use ECC memory for a new PC build? Why is the ECC feature predominantly available & supported for server/workstation class machines, but not in consumer-oriented motherboards?
I've used ECC RAM in servers for a few years now. ECC really shines when you are using your machine heavily, as in "it's on more than 12-16 hours a day". Little whitebox servers I've built without ECC have, sooner or later, developed "issues" that required a reboot, but the ECC machines have never had these.
So my answer is: if you use your computer a lot, then most likely yes. If you use your computer 24/7, then it should be a must-have.
There are some motherboards that support ECC out there. They are usually on the "higher" end of things, but with a little research you can find them from various manufacturers. The only other consideration is to remember to enable ECC support in the BIOS.
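Once ECC is enabled, it helps to confirm that the operating system actually sees it. As a rough sketch, assuming a Linux machine with the kernel's EDAC driver loaded, the corrected and uncorrected error counters can be read from sysfs. The function name `read_edac_counts` is made up for this example, and on a board without working ECC (or without the driver) the directory will simply be missing or empty.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (corrected, uncorrected)} from EDAC sysfs.

    Assumes the Linux EDAC layout: one mcN directory per memory
    controller, each with ce_count and ue_count files.
    """
    base = Path(root)
    counts = {}
    if not base.is_dir():
        return counts  # no EDAC driver loaded, or not Linux
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers with unreadable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found")
    for name, (ce, ue) in found.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing is the kind of evidence the question asks about: errors that ECC absorbed and that a non-ECC machine would have had to live with.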
Google has come out swinging on this issue. See http://blogs.zdnet.com/storage/?p=638 for how this really does affect modern-day systems.
I think ECC is only worth using when the server requires it. From Wikipedia:
Error detection and correction in computer systems seems to go in and out of fashion. Seymour Cray famously said "parity is for farmers" when asked why he left this out of the CDC 6600. He included parity in the CDC 7600, and reputedly said "I learned that a lot of farmers buy computers."
I can't find a definitive source on the internet, other than nebulous claims of one bit error per month per gigabyte, which is patently ridiculous; servers would be crashing left and right all over the world if this were remotely true.
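To put the "one bit error per month per gigabyte" figure in perspective, here is a small back-of-the-envelope sketch. The rate is the claim quoted above, not a measured value, and treating errors as independent events (a Poisson model) is an assumption made only for this illustration. The function names are hypothetical.

```python
import math


def expected_bit_errors(gigabytes, months, rate_per_gb_month=1.0):
    """Expected number of bit errors under the claimed rate."""
    return rate_per_gb_month * gigabytes * months


def prob_at_least_one(gigabytes, months, rate_per_gb_month=1.0):
    """Chance of one or more errors, assuming independent (Poisson) events."""
    lam = expected_bit_errors(gigabytes, months, rate_per_gb_month)
    return 1.0 - math.exp(-lam)


if __name__ == "__main__":
    # A 16 GB server running for a year at the claimed rate.
    print(expected_bit_errors(16, 12))          # 192.0
    # A single gigabyte over a single month.
    print(round(prob_at_least_one(1, 1), 2))    # 0.63
```

If the claimed rate held, a 16 GB machine would see roughly 192 flipped bits a year. Whether that translates into crashes depends on where the bits land, since a flip in unused or soon-overwritten memory has no visible effect.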
Some highlights from a MetaFilter thread from actual server admins:
I think ECC is cool stuff, but I've had servers both with and without it, and I've never had its presence or absence do anything, either way.
I understand the purpose of ECC RAM, but not the point. I mean, I've never noticed any issue resulting from cosmic ray bit flipping. Even on personal compute/compile servers with multiple year uptimes. Not to say that bits didn't flip, but they certainly didn't matter.
In my experience running farms of a few thousands machines here and there, you're more likely to have Ext3 silently puke all over you than to have an ECC-correctable problem.
Personally, I think ECC is a bit cargo-cultish, but it's a reasonable insurance policy on a big beefy server as long as the cost premium isn't too high.
We've considered it for critical systems. One problem becomes, how the heck do you do error detection in software to check your memory integrity, when the program used to run the memory integrity check can itself be prone to memory errors??? You basically can't and it makes failure mode analysis / failure mitigation difficult, so ECC is a mitigation mechanism.
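For a sense of what the hardware is doing on your behalf, here is a toy sketch of single-error correction using a Hamming(7,4) code. This is an illustration only: ECC memory typically uses a wider code over 64-bit words with 8 check bits, and does the work in the memory controller rather than in software. The function names are made up for the example.

```python
def encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (list of 0/1).

    Index 0 holds bit position 1; positions 1, 2 and 4 are parity bits.
    """
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(code):
    """Return (nibble, error_position); position 0 means no error seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # syndrome names the flipped position
    if pos:
        c[pos - 1] ^= 1  # correct the single-bit error
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, pos


if __name__ == "__main__":
    word = encode(0b1011)
    word[4] ^= 1  # simulate a flipped bit at position 5
    print(decode(word))  # (11, 5)
```

The check bits travel with the data, so the check does not depend on a separate program whose own memory might be corrupted, which is the difficulty described above.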
This is one of those cases where if there are problems, you can actually blame cosmic rays ;)
I would consider ECC RAM for "mission critical" applications. If a server error would cause you to lose significant amounts of money (or kill people, or whatever), spring for the ECC RAM. Basically, weigh the cost of the ECC RAM versus what you stand to lose in the event of an error.
But no matter what you decide, I recommend running MemTest86+ overnight (or long enough to make several passes over the entire address space). And if you can turn up the heat (literally), that'll give you an idea of how your RAM will perform when the system is running hot.
I've had brand-new RAM exhibit errors in MemTest. I've also had "good" RAM develop errors, over time, that MemTest detected. It's a great tool, and one of the first things I run on a new system.