Why does RAM have to be volatile?
When most people read or hear "RAM", they think of the memory modules that plug into a motherboard.
Actually these are made of DRAM chips, and whether DRAM counts as a kind of RAM is controversial. (It used to be "real" RAM, but technology has changed, and whether it is still RAM is more a matter of religious belief; see the discussion in the comments.)
RAM is a broad term. It stands for "random access memory", that is, any kind of memory that can be accessed in any order (where by "accessed" I mean read or written, though some kinds of RAM may be read-only).
For example, an HDD isn't random access memory, because when you try to read two bits that aren't adjacent (or you're reading them in reverse order for whatever reason) you have to wait for the platters to rotate and the head to move. Only sequential bits can be read without additional operations in between. That's also why DRAM can be considered non-RAM: it's read in blocks.
There are many kinds of random access memory. Some of them aren't volatile and there are even read-only ones too, for example ROM. So non-volatile RAM exists.
Why don't we use it? Speed isn't the biggest problem: NOR Flash memory, for example, can be read as fast as DRAM (at least that's what Wikipedia says, but without citation). Write speeds are worse, but the most important issue is wear.
Because of their internal architecture, these non-volatile memories wear out. The number of write-and-erase cycles is limited to roughly 100,000-1,000,000. That looks like a large number, and it's usually sufficient for non-volatile storage (pendrives don't break that often, right?), but it's an issue that already had to be addressed in SSDs. RAM is written far more often than an SSD, so it would be much more prone to wear.
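A rough back-of-the-envelope sketch of why that endurance limit matters more for main memory than for storage. The endurance figures are the ones quoted above; the write rates are illustrative assumptions, not measurements, and wear levelling is ignored.

```python
# Rough estimate of how long a single cell lasts before it reaches its
# write-endurance limit. The write rates used below are assumed for illustration.

SECONDS_PER_DAY = 24 * 60 * 60


def seconds_until_worn_out(endurance_cycles: int, writes_per_second: float) -> float:
    """Time until one cell has used up all of its write-and-erase cycles."""
    if writes_per_second <= 0:
        raise ValueError("writes_per_second must be positive")
    return endurance_cycles / writes_per_second


if __name__ == "__main__":
    for endurance in (100_000, 1_000_000):
        # Assumption: a "hot" main-memory location rewritten 1,000 times per second.
        ram_like = seconds_until_worn_out(endurance, 1_000)
        # Assumption: a storage block rewritten 10 times per day.
        storage_like = seconds_until_worn_out(endurance, 10 / SECONDS_PER_DAY)
        print(
            f"endurance {endurance:>9,} cycles: "
            f"RAM-like use lasts {ram_like:,.0f} s, "
            f"storage-like use lasts {storage_like / SECONDS_PER_DAY / 365:,.1f} years"
        )
```

Under these assumed rates the hot main-memory location is worn out within minutes, while the storage block lasts for decades.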
DRAM doesn't wear out, it's fast and relatively cheap. SRAM is even faster, but it's also more expensive. Right now it is used in CPUs for caching. (and it's truly RAM without any doubt ;) )
Deep down it's due to physics.
Any non-volatile memory must store its bits in two states which have a large energy barrier between them, or else the smallest influence would change the bit. But when writing to that memory, we must actively overcome that energy barrier.
Designers have quite some freedom in setting those energy barriers. Set the barrier low (0 . 1) and you get memory which can be rewritten a lot without generating a lot of heat: fast and volatile. Set the energy barrier high (0 | 1) and the bits will stay put almost forever, or until you expend serious energy.
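To put rough numbers on that trade-off, here is a minimal sketch using the standard Arrhenius-type law for thermally activated switching, retention = tau0 * exp(E_b / kT). The attempt time tau0 of 1 ns is an assumed order of magnitude, not something from this answer, and real memory cells are more complicated than a single barrier.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_seconds(barrier_ev: float, temperature_k: float = 300.0,
                      attempt_time_s: float = 1e-9) -> float:
    """Mean time before thermal noise flips a bit: tau0 * exp(E_b / kT)."""
    thermal_energy_ev = BOLTZMANN_EV_PER_K * temperature_k
    return attempt_time_s * math.exp(barrier_ev / thermal_energy_ev)


def barrier_for_retention(target_s: float, temperature_k: float = 300.0,
                          attempt_time_s: float = 1e-9) -> float:
    """Inverse of retention_seconds: barrier (eV) needed to hold a bit for target_s."""
    thermal_energy_ev = BOLTZMANN_EV_PER_K * temperature_k
    return thermal_energy_ev * math.log(target_s / attempt_time_s)


if __name__ == "__main__":
    for barrier in (0.1, 0.5, 1.0):  # assumed barrier heights in eV
        print(f"barrier {barrier:.1f} eV -> retention {retention_seconds(barrier):.3g} s")
    ten_years = 10 * SECONDS_PER_YEAR
    print(f"barrier for 10 years: {barrier_for_retention(ten_years):.2f} eV")
```

Because the barrier sits in an exponent, a modest increase in barrier height turns nanoseconds of retention into years, and with the assumed tau0 a ten-year retention needs a barrier of around 1 eV at room temperature. That same barrier is what every write has to overcome.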
DRAM uses small capacitors which leak. Bigger capacitors would leak less, be less volatile, but take longer to charge.
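The capacitor trade-off can be sketched with a simple RC model: both the time a cell holds its charge and the time needed to charge it scale linearly with capacitance. All component values below are assumptions picked for illustration, and real DRAM leakage is not a plain resistor.

```python
import math


def hold_time_s(capacitance_f: float, leak_resistance_ohm: float,
                v_full: float = 1.0, v_threshold: float = 0.5) -> float:
    """Time for the stored voltage to leak from v_full down to v_threshold."""
    return leak_resistance_ohm * capacitance_f * math.log(v_full / v_threshold)


def charge_time_s(capacitance_f: float, access_resistance_ohm: float,
                  fraction: float = 0.9) -> float:
    """Time to charge an empty capacitor to `fraction` of the supply voltage."""
    return -access_resistance_ohm * capacitance_f * math.log(1.0 - fraction)


if __name__ == "__main__":
    leak_resistance = 1e13   # ohms, assumed leakage path
    access_resistance = 1e4  # ohms, assumed access-transistor resistance
    for capacitance in (30e-15, 300e-15):  # assumed cell sizes in farads
        print(
            f"C = {capacitance:.0e} F: "
            f"holds for {hold_time_s(capacitance, leak_resistance):.3g} s, "
            f"charges in {charge_time_s(capacitance, access_resistance):.3g} s"
        )
```

In this model a capacitor ten times larger holds its bit ten times longer but also takes ten times longer to write.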
Flash uses electrons which are shot at high voltage into an insulator. The energy barrier is so high that you can't get them out in a controlled way; the only way is to erase an entire block of bits.
It should be noted that the first commonly-used "main store" in computers was "core" -- tiny toroids of ferrite material arranged in an array, with wire running through them in 3 directions.
To write a 1 you'd send equal strength pulses through the corresponding X and Y wires, to "flip" the core. (To write a zero you wouldn't.) You'd have to erase the location before writing.
To read you'd try to write a 1 and see if a corresponding pulse was generated on the "sense" wire -- if so the location used to be a zero. Then you'd of course have to write the data back, since you'd just erased it.
(This is a slightly simplified description, of course.)
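The write and destructive-read cycle described above can be modelled with a toy class. This is only a sketch of the simplified description in this answer: the class and method names are made up for illustration, and the electrical details (half-select currents, the inhibit wire) are left out.

```python
class CoreMemory:
    """Toy model of a core plane: one bit per core, reads are destructive."""

    def __init__(self, rows: int, cols: int) -> None:
        self._cores = [[0] * cols for _ in range(rows)]

    def _pulse(self, x: int, y: int) -> bool:
        """Pulse the X and Y wires to force the selected core to 1.

        Returns True if the core flipped, meaning a pulse appears on the
        sense wire and the core held a 0 before.
        """
        flipped = self._cores[x][y] == 0
        self._cores[x][y] = 1
        return flipped

    def _erase(self, x: int, y: int) -> None:
        """Reset the selected core to 0."""
        self._cores[x][y] = 0

    def write(self, x: int, y: int, bit: int) -> None:
        self._erase(x, y)      # the location is erased before writing
        if bit:
            self._pulse(x, y)  # writing a 1 flips the core; a 0 needs no pulse

    def read(self, x: int, y: int) -> int:
        sensed_pulse = self._pulse(x, y)  # try to write a 1
        bit = 0 if sensed_pulse else 1    # a sense pulse means it used to be 0
        self.write(x, y, bit)             # the read destroyed the data: write it back
        return bit


if __name__ == "__main__":
    memory = CoreMemory(4, 4)
    memory.write(1, 2, 1)
    print(memory.read(1, 2), memory.read(1, 2), memory.read(0, 0))
```

Reading the same location twice returns the same value only because read() writes the bit back; without that step every read would leave a 1 behind.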
But the stuff was non-volatile. You could shut down the computer, start it up a week later, and the data would still be there. And it was most definitely "RAM".
(Before "core" most computers operated directly off a magnetic "drum", with only a few registers of CPU memory, and a few used stuff like storage CRTs.)
So, the answer as to why RAM (in its current, most common form) is volatile is simply that this form is cheap and fast. (Intel, interestingly enough, was the early leader in developing semiconductor RAM, and only got into the CPU business to generate a market for their RAM.)