Why do we need "perfectly" random numbers?
I periodically see articles about physicists or others coming up with a technique that generates a slightly more random number than was possible before, and how this is useful for encryption. But I've never seen an explanation of why we need such incredibly "perfect" random numbers. I understand that rand() is pseudorandom, but if I seed it with some combination of the system time, the just-measured ping time to some server on the other side of the world, and the hash of the 217th most popular article in Google News as of this second, isn't the result random for all practical purposes (particularly if whoever is trying to guess the number doesn't have the algorithm)?
Even if the algorithm is known, it seems like you could add enough time-varying and location-varying variables (as I tried to show in my example) to make it impossible to recreate in real life, regardless of the computing power you had.
So why do we need more and more "pure" random numbers?
There is a difference between randomizing the seed (which is what you describe above) and the resulting sequence being random. Also, the point of good random number generators is not usually to keep other people from guessing the numbers (except in certain cryptographic settings). Rather, it has to do with, for example, algorithms that employ randomness. In many situations a computation can be greatly improved if it involves some random component. When analyzing and designing such algorithms, one typically assumes the random component has a particular distribution, say Gaussian. The distribution of each such random component can affect both the validity of the algorithm and its efficiency.

Suppose, for instance, that a certain algorithm produces the correct result with probability 0.999999 given that its random component is precisely normally distributed with such-and-such parameters. If the random component deviates from the normal distribution, the behaviour of the algorithm may differ significantly from that guarantee. So when you run the algorithm, you actually need to generate the correct type of randomness: the better the random number generator, the better the results.
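To make that concrete, here is a minimal Python sketch (my own illustration, not a quote from anywhere): a Monte Carlo estimator of π that assumes its input source is uniform on [0, 1). Feed it a source with the wrong distribution and the answer is consistently wrong, even though every input is still "random":

```python
import random

def estimate_pi(rng, n=200_000):
    """Monte Carlo estimate of pi: fraction of points in the unit quarter-circle."""
    inside = sum(1 for _ in range(n) if rng() ** 2 + rng() ** 2 <= 1.0)
    return 4 * inside / n

# A well-behaved source, uniform on [0, 1).
uniform = random.random

# A deliberately skewed source over the same range: wrong distribution,
# even though each value is individually unpredictable.
def biased():
    return random.random() ** 2

print("uniform source:", estimate_pi(uniform))  # close to 3.14159
print("biased source: ", estimate_pi(biased))   # systematically off
```

The estimator's correctness guarantee is conditioned on the distribution, which is exactly why the quality of the generator, not just its unguessability, matters.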
> Even if the algorithm is known, it seems like you could add enough time-varying and location-varying variables (as I tried to show in my example) to make it impossible to recreate in real life, regardless of the computing power you had.
Your example used
- the system time,
- the just-measured ping time to some server on the other side of the world,
- the hash of the 217th most popular article in Google News as-of this second
However, cryptography aims to find schemes that provide security against an attacker who can eavesdrop on your internet connection. Such an attacker can know all three of your inputs to within a small enough number of choices to brute-force the result: he knows what time it is, he can see the packets you send and receive and measure ping times just as well as you can, and he can read the answer you get from Google to see which article it claims to be the 217th most popular.
If you're using HTTPS to talk to Google, it is slightly harder for the attacker to see which article you pointed to, but (a) the security of HTTPS depends critically on your already having a secure source of random numbers; (b) in any case, the attacker might just query Google News on his own and try all of the top 1000 articles in the hope that one of them was the 217th in the response you got.
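To see how cheap that brute force is, consider this hypothetical Python sketch (the time window and key size are my own choices): the victim seeds a PRNG with the current time, a stand-in for the time + ping + headline mix, where each input contributes only a handful of attacker-enumerable bits:

```python
import random
import time

# Derive a 128-bit "key" from a low-entropy seed (here: time in seconds).
def make_key(seed):
    return random.Random(seed).getrandbits(128)

victim_seed = int(time.time())
key = make_key(victim_seed)

# The attacker knows the key was made "recently", so he tries every seed
# from the last hour: a few thousand guesses, nothing like 2**128.
now = int(time.time())
for guess in range(now - 3600, now + 1):
    if make_key(guess) == key:
        print("recovered seed:", guess)
        break
```

Adding the ping time and the headline hash only multiplies the search space by the attacker's uncertainty in each, which, for an eavesdropper, is small.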
This wiki article should be helpful.
Where it says:
> Truly random numbers are absolutely required to be assured of the theoretical security provided by the one-time pad — the only provably unbreakable encryption algorithm. Furthermore, those random sequences cannot be reused and must never become available to any attacker, which implies a continuously operable generator. See Venona for an example of what happens when these requirements are violated when using a one-time pad.
and:
> Since a requirement in cryptography is high entropy (i.e., unpredictability to an attacker), any published random sequence is a poor choice, as are such sequences as the digits in an irrational number such as φ or even in transcendental numbers such as π, or e. All are available to an enterprising attacker. Put another way, in cryptography, random bit streams need to be not only random, but also secret and hence unpredictable. Public or third-party sources of random values, or random values computed from publicly observable phenomena (weather, sports game results, stock prices), are almost never cryptographically acceptable, though often tempting and too often used by the unwary. They permit easier attacks than attacking the cryptography. Since most cryptographic applications require a few thousand bits at most, slow random number generators serve well—if they are actually random. This use of random generators is important; many informed observers believe every computer should have a way to generate true random numbers.
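The one-time pad itself is nearly a one-liner, which makes the quoted requirements easy to demonstrate. Here is a minimal Python sketch (messages and key length are made up) of the pad and of the Venona-style failure when a key is reused:

```python
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key = secrets.token_bytes(24)   # random, secret -- and meant to be used once
m1 = b"ATTACK AT DAWN".ljust(24)
m2 = b"RETREAT AT ONCE".ljust(24)

c1 = xor_bytes(m1, key)
c2 = xor_bytes(m2, key)         # key reuse: the Venona mistake

# An eavesdropper XORs the two ciphertexts; the key cancels out entirely,
# leaving m1 XOR m2, which leaks the relationship between the plaintexts.
assert xor_bytes(c1, c2) == xor_bytes(m1, m2)
```

With a fresh, truly random key per message the ciphertext is information-theoretically secure; reuse the key once and that guarantee evaporates.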
I think there is something fundamentally interesting about the challenge of creating a small system with a finite internal state that is capable of producing an output stream that
- Appears to be independent and identically distributed
- Is not periodic or repeating, and cannot be compressed by ordinary algorithms
- Has the highest possible entropy (when the state is unknown), and is unpredictable
- Cannot be predicted from any length of its output
It's not exactly trivial to make such a system, but it is not all that difficult to do rather well either.
The thing that is most fascinating to me is that such apparent complexity is possible from a few lines of computer code.
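For a sense of scale, here is one member of that family, Marsaglia's 64-bit xorshift generator, sketched in Python (an illustrative, decidedly non-cryptographic example):

```python
MASK = (1 << 64) - 1  # keep the state to 64 bits

def xorshift64(state):
    """One step of Marsaglia's 64-bit xorshift generator (shifts 13, 7, 17)."""
    state ^= (state << 13) & MASK
    state ^= state >> 7
    state ^= (state << 17) & MASK
    return state

# Three shift-and-XOR lines acting on a single 64-bit word, yet the stream
# has period 2**64 - 1 and looks random to casual statistical inspection.
s = 0x9E3779B97F4A7C15  # any nonzero seed works
for _ in range(5):
    s = xorshift64(s)
    print(hex(s))
```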
In a way, it is similar to requiring atom smashers that operate at higher and higher energies, or CMB detectors that are ever more sensitive: as you make finer and finer use of random numbers in complicated experiments, you want to remove as much as possible the chance that the small effects you are looking for are an artefact of your system. In the case of using random numbers to drive simulations, you want them to behave so randomly that it is as if nature were creating the simulation for you; then you can test just the parts of the experiment you are interested in.
Something awesomely cool is that, if it is possible to create a large subset of all possible messages using these random number generators that operate from a small finite state, wouldn't it someday be possible to find the state that generates any given sequence, say the sequence of bits in a video?
Compression down to the tuple of a small state and a particular algorithm chosen from a family of algorithms could exceed the savings indicated by entropy calculations based on the symbol distribution, since it is possible to create sequences many megabytes long that, judged by their symbol distribution, have maximum entropy, yet were created from 64 bits of state and a few lines of computer code.
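One can check that claim empirically; a short Python sketch (the seed and output size are arbitrary, and randbytes requires Python 3.9+):

```python
import math
import random
from collections import Counter

rng = random.Random(0xDEADBEEF)          # all the "information": one small seed
data = rng.randbytes(4 * 1024 * 1024)    # 4 MiB of output

# Zeroth-order empirical entropy computed from the byte histogram.
n = len(data)
H = -sum(c / n * math.log2(c / n) for c in Counter(data).values())
print(f"{H:.4f} bits/byte")  # ~8.0: maximal by symbol statistics, yet the
                             # whole stream is reproducible from the seed
```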
Those are some of the reasons I think random numbers are cool, and why searching for better ways of creating them, or better ways of understanding how we create them, is awesome.
Perhaps the best analogy for why this is a very interesting pursuit is that the output of these generators is really a metaphor for the universe as a whole (or at least, what we hope the universe is like): a system of immense apparent complexity, but with an internal state that we can try ever harder to measure and rules we can hope to deduce.