Prove a random generated number is uniform distributed

To prove it, you need to know the algorithm being used and show in graph terms that the set of all states constitutes a cycle, that there are no subcycles, and that the cardinality of the state space modulo N is zero so that there is no set of states that occur more/less frequently than others. This is how we know that Mersenne Twister, for instance, is uniformly distributed even though the 64 bit version has a cycle length of 219937-1 and could never be enumerated within the lifetime of the universe.

Otherwise you use statistical tests to test the hypothesis of uniformity. Statistics can't prove a result, it fails to disprove the hypothesis. The larger your sample size is, the more compelling the failure to disprove a hypothesis is, but it is never proof. (This perspective causes more communications problems with non-statisticians/non-scientists than anything else I know.) There are many tests for uniformity, including chi-square tests, Anderson-Darling, and Kolmogorov-Smirnov to name just a few.

All of the uniformity tests will pass sequences of values such as 0,1,2,...,N-1,0,1,... so uniformity is not sufficient to say you have a good generator. You should also be testing for serial correlation with tests such as spacings tests, runs-up/runs-down, runs above/below the mean, "birthday" tests, and so on.

A pretty comprehensive suite of tests for uniformity and serial correlation was created by George Marsaglia over the course of his career, and published in 1995 as what he jokingly called the "Diehard tests" (because it's a heavy duty battery of tests).


For black-box testing (you dont have access to the source code), you can't prove it is uniformly distributed (UD). You can, however, perform statistical tests to find the likelihood of it being UD. Run the generator many times (say, N*X times) and each number between 0 and N should have appeared around X times.

This completely ignores whether it's random numbers or not, it just focuses on uniformity. However, it would only prove that the generator was uniformly distributed if you were to run infinite tests. At best, you have a probability of the generator being uniform during the first N*X iterations, but it is simple and easy to implement.


There is no way to prove it, because the generator might first generate a uniform distribution and later deviate into a non-uniform one.


Since this is an interview, the real problem is not to prove uniform distribution, the real problem is to get selected for the job. I'd suggest an approach where you quickly decide whether the interviewer is looking for an interesting discussion on advanced mathematics or is testing for your practical thinking. My guess would be that there is a good chance that the interviewer would be looking for the latter. A good interview answer could be like this : "It all depends what the random number generator is needed for. If it serves a shuffle function on a music player, I would let it generate 100 numbers, check if the average roughly equals N/2, next have a brief look through the numbers and could be satisfied at that point. If the purpose would be related to encryption, it would be a different story, I would start doing research, but would probably end up not proving it myself but rely on existing, independent proof".


Just one number from the generator, or as many as you want? If just one, you can't say anything about uniformity. So long as 0 ≤ number < N, it's OK.

Assuming the interviewer meant "[the uniformity of] a large number of results", you need to look at both the resulting distribution, and for patterns in the results. The first would be to sort and bin the results and look at the resulting histogram. It should be reasonably "flat" (e.g., not a Gaussian curve) for a large number of values.

The second test is a bit more difficult, as you could be getting patterns 2, 3, or even 4 or more numbers long. One test I saw, for triplets, is to plot the results in groups of three, in spherical coordinates (first is the azimuth, second is the altitude, and the third is the radius). I don't remember the details, but IIRC you should be seeing a uniformly filled sphere, or something like that. There's probably a formal term for this test, but the bottom line is there are a number of tests to see what a RNG is doing, so that the next number out is difficult to predict from the last number out (no apparent pattern to it).