A Good and SIMPLE Measure of Randomness

What is the best algorithm to take a long sequence of integers (say 100,000 of them) and return a measurement of how random the sequence is?

The function should return a single result, say 0 if the sequence is not all all random, up to, say 1 if perfectly random. It can give something in-between if the sequence is somewhat random, e.g. 0.95 might be a reasonably random sequence, whereas 0.50 might have some non-random parts and some random parts.

If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1. If I passed the sequence 1, 2, ... 100,000 to it, it should return 0.

This way I can easily take 30 sequences of numbers, identify how random each one is, and return information about their relative randomness.

Is there such an animal?

…..

Update 24-Sep-2019: Google may have just ushered in an era of quantum supremacy says:

"Google’s quantum computer was reportedly able to solve a calculation — proving the randomness of numbers produced by a random number generator — in 3 minutes and 20 seconds that would take the world’s fastest traditional supercomputer, Summit, around 10,000 years. This effectively means that the calculation cannot be performed by a traditional computer, making Google the first to demonstrate quantum supremacy."

So obviously there is an algorithm to "prove" randomness. Does anyone know what it is? Could this algorithm also provide a measure of randomness?


Your question answers itself. "If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1", except the digits of Pi are not random numbers so if your algorithm does not recognise a very specific sequence as being non-random then it's not very good.

The problem here is there are many types of non random-ness:- eg. "121,351,991,7898651,12398469018461" or "33,27,99,3000,63,231" or even "14297141600464,14344872783104,819534228736,3490442496" are definitely not random.

I think what you need to do is identify the aspects of randomness that are important to you- distribution, distribution of digits, lack of common factors, the expected number of primes, Fibonacci and other "special" numbers etc. etc.

PS. The Quick and Dirty (and very effective) test of randomness does the file end up roughly the same size after you gzip it.


It can be done this way:

CAcert Research Lab does a Random Number Generator Analysis.

Their results page evaluates each random sequence using 7 tests (Entropy, Birthday Spacing, Matrix Ranks, 6x8 Matrix Ranks, Minimum Distance, Random Spheres, and the Squeeze). Each test result is then color coded as one of "No Problems", "Potentially deterministic" and "Not Random".

So a function can be written that accepts a random sequence and does the 7 tests. If any of the 7 tests are "Not Random" then the function returns a 0. If all of the 7 tests are "No Problems", then it returns a 1. Otherwise, it can return some number in-between based on how many tests come in as "Potentially Deterministic".

The only thing missing from this solution is the code for the 7 tests.


You could try to zip-compress the sequence. The better you succeed the less random the sequence is.

Thus, heuristic randomness = length of zip-code/length of original sequence


As others have pointed out, you can't directly calculate how random a sequence is but there are several statistical tests that you could use to increase your confidence that a sequence is or isn't random.

The DIEHARD suite is the de facto standard for this kind of testing but it neither returns a single value nor is it simple.

ENT - A Pseudorandom Number Sequence Test Program, is a simpler alternative that combines 5 different tests. The website explains how each of these tests works.

If you really need just a single value, you could pick one of the 5 ENT tests and use that. The Chi-Squared test would probably be the best to use, but that might not meet the definition of simple.

Bear in mind that a single test is not as good as running several different tests on the same sequence. Depending on which test you choose, it should be good enough to flag up obviously suspicious sequences as being non-random, but might not fail for sequences that superficially appear random but actually exhibit some pattern.