The pseudoness of pseudorandom number generators

Is there a reasonable statistic test one can do to standard random number generators (say, one of those that come built in in Python libs) which shows they are not really random?

(By reasonable I mean to exclude silly things like «generate a looong list and see that it is periodic»)

Later: Notice I am looking for a test that fails on standard generators.


Solution 1:

The standard reference is Knuth, The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Anyone with serious interest in this topic should read Knuth's extensive exposition here. In addition to theoretical discussion, he presents tests including frequency, serial, gap, poker, coupon collector's, permutation, run, maximum-of-t, collision, birthday spacings, and serial correlation.

Marsaglia's DIEHARD suite of statistical test includes birthday spacings, overlapping permutations, ranks of 31x31 and 32x32 matrices, ranks of 6x8 matrices, monkey tests on 20-bit Words,monkey tests OPSO, OQSO, DNA, count the 1's in a stream of bytes, count the 1's in specific bytes, parking lot, minimum distance, random spheres, squeeze, overlapping sums, runs, and craps.

The NIST Statistical Test Suite includes frequency, block frequency, cumulative sums, runs, long runs,Marsaglia's rank, spectral (based on the Discrete Fourier Transform), nonoverlapping template matchings, overlapping template matchings, Maurer's universal statistical, approximate entropy (based on the work of Pincus, Singer and Kalman), random excursions (due to Baron and Rukhin), Lempel-Ziv complexity, linear complexity, and serial.

As for tests that fail, as Knuth mentions when discussing a typical generator in section 3.6 p. 188 "Caution: The numbers generated by ran_array fail the birthday spacings test of Section 3.3.2J, and they have other deficiencies that sometimes show up in high-resolution simulations (see exercises 3.3.2-31 and 3.3.2-35)".

Solution 2:

Searches for Marsaglia (George), works by him or about his PRNG tests, would uncover a lot of information. e.g.,

http://en.wikipedia.org/wiki/Diehard_tests

was found by looking for "marsaglia test".

Solution 3:

PRNGs in most modern software intended to do serious simulation have been vetted to make sure they pass diverse suites of tests, so you are unlikely to find a case where one of them fails.

Historically, one famous failure is the PRNG called 'RANDU'. When triads of three successive values from this generator are plotted in the unit cube, one can see that all plot points lie in a small number of widely separated parallel planes (Lewis, Orav and Uribe, 1986), whereas a satisfactory generator would place points consistent with random selection throughout the cube.

That is not to say all simulations perform flawlessly. Typically, PRNGs generate values from UNIF(0,1). It is possible to abuse values from an excellent generator when trying to transform to random samples from other distributions, but this is a fault of the programming--not of the generated values.

Example: Modern software uses the Box-Muller transformation to get two precisely standard normal random variables from two uniform ones via log and trig functions. But before doing transcendental arithmetic was as fast as it is today, a method of getting one standard normal from 12 uniform ones by addition and subtraction was $Z = U_1, \dots, U_{12} - 6$. This method, which cannot produce values $|Z| > 6$ (rare, but possible), still lingers in some texts and legacy software.

The following program in R uses this method to generate a hundred random samples of size 5000 from a "standard normal" distribution, of which 15 sanples fail the Shapiro-Wilk 5% level test of normality. (R uses the "Mersenne-Twister" PRNG due to Matsumoto and Nishimura (1998), which runs through $2^{19937} - 1$ values before repeating itself and has been vetted in tests involving up 624 dimensions without finding results inconsistent with random behavior.)

set.seed(1234); m = 100; p.val = numeric(m); n = 5000

for(i in 1:m) { z = rowSums(matrix(runif(n*12), nrow=n))-6

p.val[i] = shapiro.test(z)$p.value }

sum(p.val < .05)