Generating non-repeating random numbers in Python

This is a neat problem, and I've been thinking about it for a while (with solutions similar to Sjoerd's), but in the end, here's what I think:

Use your point 1) and stop worrying.

Assuming real randomness, the probability that a random number has already been chosen before is the count of previously chosen numbers divided by the size of your pool, i.e. the maximal number.

If you only need a billion numbers (i.e. nine digits), treat yourself to three more digits so you have 12-digit serial numbers (that's three groups of four digits, nice and readable).

Even when you've already issued close to a billion numbers, the probability that a new number is already taken is still only 0.1% (10^9 / 10^12).
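To put numbers on this, here is a quick sketch that assumes uniform, independent draws (the same assumption the argument above makes). It computes the single-draw collision probability exactly and the chance of needing more than k draws:

```python
from fractions import Fraction

ISSUED = 10 ** 9    # numbers already handed out (the worst case from the text)
POOL = 10 ** 12     # 12-digit serial numbers

# Probability that a single uniform draw hits an already-issued number.
p_collision = Fraction(ISSUED, POOL)
print(float(p_collision))        # 0.001, i.e. 0.1%


def prob_more_than(k: int) -> Fraction:
    """Exact probability that the first k independent draws all collide."""
    return p_collision ** k


print(float(prob_more_than(3)))  # 1e-09
# prob_more_than(1000) is exactly 10**-3000, far below anything a float can represent.
```

Using `Fraction` keeps the arithmetic exact, which matters for the 1000-retry case.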

If you do hit a collision, simply repeat step 1 and draw again. You can still guard against an "infinite" loop: say, don't try more than 1000 times or so, and then fall back to adding 1 (or something else).

You'll win the lottery before that fallback ever gets used.
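A minimal sketch of the draw-and-retry approach with the fallback. The in-memory `set` is an assumption standing in for wherever issued serials really live (e.g. a database column with a unique constraint), and `new_serial`/`format_serial` and the dash separator are hypothetical choices:

```python
import secrets

POOL_SIZE = 10 ** 12   # 12-digit serials, as suggested above
MAX_TRIES = 1000       # give up on random draws after this many collisions


def new_serial(taken: set, pool_size: int = POOL_SIZE, max_tries: int = MAX_TRIES) -> int:
    """Draw a random serial not in `taken`, record it, and return it."""
    if len(taken) >= pool_size:
        raise RuntimeError("serial pool exhausted")
    for _ in range(max_tries):
        candidate = secrets.randbelow(pool_size)   # uniform in [0, pool_size)
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    # Fallback that should practically never run: step forward by 1 (wrapping around)
    # until a free number turns up. Terminates because the pool is not full.
    candidate = secrets.randbelow(pool_size)
    while candidate in taken:
        candidate = (candidate + 1) % pool_size
    taken.add(candidate)
    return candidate


def format_serial(n: int) -> str:
    """Render a 12-digit serial as three groups of four digits."""
    digits = f"{n:012d}"
    return "-".join(digits[i:i + 4] for i in range(0, 12, 4))


if __name__ == "__main__":
    issued = set()
    print(format_serial(new_serial(issued)))   # a random serial, e.g. in the form 1234-5678-9012
```

`secrets.randbelow` is used rather than `random` so the serials are not predictable from earlier ones.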


You could use Format-Preserving Encryption to encrypt a counter. Your counter just goes from 0 upwards, and the encryption uses a key of your choice to turn it into a seemingly random value of whatever radix and width you want.

Block ciphers normally have a fixed block size of e.g. 64 or 128 bits. But Format-Preserving Encryption allows you to take a standard cipher like AES and make a smaller-width cipher, of whatever radix and width you want (e.g. radix 10, width 9 for the parameters of the question), with an algorithm which is still cryptographically robust.

It is guaranteed never to produce collisions (because encryption under a fixed key is a 1:1 mapping). It is also reversible (a two-way mapping), so you can take the resulting number and get back to the counter value you started with.

AES-FFX is one proposed standard method to achieve this.

I've experimented with some basic Python code for AES-FFX (see Python code here), but note that it doesn't fully comply with the AES-FFX specification. It can, for example, encrypt a counter to a random-looking 7-digit decimal number:

0000000   0731134
0000001   6161064
0000002   8899846
0000003   9575678
0000004   3030773
0000005   2748859
0000006   5127539
0000007   1372978
0000008   3830458
0000009   7628602
0000010   6643859
0000011   2563651
0000012   9522955
0000013   9286113
0000014   5543492
0000015   3230955
...       ...

For another Python example, using a different (non-AES-FFX, I think) method, see the blog post "How to Generate an Account Number", which does FPE using a Feistel cipher. It generates numbers from 0 to 2^32 - 1.
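To illustrate the idea, here is a rough sketch of format-preserving encryption onto 9-digit decimals. It is neither AES-FFX nor the blog post's code: it is a simplified balanced Feistel network over 30 bits with HMAC-SHA256 as the round function, plus "cycle-walking" (re-encrypt until the result falls inside the decimal range) to shrink the 2^30 permutation down to 10^9 values. The round count, key and function names are illustrative assumptions; for anything security-relevant, use a vetted FF1/FF3-1 implementation instead.

```python
import hashlib
import hmac

DOMAIN = 10 ** 9          # 9-digit decimal serials: 0 .. 999_999_999
HALF_BITS = 15            # two 15-bit halves = 30 bits, and 2**30 > DOMAIN
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 8                # illustrative choice, not a vetted parameter


def _round_function(key: bytes, round_no: int, half: int) -> int:
    """Keyed pseudo-random function: HMAC-SHA256 truncated to HALF_BITS bits."""
    message = bytes([round_no]) + half.to_bytes(2, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") & HALF_MASK


def _feistel(key: bytes, value: int) -> int:
    """Balanced Feistel network: a permutation of 0 .. 2**30 - 1."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for round_no in range(ROUNDS):
        left, right = right, left ^ _round_function(key, round_no, right)
    return (left << HALF_BITS) | right


def _feistel_inverse(key: bytes, value: int) -> int:
    """Undo _feistel by running the rounds backwards."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for round_no in reversed(range(ROUNDS)):
        left, right = right ^ _round_function(key, round_no, left), left
    return (left << HALF_BITS) | right


def encrypt(key: bytes, counter: int) -> int:
    """Map a counter in [0, DOMAIN) to a unique serial in [0, DOMAIN)."""
    if not 0 <= counter < DOMAIN:
        raise ValueError("counter out of range")
    value = _feistel(key, counter)
    while value >= DOMAIN:          # cycle-walking: outside the range, so encrypt again
        value = _feistel(key, value)
    return value


def decrypt(key: bytes, serial: int) -> int:
    """Recover the counter from a serial produced by encrypt()."""
    if not 0 <= serial < DOMAIN:
        raise ValueError("serial out of range")
    value = _feistel_inverse(key, serial)
    while value >= DOMAIN:          # walk back along the same cycle
        value = _feistel_inverse(key, value)
    return value


if __name__ == "__main__":
    key = b"replace with a real secret"
    for counter in range(4):
        print(f"{counter:09d}   {encrypt(key, counter):09d}")
```

Cycle-walking works because following the permutation from an in-range counter must eventually return to an in-range value, and decryption retraces exactly the same out-of-range steps in reverse.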


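With some modular arithmetic and prime numbers, you can generate all the numbers between 0 and a big prime, out of order. If you choose your numbers carefully, the sequence won't look sequential at a glance (although someone who collects a few consecutive values could work out the step).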

modulo = 87178291199  # prime
incrementor = 17180131327  # relatively prime to modulo

current = 433494437  # some start value
for i in range(1, 100):
    print(current)
    current = (current + incrementor) % modulo

If they don't have to be random, but just not obviously linear (1, 2, 3, 4, ...), then here's a simple algorithm:

Pick two prime numbers. One of them is the modulus, which bounds the numbers you can generate (0 up to one less than it), so it should be around one billion. The other should be fairly large.

max_value = 795028841
step = 360287471
previous_serial = 0
for i in range(0, max_value):
    previous_serial += step
    previous_serial %= max_value
    print("Serial: %09i" % previous_serial)

Just store the previous serial each time so you know where you left off. I can't prove mathematically that this works (it's been too long since those particular classes), but it's demonstrably correct with smaller primes:

s = set()
previous_serial = 0
for i in range(0, 2711):
    previous_serial += 1811
    previous_serial %= 2711
    assert previous_serial not in s
    s.add(previous_serial)
assert len(s) == 2711  # every value from 0 to 2710 was produced exactly once

You could also verify it empirically with 9-digit primes; it would just take a bit more work (or a lot more memory).
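The proof turns out to be short, and it shows that what actually matters is that step and max_value share no common factor; two distinct primes are just one convenient way to guarantee that. Writing m for max_value and s_k for the serial after k steps (starting from 0):

```latex
s_k = (k \cdot \mathrm{step}) \bmod m, \qquad k = 0, 1, \dots, m - 1

\text{If } s_j = s_k \text{ with } 0 \le j < k < m: \quad
m \mid (k - j)\,\mathrm{step}
\;\Longrightarrow\; m \mid (k - j) \quad (\text{because } \gcd(\mathrm{step}, m) = 1)

\text{But } 0 < k - j < m, \text{ so this is impossible: } s_0, \dots, s_{m-1}
\text{ are } m \text{ distinct values in } [0, m),
\text{ i.e. every number appears exactly once.}
```

The loop above prints s_1 through s_m, and s_m = s_0 = 0, so it covers the same set.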

This does mean that given a few serial numbers, it would be possible to work out your step and modulus, but with only nine digits, you're probably not going for unguessable numbers anyway.
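To make "store the previous serial each time" concrete, a small sketch of a resumable generator using the constants from the answer. The function name is hypothetical, and loading and saving the last serial (e.g. in a database) is left out:

```python
import math

MAX_VALUE = 795028841   # modulus: serials are 0 .. MAX_VALUE - 1
STEP = 360287471        # must share no factor with MAX_VALUE

# Coprimality is what guarantees a full cycle before any value repeats.
assert math.gcd(STEP, MAX_VALUE) == 1


def next_serial(previous_serial: int) -> int:
    """Return the serial that follows `previous_serial` in the sequence."""
    return (previous_serial + STEP) % MAX_VALUE
```

It also shows why the values are guessable: for any two consecutive serials a and b, `(b - a) % MAX_VALUE` is always `STEP`.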


If you don't need something cryptographically secure, but just "sufficiently obfuscated"...

Galois Fields

You could try operations in Galois Fields, e.g. GF(2^32), to map a simple incrementing counter x to a seemingly random serial number y:

x = counter_value
y = some_galois_function(x)
  • Multiply by a constant
    • Inverse is to multiply by the reciprocal of the constant
  • Raise to a power: x^n
    • Only invertible when n is coprime to 2^32 - 1
  • Reciprocal: x^-1
    • Special case of raising to power n
    • It is its own inverse
  • Exponentiation of a primitive element: a^x
    • Note that this doesn't have an easily-calculated inverse (discrete logarithm)
    • Ensure a is a primitive element, aka generator

Many of these operations have an inverse, which means, given your serial number, you can calculate the original counter value from which it was derived.
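As a sketch of the "multiply by a constant" option and its inverse, here is a small hand-rolled GF(2^32) implementation. The reduction polynomial x^32 + x^22 + x^2 + x + 1 is an assumption taken from commonly published maximal-length LFSR tap tables (i.e. assumed irreducible), and `KEY` is a hypothetical secret constant:

```python
# GF(2^32) built on x^32 + x^22 + x^2 + x + 1 (assumed irreducible).
POLY = (1 << 32) | (1 << 22) | (1 << 2) | (1 << 1) | 1
ORDER = (1 << 32) - 1      # number of nonzero field elements


def gf_mul(a: int, b: int) -> int:
    """Multiply two 32-bit field elements: carry-less product reduced mod POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1
        if a >> 32:            # degree reached 32: reduce
            a ^= POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """Square-and-multiply exponentiation in the field."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Reciprocal a^-1 = a^(2^32 - 2), since nonzero elements form a group of order 2^32 - 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return gf_pow(a, ORDER - 1)


KEY = 0x9E3779B9           # hypothetical secret multiplier (any nonzero value works)
KEY_INV = gf_inverse(KEY)


def obfuscate(counter: int) -> int:
    """Counter -> serial: multiply by the constant."""
    return gf_mul(counter, KEY)


def deobfuscate(serial: int) -> int:
    """Serial -> counter: multiply by the reciprocal of the constant."""
    return gf_mul(serial, KEY_INV)
```

Note that 0 always maps to 0, so you may want to start the counter at 1.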

As for finding a Galois field library for Python... good question. If you don't need speed (which you wouldn't for this), you could write your own. I haven't tried these:

  • NZMATH
  • Finite field Python package
  • Sage, although it's a whole environment for mathematical computing, much more than just a Python library

Matrix multiplication in GF(2)

Pick a suitable 32×32 invertible matrix in GF(2), and multiply a 32-bit input counter by it. This is conceptually related to LFSR, as described in S.Lott's answer.
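A sketch of the matrix approach, with rows stored as 32-bit bitmasks. The seeded `random.Random` acting as the "key" that picks the matrix is an assumption for illustration, as are the helper names; the inverse is found by Gauss-Jordan elimination over GF(2):

```python
import random

N = 32  # bits per counter / serial


def mat_vec(rows, x):
    """Multiply a GF(2) matrix by a bit vector.

    rows[i] is row i as a bitmask; output bit i is the parity of (rows[i] AND x),
    i.e. a dot product where addition is XOR.
    """
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def invert(rows):
    """Gauss-Jordan elimination over GF(2); returns the inverse rows, or None if singular."""
    n = len(rows)
    a = list(rows)
    inv = [1 << i for i in range(n)]          # start from the identity matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if (a[r] >> col) & 1), None)
        if pivot is None:
            return None                       # singular: no inverse
        a[col], a[pivot] = a[pivot], a[col]
        inv[col], inv[pivot] = inv[pivot], inv[col]
        for r in range(n):
            if r != col and (a[r] >> col) & 1:
                a[r] ^= a[col]                # row addition is XOR in GF(2)
                inv[r] ^= inv[col]
    return inv


def random_invertible(n, rng):
    """Draw random n x n GF(2) matrices until one is invertible; return it and its inverse."""
    while True:
        rows = [rng.getrandbits(n) for _ in range(n)]
        inv = invert(rows)
        if inv is not None:
            return rows, inv


# A fixed seed plays the role of a secret key here (assumption: keep the seed private).
MATRIX, MATRIX_INV = random_invertible(N, random.Random(20240101))
```

Because the map is linear, 0 maps to 0 and the serial of `a ^ b` is the XOR of the serials of `a` and `b`, which is why this only counts as "sufficiently obfuscated".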

CRC

A related possibility is to use a CRC calculation, which is based on the remainder of long division by an irreducible polynomial in GF(2). Python code for CRCs is readily available (crcmod, pycrc), although for your purposes you might want to pick a different irreducible polynomial from the one normally used. I'm a little fuzzy on the theory, but I think a 32-bit CRC should generate a unique value for every possible 4-byte input. Check this: it's quite easy to verify experimentally by feeding the output back into the input and checking that it produces a complete cycle of length 2^32 - 1 (zero just maps to zero). You may need to remove any initial/final XORs in the CRC algorithm for this check to work.
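A minimal sketch of the CRC idea, written as a plain bitwise CRC register with no initial value, no final XOR and no bit reflection (i.e. with the XORs mentioned above removed), rather than via crcmod or pycrc. It computes (value · x^width) mod P(x). The function names are hypothetical, and the exhaustive check is a simpler variant of the suggested test that only works for small widths:

```python
def crc_permute(value: int, width: int, poly: int) -> int:
    """Plain MSB-first CRC of one `width`-bit word: init 0, no final XOR, no reflection.

    Equivalent to (value * x^width) mod P(x), where P(x) = x^width + poly.
    """
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    reg = value & mask
    for _ in range(width):
        if reg & top:
            reg = ((reg << 1) ^ poly) & mask   # x^width term appeared: reduce by P(x)
        else:
            reg = (reg << 1) & mask
    return reg


def is_permutation(width: int, poly: int) -> bool:
    """Exhaustively check that every width-bit input gives a distinct CRC (small widths only)."""
    outputs = {crc_permute(v, width, poly) for v in range(1 << width)}
    return len(outputs) == 1 << width


if __name__ == "__main__":
    # 0x04C11DB7 is the usual CRC-32 polynomial in non-reflected form.
    for counter in range(1, 4):
        print(f"{counter} -> {crc_permute(counter, 32, 0x04C11DB7):08x}")
```

When `poly` is odd (P(x) has a constant term), each shift-and-reduce step can be undone, because the low bit of the result shows which branch was taken, so the whole map is a permutation. This also confirms that zero maps to zero.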