Consistently create same random numpy array

I am waiting for another developer to finish a piece of code that will return a NumPy array of shape (100, 2000) with values of either -1, 0, or 1.

In the meantime, I want to randomly create an array of the same characteristics so I can get a head start on my development and testing. The thing is that I want this randomly created array to be the same each time, so that I'm not testing against an array that keeps changing its value each time I re-run my process.

I can create my array like this, but is there a way to create it so that it's the same each time? I could pickle the object and unpickle it, but I'm wondering if there's another way.

r = np.random.randint(3, size=(100, 2000)) - 1

Solution 1:

Create your own instance of numpy.random.RandomState() with your chosen seed. Do not use numpy.random.seed() except to work around inflexible libraries that do not let you pass around your own RandomState instance.

[~]
|1> from numpy.random import RandomState

[~]
|2> prng = RandomState(1234567890)

[~]
|3> prng.randint(-1, 2, size=10)
array([ 1,  1, -1,  0,  0, -1,  1,  0, -1, -1])

[~]
|4> prng2 = RandomState(1234567890)

[~]
|5> prng2.randint(-1, 2, size=10)
array([ 1,  1, -1,  0,  0, -1,  1,  0, -1, -1])
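Applied to the array from the question, the same idea might look like the sketch below. The helper name make_placeholder and its default seed are only illustrative choices, not part of NumPy.

```python
import numpy as np

def make_placeholder(seed=1234567890, shape=(100, 2000)):
    """Return a reproducible array of -1, 0 and 1 (hypothetical helper)."""
    prng = np.random.RandomState(seed)
    # The upper bound of randint is exclusive, so (-1, 2) yields -1, 0 or 1.
    return prng.randint(-1, 2, size=shape)

r1 = make_placeholder()
r2 = make_placeholder()
print(r1.shape)                 # (100, 2000)
print(np.array_equal(r1, r2))   # True: same seed, same array
```

Because each call builds its own RandomState, the result does not depend on whatever else in the process has drawn random numbers.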

Solution 2:

Simply seed the random number generator with a fixed value, e.g.

numpy.random.seed(42)

This way, you'll always get the same random number sequence.

This function will seed the global default random number generator, and any call to a function in numpy.random will use and alter its state. This is fine for many simple use cases, but it's a form of global state with all the problems global state brings. For a cleaner solution, see Robert Kern's answer, which creates a dedicated RandomState instance.
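A small sketch of both the benefit and the pitfall, using the expression from the question. The extra np.random.rand() call stands in for any unrelated code that happens to draw from the global generator:

```python
import numpy as np

np.random.seed(42)
a = np.random.randint(3, size=(100, 2000)) - 1

# Re-seeding with the same value reproduces the array.
np.random.seed(42)
b = np.random.randint(3, size=(100, 2000)) - 1

# An unrelated draw in between advances the shared state,
# so the array generated afterwards is no longer the same.
np.random.seed(42)
np.random.rand()
c = np.random.randint(3, size=(100, 2000)) - 1

print(np.array_equal(a, b))
print(np.array_equal(a, c))
```

So the global seed only gives you the same array if nothing else touches numpy.random between the seed call and your draw.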

Solution 3:

I just want to clarify one point about Robert Kern's answer in case it is not clear. Even with RandomState, you have to re-create the instance with the same seed (as Robert's example does with prng2) whenever you want the same output again. Repeated calls on a single instance keep advancing the generator, so each call returns different values:

Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> prng = np.random.RandomState(2019)
>>> prng.randint(-1, 2, size=10)
array([-1,  1,  0, -1,  1,  1, -1,  0, -1,  1])
>>> prng.randint(-1, 2, size=10)
array([-1, -1, -1,  0, -1, -1,  1,  0, -1, -1])
>>> prng.randint(-1, 2, size=10)
array([ 0, -1, -1,  0,  1,  1, -1,  1, -1,  1])
>>> prng.randint(-1, 2, size=10)
array([ 1,  1,  0,  0,  0, -1,  1,  1,  0, -1])
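For contrast, here is a minimal sketch of what does reproduce: two instances built from the same seed return the same values, call for call. The variable names are just for illustration.

```python
import numpy as np

seed = 2019

first_run = np.random.RandomState(seed)
a1 = first_run.randint(-1, 2, size=10)
a2 = first_run.randint(-1, 2, size=10)

# A fresh instance with the same seed replays the same sequence.
second_run = np.random.RandomState(seed)
b1 = second_run.randint(-1, 2, size=10)
b2 = second_run.randint(-1, 2, size=10)

print(np.array_equal(a1, b1))  # True: first call matches first call
print(np.array_equal(a2, b2))  # True: second call matches second call
```

In other words, the whole sequence of calls is reproducible, not each individual call in isolation.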

Solution 4:

If you are using other functions that rely on a random state, you can't just set an overall seed. Instead, create a function that generates your random numbers and takes the seed as a parameter. This will not disturb any other random generators in the code:

import numpy as np

# Random states
def get_states(random_state, low, high, size):
    # A private RandomState leaves the global np.random state untouched
    rs = np.random.RandomState(random_state)
    states = rs.randint(low=low, high=high, size=size)
    return states

# Call function
states = get_states(random_state=42, low=2, high=28347, size=25)
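One way to check the "will not disturb" claim is to snapshot the global generator with np.random.get_state() before and after the call. This is only a sketch; it repeats the function from above so that it runs on its own.

```python
import numpy as np

def get_states(random_state, low, high, size):
    rs = np.random.RandomState(random_state)
    return rs.randint(low=low, high=high, size=size)

np.random.seed(0)
before = np.random.get_state()
states = get_states(random_state=42, low=2, high=28347, size=25)
after = np.random.get_state()

# In the legacy state tuple, element 1 is the Mersenne Twister key
# array and element 2 is the current position within it.
unchanged = np.array_equal(before[1], after[1]) and before[2] == after[2]
print(unchanged)  # True: the global generator was not advanced
```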

Solution 5:

It is important to understand what the seed of a random generator is, and when and how it is set in your code (check e.g. here for a nice explanation of the mathematical meaning of the seed).

For that you need to set the seed by doing:

random_state = np.random.RandomState(seed=your_favorite_seed_value)

It is then important to generate the random numbers from random_state and not from np.random. I.e. you should do:

random_state.randint(...)

instead of

np.random.randint(...) 

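which draws from the global RandomState instance that numpy.random keeps at module level. Unless you seed that instance yourself, it is seeded when NumPy is imported, from operating-system entropy if available or from the clock otherwise, so the results change from run to run.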
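A short sketch of the difference, assuming an arbitrary seed value (12345 is only a placeholder): draws from your own random_state stay reproducible even when other code calls np.random in between.

```python
import numpy as np

your_favorite_seed_value = 12345  # placeholder, pick any integer

random_state = np.random.RandomState(seed=your_favorite_seed_value)
x = random_state.randint(-1, 2, size=(100, 2000))

random_state = np.random.RandomState(seed=your_favorite_seed_value)
np.random.randint(-1, 2, size=5)  # global draw, does not touch random_state
y = random_state.randint(-1, 2, size=(100, 2000))

print(np.array_equal(x, y))  # True
```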