Generate random numbers with a given (numerical) distribution
Solution 1:
scipy.stats.rv_discrete
might be what you want. You can supply your probabilities via the values
parameter. You can then use the rvs()
method of the distribution object to generate random numbers.
As pointed out by Eugene Pakhomov in the comments, you can also pass a p
keyword parameter to numpy.random.choice()
, e.g.
numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])
If you are using Python 3.6 or above, you can use random.choices()
from the standard library – see the answer by Mark Dickinson.
Solution 2:
Since Python 3.6, there's a solution for this in Python's standard library, namely random.choices
.
Example usage: let's set up a population and weights matching those in the OP's question:
>>> from random import choices
>>> population = [1, 2, 3, 4, 5, 6]
>>> weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
Now choices(population, weights)
generates a single sample:
>>> choices(population, weights)
4
The optional keyword-only argument k
allows one to request more than one sample at once. This is valuable because there's some preparatory work that random.choices
has to do every time it's called, prior to generating any samples; by generating many samples at once, we only have to do that preparatory work once. Here we generate a million samples, and use collections.Counter
to check that the distribution we get roughly matches the weights we gave.
>>> million_samples = choices(population, weights, k=10**6)
>>> from collections import Counter
>>> Counter(million_samples)
Counter({5: 399616, 6: 200387, 4: 200117, 1: 99636, 3: 50219, 2: 50025})