A weighted version of random.choice

I needed to write a weighted version of random.choice (each element in the list has a different probability for being selected). This is what I came up with:

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    for key in sorted(space.keys() + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions on improving it or alternate ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.


Solution 1:

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)

Note that probability_distribution is a sequence in the same order of list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.

Solution 2:

Since Python 3.6 there is a method choices from the random module.

In [1]: import random

In [2]: random.choices(
...:     population=[['a','b'], ['b','a'], ['c','b']],
...:     weights=[0.2, 0.2, 0.6],
...:     k=10
...: )

Out[2]:
[['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b']]

Note that random.choices will sample with replacement, per the docs:

Return a k sized list of elements chosen from the population with replacement.

Note for completeness of answer:

When a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn, the sampling is said to be "with replacement". It basically means each element may be chosen more than once.

If you need to sample without replacement, then as @ronan-paixão's brilliant answer states, you can use numpy.choice, whose replace argument controls such behaviour.

Solution 3:

def weighted_choice(choices):
   total = sum(w for c, w in choices)
   r = random.uniform(0, total)
   upto = 0
   for c, w in choices:
      if upto + w >= r:
         return c
      upto += w
   assert False, "Shouldn't get here"