Random weighted choice

I have data like this:

d = (
  (701, 1, 0.2),
  (701, 2, 0.3),
  (701, 3, 0.5),
  (702, 1, 0.2),
  (702, 2, 0.3),
  (703, 3, 0.5)
)

Where (701, 1, 0.2) = (id1, id2, priority)

Is there a pretty way to choose id2 if I know id1, using priority?

Func(701) should return:
  1 - in 20% cases
  2 - 30%
  3 - 50%

Percent will be rough of course


Generate a Cumulative Distribution Function for each ID1 thus:

cdfs = defaultdict()
for id1,id2,val in d:
    prevtotal = cdfs[id1][-1][0]
    newtotal = prevtotal + val
    cdfs[id1].append( (newtotal,id2) )

So you will have

cdfs = { 701 : [ (0.2,1), (0.5,2), (1.0,3) ], 
         702 : [ (0.2,1), (0.5,2) ],
         703 : [ (0.5,3) ] }

Then generate a random number and search for it in the list.

def func(id1):
    max = cdfs[id1][-1][0]
    rand = random.random()*max
    for upper,id2 in cdfs[id1]:
        if upper>rand:
            return id2
    return None

Realizing that my first answer was quite buggy in its math, I have produced a new idea. I believe the algorithm here is similar to that of several of the other answers, but this implementation seems to qualify for the "pretty" (if that equals simple) requirement of the question:

def func(id):
    rnd = random()
    sum = 0
    for row in d:
        if row[0] == id:
            sum = sum + row[2]
            if rnd < sum:
                return row[1]

With the example data from the OP it goes like this:

  • Pick a random number between 0 and 1.0
  • If the number is < 0.2 return the first element
  • Else if the number is < 0.5 return the second element
  • Else (if the number is < 1.0) return the third element

Use a discrete uniform distribution from the random module over a sufficient number of values, then partition it:

For example, for case 701 use a distribution over 10 values, for 2 values return 1, for another 3, return 2, and for the other 5, return 3.

You can build any distribution using enough uniform distributions :)


If your percent values will not be more precise than whole percent values, use a random number generator to generate a number 0-99.

Then in your function, use (programmatic) cases to choose the correct number. For example (clean this up):

if 701
  if random_num < 20
    return 1
  else if random number < 50   // ( 20 + 30 )
    return 2
  else if random number < 100  // ( 20 + 30 + 50 )
    return 3
  else
    // error