Expand a random range from 1–5 to 1–7

Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.

  1. What is a simple solution?
  2. What is an effective solution to reduce memory usage or run on a slower CPU?

Solution 1:

This is equivalent to Adam Rosenfield's solution, but may be a bit more clear for some readers. It assumes rand5() is a function that returns a statistically random integer in the range 1 through 5 inclusive.

int rand7()
{
    int vals[5][5] = {
        { 1, 2, 3, 4, 5 },
        { 6, 7, 1, 2, 3 },
        { 4, 5, 6, 7, 1 },
        { 2, 3, 4, 5, 6 },
        { 7, 0, 0, 0, 0 }
    };

    int result = 0;
    while (result == 0)
    {
        int i = rand5();
        int j = rand5();
        result = vals[i-1][j-1];
    }
    return result;
}

How does it work? Think of it like this: imagine printing out this double-dimension array on paper, tacking it up to a dart board and randomly throwing darts at it. If you hit a non-zero value, it's a statistically random value between 1 and 7, since there are an equal number of non-zero values to choose from. If you hit a zero, just keep throwing the dart until you hit a non-zero. That's what this code is doing: the i and j indexes randomly select a location on the dart board, and if we don't get a good result, we keep throwing darts.

Like Adam said, this can run forever in the worst case, but statistically the worst case never happens. :)

Solution 2:

There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5. One simple solution would be to use rejection sampling, e.g.:


int i;
do
{
  i = 5 * (rand5() - 1) + rand5();  // i is now uniformly random between 1 and 25
} while(i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1;  // result is now uniformly random between 1 and 7

This has an expected runtime of 25/21 = 1.19 iterations of the loop, but there is an infinitesimally small probability of looping forever.

Solution 3:

I'd like to add another answer, in addition to my first answer. This answer attempts to minimize the number of calls to rand5() per call to rand7(), to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities with the logic presented in Ivan's answer.

The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log2 N. Thus, rand5() has approximately 2.32193 bits of entropy, and rand7() has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5(), and apply them to generating 2.80735 bits of entropy needed for each call to rand7(). The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5() per call to rand7().

Side notes: all logarithms in this answer will be base 2 unless specified otherwise. rand5() will be assumed to return numbers in the range [0, 4], and rand7() will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.

So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number -- we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number 0.a1a2a3..., where each digit ai is chosen by a call to rand5(). For example, if our RNG chose ai = 1 for all i, then ignoring the fact that that isn't very random, that would correspond to the real number 1/5 + 1/52 + 1/53 + ... = 1/4 (sum of a geometric series).

Ok, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a, b] equals the length of that interval, b - a. The proof is left as an exercise for the reader =).

Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7(). How do we do this? Just the reverse of what we just did -- we convert it to an infinitely precise decimal in base 7, and then each base 7 digit will correspond to one output of rand7().

Taking the example from earlier, if our rand5() produces an infinite stream of 1's, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite decimal 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.

Ok, so we have the main idea, but we have two problems left: we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Secondly, how do we actually convert it to base 7?

One way we can convert a number between 0 and 1 to base 7 is as follows:

  1. Multiply by 7
  2. The integral part of the result is the next base 7 digit
  3. Subtract off the integral part, leaving only the fractional part
  4. Goto step 1

To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5() twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5() produce, the random real number we're generating will never be larger than 0.12: it is always true that 0.11 ≤ 0.11xyz... < 0.12.

So, keeping track of the current number so far, and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k digits, then we can safely output the next k digits -- regardless of what the infinite stream of base 5 digits are, they will never affect the next k digits of the base 7 representation!

And that's the algorithm -- to generate the next output of rand7(), we generate only as many digits of rand5() as we need to ensure that we know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness:

import random

rand5_calls = 0
def rand5():
    global rand5_calls
    rand5_calls += 1
    return random.randint(0, 4)

def rand7_gen():
    state = 0
    pow5 = 1
    pow7 = 7
    while True:
        if state / pow5 == (state + pow7) / pow5:
            result = state / pow5
            state = (state - result * pow5) * 7
            pow7 *= 7
            yield result
        else:
            state = 5 * state + pow7 * rand5()
            pow5 *= 5

if __name__ == '__main__':
    r7 = rand7_gen()
    N = 10000
    x = list(next(r7) for i in range(N))
    distr = [x.count(i) for i in range(7)]
    expmean = N / 7.0
    expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))

    print '%d TRIALS' % N
    print 'Expected mean: %.1f' % expmean
    print 'Expected standard deviation: %.1f' % expstddev
    print
    print 'DISTRIBUTION:'
    for i in range(7):
        print '%d: %d   (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev)
    print
    print 'Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N)

Note that rand7_gen() returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7) 10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.

Also note that the numbers here get very big, very fast. Powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to bignum arithmetic. But remember here, my goal was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).

In one run of this, I made 12091 calls to rand5() for 10000 calls to rand7(), achieving the minimum of log(7)/log(5) calls on average to 4 significant figures, and the resulting output was uniform.

In order to port this code to a language that doesn't have arbitrarily large integers built-in, you'll have to cap the values of pow5 and pow7 to the maximum value of your native integral type -- if they get too big, then reset everything and start over. This will increase the average number of calls to rand5() per call to rand7() very slightly, but hopefully it shouldn't increase too much even for 32- or 64-bit integers.

Solution 4:

(I have stolen Adam Rosenfeld's answer and made it run about 7% faster.)

Assume that rand5() returns one of {0,1,2,3,4} with equal distribution and the goal is return {0,1,2,3,4,5,6} with equal distribution.

int rand7() {
  i = 5 * rand5() + rand5();
  max = 25;
  //i is uniform among {0 ... max-1}
  while(i < max%7) {
    //i is uniform among {0 ... (max%7 - 1)}
    i *= 5;
    i += rand5(); //i is uniform {0 ... (((max%7)*5) - 1)}
    max %= 7;
    max *= 5; //once again, i is uniform among {0 ... max-1}
  }
  return(i%7);
}

We're keeping track of the largest value that the loop can make in the variable max. If the reult so far is between max%7 and max-1 then the result will be uniformly distrubuted in that range. If not, we use the remainder, which is random between 0 and max%7-1, and another call to rand() to make a new number and a new max. Then we start again.

Edit: Expect number of times to call rand5() is x in this equation:

x =  2     * 21/25
   + 3     *  4/25 * 14/20
   + 4     *  4/25 *  6/20 * 28/30
   + 5     *  4/25 *  6/20 *  2/30 * 7/10
   + 6     *  4/25 *  6/20 *  2/30 * 3/10 * 14/15
   + (6+x) *  4/25 *  6/20 *  2/30 * 3/10 *  1/15
x = about 2.21 calls to rand5()

Solution 5:

Algorithm:

7 can be represented in a sequence of 3 bits

Use rand(5) to randomly fill each bit with 0 or 1.
For e.g: call rand(5) and

if the result is 1 or 2, fill the bit with 0
if the result is 4 or 5, fill the bit with 1
if the result is 3 , then ignore and do it again (rejection)

This way we can fill 3 bits randomly with 0/1 and thus get a number from 1-7.

EDIT: This seems like the simplest and most efficient answer, so here's some code for it:

public static int random_7() {
    int returnValue = 0;
    while (returnValue == 0) {
        for (int i = 1; i <= 3; i++) {
            returnValue = (returnValue << 1) + random_5_output_2();
        }
    }
    return returnValue;
}

private static int random_5_output_2() {
    while (true) {
        int flip = random_5();

        if (flip < 3) {
            return 0;
        }
        else if (flip > 3) {
            return 1;
        }
    }
}