Why does rand() repeat numbers far more often on Linux than Mac?

Solution 1:

While at first it may sound like the macOS rand() is somehow better for not repeating any numbers, one should note that with this amount of numbers generated it is expected to see plenty of duplicates (in fact, around 790 million, or (231-1)/e). Likewise iterating through the numbers in sequence would also produce no duplicates, but wouldn't be considered very random. So the Linux rand() implementation is in this test indistinguishable from a true random source, whereas the macOS rand() is not.

Another thing that appears surprising at first glance is how the macOS rand() can manage to avoid duplicates so well. Looking at its source code, we find the implementation to be as follows:

/*
 * Compute x = (7^5 * x) mod (2^31 - 1)
 * without overflowing 31 bits:
 *      (2^31 - 1) = 127773 * (7^5) + 2836
 * From "Random number generators: good ones are hard to find",
 * Park and Miller, Communications of the ACM, vol. 31, no. 10,
 * October 1988, p. 1195.
 */
    long hi, lo, x;

    /* Can't be initialized with 0, so use another value. */
    if (*ctx == 0)
        *ctx = 123459876;
    hi = *ctx / 127773;
    lo = *ctx % 127773;
    x = 16807 * lo - 2836 * hi;
    if (x < 0)
        x += 0x7fffffff;
    return ((*ctx = x) % ((unsigned long) RAND_MAX + 1));

This does indeed result in all numbers between 1 and RAND_MAX, inclusive, exactly once, before the sequence repeats again. Since the next state is based on multiplication, the state can never be zero (or all future states would also be zero). Thus the repeated number you see is the first one, and zero is the one that is never returned.

Apple has been promoting the use of better random number generators in their documentation and examples for at least as long as macOS (or OS X) has existed, so the quality of rand() is probably not deemed important, and they've just stuck with one of the simplest pseudorandom generators available. (As you noted, their rand() is even commented with a recommendation to use arc4random() instead.)

On a related note, the simplest pseudorandom number generator I could find that produces decent results in this (and many other) tests for randomness is xorshift*:

uint64_t x = *ctx;
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
*ctx = x;
return (x * 0x2545F4914F6CDD1DUL) >> 33;

This implementation results in almost exactly 790 million duplicates in your test.

Solution 2:

MacOS provides an undocumented rand() function in stdlib. If you leave it unseeded, then the first values it outputs are 16807, 282475249, 1622650073, 984943658 and 1144108930. A quick search will show that this sequence corresponds to a very basic LCG random number generator that iterates the following formula:

xn+1 = 75 · xn (mod 231 − 1)

Since the state of this RNG is described entirely by the value of a single 32-bit integer, its period is not very long. To be precise, it repeats itself every 231 − 2 iterations, outputting every value from 1 to 231 − 2.

I don't think there's a standard implementation of rand() for all versions of Linux, but there is a glibc rand() function that is often used. Instead of a single 32-bit state variable, this uses a pool of over 1000 bits, which to all intents and purposes will never produce a fully repeating sequence. Again, you can probably find out what version you have by printing the first few outputs from this RNG without seeding it first. (The glibc rand() function produces the numbers 1804289383, 846930886, 1681692777, 1714636915 and 1957747793.)

So the reason you're getting more collisions in Linux (and hardly any in MacOS) is that the Linux version of rand() is basically more random.