How to get sufficient entropy for shuffling cards in Java?

I'm working on a statistics project involving cards and shuffling, and I've run across an issue with random number generation.

From a simple bit of math there are 52! possible deck permutations, which is approximately 2^226. I believe this means that I need a random number generator with a minimum of 226 bits of entropy, and possibly more (I'm not certain of this concept so any help would be great).

From a quick Google search, the Math.random() generator in Java has at most 48 bits of entropy, meaning that the vast majority of possible deck orderings would never be produced. So this does not seem to be the way to go in Java.

I was linked to this generator but it doesn't have a java implementation yet. Also for a bit of context here is one of my shuffling algorithms (it uses the Fisher-Yates method). If you have any suggestions for better code efficiency that would be fantastic as well.

public void shuffle(int type, int swaps){
    int[] newDeck = getNewDeck();

    if(type == 1){
        for(int i = 0; i < 52; i++){
            int nextCardIndex = (int)(Math.random()*newDeck.length);
            deck[i] = newDeck[nextCardIndex];
            newDeck = removeItem(nextCardIndex, newDeck);
        }
    }
}

public int[] getNewDeck(){
    int[] newDeck = new int[52];
    for(int i = 1; i <= 52; i++){
        newDeck[i-1] = i;
    }
    return newDeck;
}

public int[] removeItem(int index, int[] array){
    int[] newArray = new int[array.length-1];
    for(int i = 0; i < index; i++){
        newArray[i] = array[i];
    }
    for(int i = index; i < array.length-1; i++){
        newArray[i] = array[i+1];
    }

    array = newArray;
    return array;
}
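As an aside on efficiency: removeItem copies the remaining array on every draw, making the shuffle quadratic in array copies. The same uniform Fisher-Yates result can be had in place with swaps. A sketch (using java.util.Random instead of Math.random(); the class and method names here are mine):

```java
import java.util.Arrays;
import java.util.Random;

public class DeckShuffle {
    // In-place Fisher-Yates: walk from the end, swapping each position
    // with a uniformly chosen position at or below it. No array copies.
    static void shuffle(int[] deck, Random rnd) {
        for (int i = deck.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);  // uniform on [0, i]
            int tmp = deck[i];
            deck[i] = deck[j];
            deck[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] deck = new int[52];
        for (int i = 0; i < 52; i++) deck[i] = i + 1;
        shuffle(deck, new Random());
        // Still the same 52 cards, just reordered: 1+2+...+52 = 1378
        System.out.println(Arrays.stream(deck).sum());
    }
}
```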

Solution 1:

Short answer:

  • You have confused uniformity and entropy.
  • Java's built-in code is already mathematically perfect for this. You can't improve on it with a different pseudo-random number generator.
  • Collections.shuffle(deck) is all you need. Really. It's perfect. Uniform, unpredictable.

But, don't take my word for it. I'll prove it.

Uniformity and entropy - not the same thing at all.

I'm working on a statistics project involving cards and shuffling, and I've run across an issue with random number generation. From a simple bit of math there are 52! possible deck permutations, which is approximately 2^226. I believe this means that I need a random number generator with a minimum of 226 bits of entropy,

You're confusing concepts. There's uniform distribution, which can be done perfectly well with 0 entropy, and unpredictability, which requires entropy. '226 bits of entropy' is utterly unrelated to the idea that you have about 2^226 different permutations available. The sequence of outputs of an algorithm doesn't become more or less predictable based on the # of choices that any given 'output' of the algorithm can provide.

Okay, so what's permutation uniformity?

Let's stick with die rolls; that's a bit more workable than deck shuffling. We have your standard 6-sided die, and we write an algorithm that rolls it. Our first attempt is this:

int roll() {
    return 4;
}

Random. Honest!

Think about random algorithms as a black box. Let's say you have no idea what the implementation is, you simply know: I can call it, a random number falls out.

If you call this once, you have no idea that it is broken. From where you stand it isn't: a number fell out, and without seeing the code there was no way to expect anything else. As far as you can tell, it's perfectly random and perfectly unpredictable.

But, call it more than once and now it gets interesting. Pretty soon you start to realize that one number is generated rather a lot more than the others. If you run this algorithm 1000 times and make a histogram, you notice that you got a thousand '4's and zero 1/2/3/5/6. This shows that the algorithm appears to be highly non-uniform: it does not provide each number equally often. That, or you got incredibly unlucky, which is always possible (if you actually roll a die one thousand times, then once in a bazillion billion billion years it'll roll a 4 every single time, a thousand times in a row, by pure happenstance. Any algorithm that is incapable of theoretically doing that is less random than a truly random source!)
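That histogram experiment is easy to code up. A quick sketch (the always-4 generator below stands in for the black box; all names are mine):

```java
import java.util.function.IntSupplier;

public class RollHistogram {
    // Call the black-box roller many times and count how often each face appears.
    static int[] histogram(IntSupplier roll, int trials) {
        int[] counts = new int[7];  // indices 1..6 used
        for (int t = 0; t < trials; t++) counts[roll.getAsInt()]++;
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = histogram(() -> 4, 1000);  // the "random, honest!" roller
        for (int face = 1; face <= 6; face++) {
            System.out.println(face + ": " + counts[face]);
        }
    }
}
```

One glance at the printout (a thousand 4s, nothing else) exposes the non-uniformity.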

But here's another algorithm:

private int roll = 1;
int roll() {
  return roll = (roll % 6) + 1;  // cycles 2, 3, 4, 5, 6, 1, 2, ...
}

Invoke this one a whole bunch of times and that histogram is perfect. Each number shows up equally often. It is entirely uniform - no one number is more likely than any other. If I play a game with you where you predict what I will roll, and if you predict correctly I give you money, but you don't know how often I've called that method before, and you only get to play once, you can't do any better than random chance.

Contrast this to our earlier return 4; take, where you can predict '4' and you'll get cash.

There's clearly 0 entropy here, and yet it is perfectly uniform. Hence, to get uniformity (make it equally likely that continually invoking a 'shuffle this deck!' method will produce all possible ways to shuffle a deck, and doesn't have any particular order be particularly more or less likely if only you run it long enough) - we got there with 0 entropy. It's obviously extremely predictable, but that's a different issue.

Let's go for the interesting one:

int roll() {
  return 1 + (int) (Math.random() * 6);
}

This seems fine. But it is not, and I can prove it. It cannot possibly be perfectly uniform, mathematically.

Math.random() produces 'any number between 0 and 1', and, forgetting about computers for a while, there are infinitely many numbers in that range. But notice that Math.random() returns a double, and the Java spec says a double takes up 64 bits. Obviously, you can't store a choice amongst an infinite space in 64 bits; you'd need infinitely many bits. Also, double can represent numbers that Math.random() cannot return (such as 2.0: there is a bit sequence that is a valid double with value 2.0, and Math.random() will never return it, as it isn't between 0 and 1). In fact, the generator behind Math.random() has exactly 48 bits of internal state, so there are at most 2^48 distinct outputs; but even if the exact count differs, the following proof still works:

48 bits is 2^48.

That means that our algorithm has only 2^48 different potential outputs! We can make a really, really, really humongous table that explicitly lists all 2^48 different inputs and, for each, writes out what the randomly rolled number would be. This table would have exactly 2^48 entries: exactly 2^48 values, each a 1, 2, 3, 4, 5 or 6.

Here's the problem then: 2^48/6 is not an integer. Obviously: 2^48 is divisible only by 2 (48 times over, in fact), and 6 is 2*3 in prime terms, so it won't divide cleanly. That table has 281474976710656 entries in it, which means that, best case scenario, 4 of the 6 numbers show up 46912496118443 times and the other 2 show up 46912496118442 times.
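The arithmetic is quick to verify in code:

```java
public class PigeonholeCheck {
    public static void main(String[] args) {
        long outputs = 1L << 48;          // 2^48 distinct generator outputs
        System.out.println(outputs);      // 281474976710656
        System.out.println(outputs / 6);  // 46912496118442, the smaller per-face count
        System.out.println(outputs % 6);  // remainder 4: four faces get one extra entry
    }
}
```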

It's a tiny, tiny difference, but the above is proof that Math.random() cannot possibly be perfectly uniform. Unlike our method that just returns 1-2-3-4-5-6 in a sequence.

Obviously if you care about proper randomness, you do want that uniformity. It doesn't matter how many bits of entropy are powering Math.random() here. That's not the problem at all.

So what's the fix? Use the right API. java.util.Random is perfectly uniform if you use the right methods. Thus, our 4th and finally usable take on this method:

Random r = new java.util.Random();

public int roll() {
  return 1 + r.nextInt(6);
}

The API docs of nextInt spell it out for you: It guarantees perfect uniformity.
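How can nextInt(6) be exactly uniform when the generator underneath only hands out fixed-size chunks of bits? By rejection: if a draw lands in the incomplete block at the top of the range, where some remainders mod 6 would be over-represented, it is thrown away and redrawn. A sketch of that classic technique (not claimed to be the exact code inside java.util.Random):

```java
import java.util.Random;

public class UnbiasedRoll {
    // Rejection sampling: draw 31 random bits; redraw if the value falls in
    // the partial block at the top, so every remainder mod 6 is equally likely.
    static int roll(Random rnd) {
        int bits, val;
        do {
            bits = rnd.nextInt() >>> 1;  // 31 non-negative random bits
            val = bits % 6;
        } while (bits - val + 5 < 0);    // int overflow here flags the partial block
        return 1 + val;                  // uniform on 1..6
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);     // fixed seed, just for a repeatable demo
        int[] counts = new int[7];
        for (int i = 0; i < 60_000; i++) counts[roll(rnd)]++;
        int total = 0;
        for (int face = 1; face <= 6; face++) {
            System.out.println(face + ": " + counts[face]);
            total += counts[face];
        }
        System.out.println("total: " + total);  // total: 60000
    }
}
```

Each face lands near 10000 counts; the tiny cost is that, very occasionally, a draw is discarded and retried.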

So what's predictability then?

Our method that returns 1-2-3-4-5-6 in a loop is perfectly predictable. Let's assume that we play our game again: You predict a number, I roll once, if it is that number you get a lot of money. But now let's add a wrinkle: Before you tell me the number you want, you can ask me to roll the die and tell you the result. You may ask this of me as many times as you please, you can write them all down and think about it, and you can ask me not to ever roll the dice except for you (So, when you ask, or when we do the money roll).

For that 1-2-3-4-5-6 algorithm, you will win every time. You just ask me to roll a few times, you realize it's 1-2-3-4-5-6-1-2-3-4 endlessly, and predict perfectly. So, that one is perfectly uniform, but also easily predicted based solely on watching the data.

This is where you need entropy. Entropy is about introducing factors that you do not know about; those unknown factors are entropy. The more entropy there is, the more outputs you need to observe before you can predict. For example, suppose I am allowed to flip 5 coins, I don't have to tell you the results, and I get to apply these flips to my die rolls: I add 1 to the first number I tell you if the first coin was heads, 1 to the second number if the second coin was heads, and so on. On the 6th roll I have run out of coins and must re-use the first one; I'm a computer and have no hands, a CPU has no coin in it, so I can't flip more coins.

In this case, you need to ask me for the result a few times before you can predict it - you need to figure out the coin flips first. But, you will, once you ask me for 5 rolls. Then you're back to perfect prediction.
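That coin game can be simulated: the observer recovers one hidden coin per observed roll, and after 5 rolls the entropy is spent and prediction is perfect again. A toy sketch (the setup and names are mine):

```java
import java.util.Random;

public class CoinEntropy {
    public static void main(String[] args) {
        Random rnd = new Random();
        // Hidden entropy: 5 secret coin flips (0 or 1), reused cyclically.
        int[] coins = new int[5];
        for (int i = 0; i < 5; i++) coins[i] = rnd.nextInt(2);

        // The dealer's "roll": the known 1-2-3-4-5-6 cycle plus the secret coin.
        int[] observed = new int[10];
        for (int t = 0; t < 10; t++) {
            int base = (t % 6) + 1;
            observed[t] = base + coins[t % 5];
        }

        // Observer: after 5 rolls, every coin is recovered...
        int[] recovered = new int[5];
        for (int t = 0; t < 5; t++) recovered[t] = observed[t] - ((t % 6) + 1);

        // ...so rolls 6 through 10 are predicted perfectly.
        boolean allPredicted = true;
        for (int t = 5; t < 10; t++) {
            int prediction = ((t % 6) + 1) + recovered[t % 5];
            if (prediction != observed[t]) allPredicted = false;
        }
        System.out.println(allPredicted);  // true: entropy spent, prediction perfect
    }
}
```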

That's entropy.

The trick is, computers generate entropy all the time. Imagine this alternate algorithm: It's still the simplistic:

int roll = 1;
public int roll() {
  return roll = (roll % 6) + 1;  // same 1..6 cycle as before
}

but this time, one thousand players are all playing this game with me. They're humans, so the timing of their "Roll for me, please!" requests is not uniform or predictable (humans being very complex machines: many billions of cells all interacting, far too difficult to analyse). Now, even though my algorithm seems utterly idiotic, the simple fact that you don't get consecutive rolls, but instead rolls separated by an unknown and unpredictable number of other players' requests, brings us back to perfection: you can't win more than 1 in 6 games here.

And this is exactly how your OS works. It uses the timing of 'hey, somebody wants a random number' requests, itself, as a source of entropy. In fact, it injects anything hard to predict into the system. Imagine you also know that, just for kicks, I advance my 1-2-3-4-5-6 sequence once every second, exactly. You can take that into account, so it adds nothing, but it doesn't reduce the entropy either: mixing in more sources can only add, assuming the algorithm is written correctly. The OS knows this and mixes in as much as it can, such as keyboard input and mouse movement. Even if the keyboard is actually a robot that presses space every second, on the dot, exactly: okay, no entropy there then, but that doesn't make things worse. (It's a real 'if it won't help, at least it won't hurt' scenario.)

Because of this:

  • The computer is continually receiving boatloads of entropy! The timing of network packets rolling into the network card, keyboard and mouse input, and the entropy inherent in all sorts of apps on the system asking the OS for random numbers (Java's SecureRandom gets its seed from the OS; even plain j.u.Random seeds itself from hard-to-predict timing). Even if the computer somehow 'ran out' of entropy, within a few milliseconds it'll have gathered up some more.
  • Randomness is like an ever-filling pool. If you've let the computer run for a few hours, mousing around a bit, you've been building up entropy all that time. It stacks up. You'd have to 'drain' this entropy out before you're in trouble.
  • PRNGs do not solve anything here; they can't. If your computer truly 'ran out' of entropy (extremely unlikely, as it has so many sources), no PRNG can save you. In the end, computers just do not have a coin (see footnote 1) and cannot generate entropy out of nothingness. No algorithm you can imagine can change that. At best they can detect that entropy ran out and simply cease giving you random numbers until more shows up (see footnote 2).

Yes, you need at least 226 bits of entropy; with less, your 'shuffle this deck' code is provably predictable, in the sense that an observer can eliminate at least some of the possible deck orders. But your system has many millions of bits of entropy on tap; you're not going to run out, and if you somehow did, a PRNG wouldn't magically fix it.

The right way to do this

Thus, we get to the point of it all: If we play our dice prediction game, my aim is to set it up as perfectly as possible for me, which is: No matter how smart you are, nothing you can imagine can do statistically better than winning one in every 6 games.

To accomplish this, I need both uniformity and unpredictability:

  • With that 1-2-3-4-5-6 thing, you can just predict and win that way. It's uniform, but, predictable.
  • With an algorithm that is, say, truly and utterly unpredictable in every way imaginable (perfect entropy) but simply twice as likely to return 1 or 2: picture the world's most perfect roller rolling a perfect 8-sided die and returning the result % 6, so that a 7 comes out as '1' and an 8 as '2'. In this case you can win once every 4 games, which is better than once every 6: predict 1 every time, and you win whenever I roll a 1 or a 7 on that 8-sided die, which happens once every 4 rolls on average.

So how do you do that in java? Trivial:

Collections.shuffle(deck);

That was it. That ridiculous little one-liner does it all.

  • It is perfectly uniform. As the docs say, shuffle uses Fisher-Yates, and Fisher-Yates passes the pigeonhole test: the number of equally likely sequences of random choices it makes is exactly X! for a list of size X, so each permutation is produced by exactly one of them. (The wiki entry on Fisher-Yates has the full proof.)
  • It uses java.util.Random which is generally fine.

There is also SecureRandom, which does a few extra things to reduce predictability. For crypto purposes (such as generating encryption keys), you should use it. For a server that runs a poker game, even one for big bucks, java.util.Random is fine. Use SecureRandom if you must; any scheme to 'break' java.util.Random here would be vastly more complicated than other ways to steal your cash.
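Concretely, since Collections.shuffle wants a List rather than an int[], a deck like the one in the question could be shuffled like this (a sketch; the SecureRandom call shows the optional variant):

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Deck {
    public static void main(String[] args) {
        // Build the 52-card deck as a List, since that's what shuffle wants.
        List<Integer> deck = new ArrayList<>();
        for (int card = 1; card <= 52; card++) deck.add(card);

        Collections.shuffle(deck);                      // uniform Fisher-Yates, j.u.Random

        Collections.shuffle(deck, new SecureRandom());  // same, harder-to-predict source

        System.out.println(deck.size());                // 52
        // Same cards, new order: 1+2+...+52 = 1378
        System.out.println(deck.stream().mapToInt(Integer::intValue).sum());
    }
}
```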


footnote 1

You can get custom hardware that can flip coins; generally these measure physical noise, for example radio receivers picking up background radiation. There are also tricky ways for a computer to 'detect' such effects on its own standard hardware, and many modern chips ship with a hardware entropy source built in; your OS will initialize its random 'pool' using the entropy the chip provides. Of course, for a computer that gets no human interaction and no unpredictable input such as internet traffic, sitting in a really well isolated room in a Faraday cage and the like, there is simply no entropy coming in, and you can't make entropy out of nothing.

footnote 2

On linux, /dev/urandom gives you endless randomness, but /dev/random will start 'blocking' (it waits, like a slow file, until entropy shows up). Some misguided documentation therefore suggests you use /dev/random for crypto. But this is bad advice: it's not actually possible for a computer to know whether some nugget of apparent entropy is actually entropy. If the keyboard is a virtual keyboard and some robot just presses space every second, and everybody knows it, then that adds no entropy; but the computer cannot know this. It could try to 'analyse' the input and conclude that there seems to be a pattern, but sometimes when you flip a coin ten times it comes up heads every time, and that doesn't make it any less random. If you eliminate such patterns, you actually end up less random.

Hence, just use /dev/urandom. Hence, java.util.Random is generally fine.

One sidenote

Virtual PCs (e.g. docker, kubernetes, etcetera) really can run out of entropy. Their emulated chips are not always properly 'written' and the baked-in entropy initialization of the chip just returns zeroes; the network is predictable, it's all scripted and firewalled, so literally zero entropy flows into the system.

That was a real problem with early takes on the virtual PC image concept. But these days all the various implementations out there 'pass entropy' through from the host OS to the virtual PC, and e.g. the linux driver that manages /dev/urandom and /dev/random takes it into account.

Seriously, don't worry about it. Use Collections.shuffle; use e.g. rnd.nextInt(bound). Don't use Math.random() for anything, except when you really do need a double between 0 and 1.

Solution 2:

Have you looked into the recent additions that are included in JDK 17?

https://docs.oracle.com/en/java/javase/17/core/pseudorandom-number-generators.html#GUID-08E418B9-036F-4D11-8E1C-5EB19B23D8A1

There are plenty of algorithms available:

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html#algorithms

For shuffling cards you likely don't need something that is cryptographically secure.

Using Collections.shuffle should do the trick if you provide a decent RNG.

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Collections.html#shuffle(java.util.List,java.util.Random)
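A sketch of picking one of the JDK 17 algorithms by name and driving a Fisher-Yates shuffle with it (L64X128MixRandom is one of the documented algorithm names; the hand-rolled shuffle avoids needing a java.util.Random wrapper):

```java
import java.util.random.RandomGenerator;

public class Jdk17Shuffle {
    // Fisher-Yates driven by any java.util.random.RandomGenerator.
    static void shuffle(int[] deck, RandomGenerator g) {
        for (int i = deck.length - 1; i > 0; i--) {
            int j = g.nextInt(i + 1);  // uniform on [0, i]
            int tmp = deck[i];
            deck[i] = deck[j];
            deck[j] = tmp;
        }
    }

    public static void main(String[] args) {
        RandomGenerator g = RandomGenerator.of("L64X128MixRandom");  // JDK 17 LXM family
        int[] deck = new int[52];
        for (int i = 0; i < 52; i++) deck[i] = i + 1;
        shuffle(deck, g);
        int sum = 0;
        for (int c : deck) sum += c;
        System.out.println(sum);  // 1378: same 52 cards, new order
    }
}
```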