How good is Java's UUID.randomUUID?

UUID uses java.security.SecureRandom, which is supposed to be "cryptographically strong". While the actual implementation is not specified and can vary between JVMs (meaning that any concrete statements made are valid only for one specific JVM), it does mandate that the output must pass a statistical random number generator test.

It's always possible for an implementation to contain subtle bugs that ruin all this (see OpenSSH key generation bug) but I don't think there's any concrete reason to worry about Java UUIDs's randomness.


Wikipedia has a very good answer http://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions

the number of random version 4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:

...

This number is equivalent to generating 1 billion UUIDs per second for about 85 years, and a file containing this many UUIDs, at 16 bytes per UUID, would be about 45 exabytes, many times larger than the largest databases currently in existence, which are on the order of hundreds of petabytes.

...

Thus, for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated.


Does anybody have any experience to share?

There are 2^122 possible values for a type-4 UUID. (The spec says that you lose 2 bits for the type, and a further 4 bits for a version number.)

Assuming that you were to generate 1 million random UUIDs a second, the chances of a duplicate occurring in your lifetime would be vanishingly small. And to detect the duplicate, you'd have to solve the problem of comparing 1 million new UUIDs per second against all of the UUIDs you have previously generated1!

The chances that anyone has experienced (i.e. actually noticed) a duplicate in real life are even smaller than vanishingly small ... because of the practical difficulty of looking for collisions.

Now of course, you will typically be using a pseudo-random number generator, not a source of truly random numbers. But I think we can be confident that if you are using a creditable provider for your cryptographic strength random numbers, then it will be cryptographic strength, and the probability of repeats will be the same as for an ideal (non-biased) random number generator.

However, if you were to use a JVM with a "broken" crypto- random number generator, all bets are off. (And that might include some of the workarounds for "shortage of entropy" problems on some systems. Or the possibility that someone has tinkered with your JRE, either on your system or upstream.)


1 - Assuming that you used "some kind of binary btree" as proposed by an anonymous commenter, each UUID is going to need O(NlogN) bits of RAM memory to represent N distinct UUIDs assuming low density and random distribution of the bits. Now multiply that by 1,000,000 and the number of seconds that you are going to run the experiment for. I don't think that is practical for the length of time needed to test for collisions of a high quality RNG. Not even with (hypothetical) clever representations.


I'm not an expert, but I'd assume that enough smart people looked at Java's random number generator over the years. Hence, I'd also assume that random UUIDs are good. So you should really have the theoretical collision probability (which is about 1 : 3 × 10^38 for all possible UUIDs. Does anybody know how this changes for random UUIDs only? Is it 1/(16*4) of the above?)

From my practical experience, I've never seen any collisions so far. I'll probably have grown an astonishingly long beard the day I get my first one ;)


At a former employer we had a unique column that contained a random uuid. We got a collision the first week after it was deployed. Sure, the odds are low but they aren't zero. That is why Log4j 2 contains UuidUtil.getTimeBasedUuid. It will generate a UUID that is unique for 8,925 years so long as you don't generate more than 10,000 UUIDs/millisecond on a single server.