How Random is System.Guid.NewGuid()? (Take two)

Before you start marking this as a duplicate, hear me out. The other question has a (most likely) incorrect accepted answer.

I do not know how .NET generates its GUIDs (probably only Microsoft does), but there's a high chance it simply calls CoCreateGuid(). That function, however, is documented to call UuidCreate(), and the algorithms for creating a UUID are pretty well documented.

Be that as it may, it seems that System.Guid.NewGuid() indeed uses the version 4 UUID generation algorithm, because all the GUIDs it generates match the criteria (see for yourself; I tried a couple million GUIDs, and they all matched).
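The check described above can be sketched as follows. This is only an illustration, not the code actually used for the test: the class and method names are made up, and it assumes the RFC 4122 layout, in which the 13th hex digit of the canonical form holds the version and the 17th holds the variant (8, 9, a or b).

```csharp
using System;

public static class GuidVersionCheck
{
    // True if the GUID has the RFC 4122 version 4 layout:
    // version nibble == 4 and the two top variant bits == 10.
    public static bool LooksLikeVersion4(Guid guid)
    {
        // "N" yields 32 lowercase hex digits in canonical order,
        // which sidesteps the mixed-endian layout of ToByteArray().
        string hex = guid.ToString("N");
        char version = hex[12];   // 13th hex digit: version
        char variant = hex[16];   // 17th hex digit: variant
        return version == '4' && "89ab".IndexOf(variant) >= 0;
    }

    // Counts how many of `samples` freshly generated GUIDs fail the check.
    public static int CountNonVersion4(int samples)
    {
        int failures = 0;
        for (int i = 0; i < samples; i++)
        {
            if (!LooksLikeVersion4(Guid.NewGuid())) failures++;
        }
        return failures;
    }
}
```

Calling GuidVersionCheck.CountNonVersion4(2000000) repeats the experiment. A result of 0 only shows that the fixed bits look like version 4; it says nothing about the quality of the remaining 122 bits.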

In other words, these GUIDs are almost random, except for a few known bits.

This then again raises the question: how random IS this random? As every good little programmer knows, a pseudo-random number algorithm is only as random as its seed (aka entropy). So what is the seed for UuidCreate()? How often is the PRNG re-seeded? Is it cryptographically strong, or can I expect the same GUIDs to start pouring out if two computers accidentally call System.Guid.NewGuid() at the same time? And can the state of the PRNG be guessed if sufficiently many sequentially generated GUIDs are gathered?

Added: To clarify, I'd like to find out how random I can trust it to be and, thus, where I can use it. So, let's establish a rough "randomness" scale here:

  1. Basic randomness, taking current time as the seed. Usable for shuffling cards in Solitaire but little else as collisions are too easy to come by even without trying.
  2. More advanced randomness, using not only the time but other machine-specific factors for seed. Perhaps also seeded only once on system startup. This can be used for generating IDs in a DB because duplicates are unlikely. Still, it's not good for security because the results can be predicted with sufficient effort.
  3. Cryptographically random, using device noise or other advanced sources of randomness for seed. Re-seeded on every invocation or at least pretty often. Can be used for session IDs, handed out to untrusted parties, etc.
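To make level 1 concrete, here is a minimal sketch of why a time-seeded generator collides so easily. It uses System.Random purely as a stand-in; it makes no claim about how UuidCreate() is seeded, and the class and method names are hypothetical.

```csharp
using System;

public static class SeedDemo
{
    // Returns the first `count` values from a System.Random created with `seed`.
    public static int[] Draw(int seed, int count)
    {
        var rng = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++) values[i] = rng.Next();
        return values;
    }

    // Two generators given the same seed (for example, the same clock tick
    // on two machines) are compared value by value.
    public static bool SameSeedSameOutput(int seed)
    {
        int[] a = Draw(seed, 5);
        int[] b = Draw(seed, 5);
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

SeedDemo.SameSeedSameOutput(12345) returns true for any seed: if the seed is all the entropy there is, identical seeds mean identical output.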

I arrived at this question while wondering whether it would be OK to use them as DB IDs, and whether the Guid.comb algorithm implementation together with System.Guid.NewGuid() (like NHibernate does it) would be flawed or not.


Solution 1:

The answer is: You should not need to know this. As stated in the accepted answer to a related question:

A GUID doesn't make guarantees about randomness, it makes guarantees around uniqueness.

An even stronger statement on security and randomness is made in RFC 4122, which specifies the UUID format:

Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation.

Anything else is an implementation detail (and might be subject to change).

Windows specifics

Often, people claim that the behavior on Windows is documented and that it is therefore guaranteed that GUIDs are cryptographically secure.

The now archived [MS-SECO] Windows Security Overview document mentions in Appendix A:

Although only a small minority of version 4 GUIDs require cryptographic randomness, the random bits for all version 4 GUIDs built in Windows are obtained via the Windows CryptGenRandom cryptographic API or the equivalent, the same source that is used for generation of cryptographic keys.

Moreover, section 2.5.5 of the same document explicitly mentions the use of "secret GUID" values as nonce or authenticator.

BUT: This piece of product behavior documentation is not a specification you can generally base the security of your product on (in particular in the context of .NET).

In fact, the document above describes an implementation detail of a particular product. Even if the current Windows and .NET Framework 4.x implementations produce truly random version 4 UUID values on Windows, there is no guarantee that System.Guid.NewGuid will do so in the future or on other .NET platforms (e.g. Mono, Silverlight, CF, .NET Core, etc).

Just as an example, the UUID algorithm used in earlier versions of .NET Core depends on the platform and you might get a version 1 UUID (on BSD).

Solution 2:

Some people have already hinted at that but I want to repeat it since there appears to be a misconception there:

Randomness and uniqueness are orthogonal concepts.

Random data can be unique or redundant, and likewise unique data can use a random source or a deterministic source (think a global counter that is locked and incremented for every GUID ever created).
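A small sketch of that distinction, with made-up names: a plain counter has no randomness at all and never repeats, while a random source with a small range is guaranteed to repeat once it has been drawn from more often than it has values.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class UniquenessVsRandomness
{
    private static long _counter;

    // Deterministic and fully predictable, yet never repeats within this process.
    public static long NextCounterId() => Interlocked.Increment(ref _counter);

    // Counts distinct values among `draws` counter-based IDs.
    public static int DistinctCounterIds(int draws)
    {
        var seen = new HashSet<long>();
        for (int i = 0; i < draws; i++) seen.Add(NextCounterId());
        return seen.Count;
    }

    // Counts distinct values among `draws` random picks from [0, range).
    public static int DistinctRandomPicks(int draws, int range)
    {
        var rng = new Random();
        var seen = new HashSet<int>();
        for (int i = 0; i < draws; i++) seen.Add(rng.Next(range));
        return seen.Count;
    }
}
```

DistinctCounterIds(1000) is always 1000, whereas DistinctRandomPicks(1000, 10) can never exceed 10.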

GUIDs were designed to be unique, not random. If the .NET generator appears to use random input, fine. But don’t rely on it as a source of randomness, neither for cryptographic nor for any other purposes (in particular, what distribution function do you expect to get?). On the other hand, you can be reasonably sure that GUIDs created by .NET, even in large volumes, will be unique.

Solution 3:

APIs that produce random bytes but which are not explicitly documented to produce cryptographically strong random bytes cannot be trusted to produce cryptographically strong random bytes.

If you need cryptographically strong random bytes, then you should be using an API which is explicitly documented to produce them.

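public static Guid CreateCryptographicallyStrongGuid() {
    // 16 bytes = 128 bits, all drawn from the platform's cryptographic RNG.
    var data = new byte[16];
    // RandomNumberGenerator.Create() replaces RNGCryptoServiceProvider, which is obsolete in recent .NET versions.
    using (var rng = System.Security.Cryptography.RandomNumberGenerator.Create()) {
        rng.GetBytes(data);
    }
    // The bytes are used as-is; no UUID version or variant bits are set.
    return new Guid(data);
}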

These GUIDs are simply 128 bits of cryptographic randomness. They are not structured (no version or variant bits are set), and a collision between them is extremely unlikely.

See this article for some of the math. Using "The General Birthday Formula", rearranging gives

n = sqrt(-2T * ln(p))

where n is the number of chosen elements, T is the total number of elements (2^128), and p is the target probability that all n chosen elements will be different. With p = 0.99, this gives n ≈ 2.61532104 × 10^18. This means that we can generate a billion truly random GUIDs per second within a system for a billion seconds (about 32 years) and still have a better than 99% chance that every one of them is unique within the system.
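The figure can be reproduced with a few lines of code. This is just the formula above evaluated in double precision; the class and method names are invented for the example.

```csharp
using System;

public static class BirthdayBound
{
    // n = sqrt(-2 * T * ln(p)), where T = 2^bits is the size of the value space
    // and p is the target probability that all n draws are distinct.
    public static double MaxDraws(int bits, double p)
    {
        double total = Math.Pow(2.0, bits);
        return Math.Sqrt(-2.0 * total * Math.Log(p));
    }
}
```

BirthdayBound.MaxDraws(128, 0.99) comes out at roughly 2.6 × 10^18, which is more than the 10^18 GUIDs produced by a billion per second over a billion seconds.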