Why not just use random_device?

Solution 1:

Also, why not just:

std::mt19937 e{std::random_device{}()};

It might be fine if you only will do this once, but if you will do it many times, it's better to keep track of your std::random_device and not create / destroy it unnecessarily.

It may be helpful to look at the libc++ source code for implementation of std::random_device, which is quite simple. It's just a thin wrapper over std::fopen("/dev/urandom"). So each time you create a std::random_device you are getting another filesystem handle, and pay all associated costs.

On windows, as I understand, std::random_device represents some call to a microsoft crypto API, so you are going to be initializing and destroying some crypto library interface everytime you do this.

It depends on your application, but for general purposes I wouldn't think of this overhead as always negligible. Sometimes it is, and then this is great.

I guess this ties back into your first question:

Instead, this is what I found on most examples/sites/articles:

 std::random_device rd;
 std::mt19937 e{rd()}; // or std::default_random_engine e{rd()};
 std::uniform_int_distribution<int> dist{1, 5};

At least the way I think about it is:

  • std::mt19937 is a very simple and reliable random generator. It is self-contained and will live entirely in your process, not calling out to the OS or anything else. The implementation is mandated by the standard, and at least in boost, it used the same code everywhere, derived from the original mt19937 paper. This code is very stable and it's cross-platform. You can be pretty confident that initializing it, querying from it, etc. is going to compile to similar code on any platform that you compile it on, and that you will get similar performance.

  • std::random_device by contrast is pretty opaque. You don't really know exactly what it is, what it's going to do, or how efficient it will be. You don't even know if it can actually be acquired -- it might throw an exception when you attempt to create it. You know that it doesn't require a seed. You're not usually supposed to pull tons and tons of data from it, just use it to generate seeds. Sometimes, it acts as a nice interface to cryptographic APIs, but it's not actually required to do that and sadly sometimes it doesn't. It might correspond to /dev/random on unix, it might correspond to /dev/urandom/. It might correspond to some MSVC crypto API (visual studio), or it might just be a fixed constant (mingw). If you cross-compile for some phone, who knows what it will do. (And even when you do get /dev/random, you still have the problem that performance may not be consistent -- it may appear to work great, until the entropy pool runs out, and then it runs slow as a dog.)

The way I think about it is, std::random_device is supposed to be like an improved version of seeding with time(NULL) -- that's a low bar, because time(NULL) is a pretty crappy seed all things considered. I usually use it where I would have used time(NULL) to generate a seed, back in the day. I don't really consider it all that useful outside of that.

Solution 2:

This article is a good point to start.

I'm going to synthesize just few points:

  • It Has Unknown Cost.

    How costly it is to read a number from this “device”? That is unspecified. It could, for example, be reading from /dev/random on a Linux system, which can block for an extended period waiting for entropy (which is itself problematic for a variety of reasons).

For my personal experience I've notified that std::random_device is usually slower than a simple Pseudo-randomic algorithm. That could be no true in general, but usually it does. That because it may involve physical devices, or other hardware than the simple CPU.

  • It Might Actually Be Deterministic .

    C++11's std::random_device is not required to be nondeterministic! Implementations can and do implement it as a simple RNG with a fixed seed, so it produces the same output for every run of the program.