How to succinctly, portably, and thoroughly seed the mt19937 PRNG?
I seem to see many answers in which someone suggests using <random>
to generate random numbers, usually along with code like this:
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, 5);
dis(gen);
Usually this replaces some kind of "unholy abomination" such as:
srand(time(NULL));
rand()%6;
We might criticize the old way by arguing that time(NULL) provides low entropy, time(NULL) is predictable, and the end result is non-uniform.
But all of that is true of the new way: it just has a shinier veneer.
- rd() returns a single unsigned int. This has at least 16 bits and probably 32. That's not enough to seed MT's 19937 bits of state.
- Using std::mt19937 gen(rd()); gen() (seeding with 32 bits and looking at the first output) doesn't give a good output distribution. 7 and 13 can never be the first output. Two seeds produce 0. Twelve seeds produce 1226181350. (Link)
- std::random_device can be, and sometimes is, implemented as a simple PRNG with a fixed seed. It might therefore produce the same sequence on every run. (Link) This is even worse than time(NULL).
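To put the first point in numbers, here is a small sketch that compares what one rd() call delivers with what the engine stores. The three helper names are mine, for illustration only.

```cpp
#include <cstddef>
#include <limits>
#include <random>

// Bits of storage in std::mt19937: state_size words of word_size bits each.
// (19937 of those bits actually matter; the rest of the first word is unused.)
constexpr std::size_t mt_state_bits()
{
    return std::mt19937::state_size * std::mt19937::word_size;
}

// Bits delivered by a single call to std::random_device::operator().
constexpr std::size_t rd_call_bits()
{
    return std::numeric_limits<std::random_device::result_type>::digits;
}

// Number of rd() calls needed to cover the whole state, rounded up.
constexpr std::size_t rd_calls_needed()
{
    return (mt_state_bits() + rd_call_bits() - 1) / rd_call_bits();
}
```

With a 32-bit unsigned int, a single rd() call covers one word out of the 624 the engine holds.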
Worse yet, it is very easy to copy and paste the foregoing code snippets, despite the problems they contain. Some solutions to this require acquiring largish libraries which may not be suitable for everyone.
In light of this, my question is: How can one succinctly, portably, and thoroughly seed the mt19937 PRNG in C++?
Given the issues above, a good answer:
- Must fully seed the mt19937/mt19937_64.
- Cannot rely solely on std::random_device or time(NULL) as a source of entropy.
- Should not rely on Boost or other libraries.
- Should fit in a small number of lines such that it would look nice copy-pasted into an answer.
Thoughts
My current thought is that outputs from std::random_device can be mashed up (perhaps via XOR) with time(NULL), values derived from address space randomization, and a hard-coded constant (which could be set during distribution) to get a best-effort shot at entropy.
std::random_device::entropy() does not give a good indication of what std::random_device might or might not do.
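A rough sketch of that mash-up idea follows. Both function names, the static anchor variable, and the constant are placeholders of mine. It yields only 64 bits, so it illustrates the mixing step rather than meeting the "fully seed" requirement.

```cpp
#include <cstdint>
#include <ctime>
#include <random>

// Hypothetical helper: fold four 64-bit inputs together with XOR.
std::uint64_t mix_sources(std::uint64_t a, std::uint64_t b,
                          std::uint64_t c, std::uint64_t d)
{
    return a ^ b ^ c ^ d;
}

// Hypothetical helper: gather the sources named above and mix them.
std::uint64_t best_effort_seed()
{
    static const char anchor = 0;  // its address may move between runs under ASLR
    std::random_device rd;
    const std::uint64_t hi = rd();
    const std::uint64_t lo = rd();
    const std::uint64_t from_device = (hi << 32) | (lo & 0xFFFFFFFFu);
    const std::uint64_t from_time =
        static_cast<std::uint64_t>(std::time(nullptr));
    const std::uint64_t from_address =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(&anchor));
    // Arbitrary placeholder; the idea is that it could be changed per distribution.
    const std::uint64_t build_constant = 0x9E3779B97F4A7C15ULL;
    return mix_sources(from_device, from_time, from_address, build_constant);
}
```

One weakness of plain XOR is that equal inputs cancel: mix_sources(5, 5, 0, 0) is 0. A hash-style mixer would probably be a safer choice if the sources could be correlated.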
Solution 1:
I would argue the greatest flaw with std::random_device is that it is allowed a deterministic fallback if no CSPRNG is available. This alone is a good reason not to seed a PRNG using std::random_device, since the bytes produced may be deterministic. It unfortunately doesn't provide an API to find out when this happens, or to request failure instead of low-quality random numbers.
As a result, there is no completely portable solution. However, there is a decent, minimal approach: you can use a minimal wrapper around a CSPRNG (defined as sysrandom below) to seed the PRNG.
Windows
You can rely on CryptGenRandom, a CSPRNG. For example, you may use the following code:
#include <windows.h>
#include <wincrypt.h>
#include <stdexcept>
bool acquire_context(HCRYPTPROV *ctx)
{
    // Try the default key container first; create one if it does not exist.
    if (!CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, 0)) {
        return CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, CRYPT_NEWKEYSET);
    }
    return true;
}
size_t sysrandom(void* dst, size_t dstlen)
{
    HCRYPTPROV ctx;
    if (!acquire_context(&ctx)) {
        throw std::runtime_error("Unable to initialize Win32 crypt library.");
    }
    BYTE* buffer = reinterpret_cast<BYTE*>(dst);
    if (!CryptGenRandom(ctx, static_cast<DWORD>(dstlen), buffer)) {
        // Release the context before throwing so it is not leaked.
        CryptReleaseContext(ctx, 0);
        throw std::runtime_error("Unable to generate random bytes.");
    }
    if (!CryptReleaseContext(ctx, 0)) {
        throw std::runtime_error("Unable to release Win32 crypt library.");
    }
    return dstlen;
}
Unix-Like
On many Unix-like systems, you should use /dev/urandom when possible (although this is not guaranteed to exist on POSIX-compliant systems).
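#include <fstream>
#include <stdexcept>
size_t sysrandom(void* dst, size_t dstlen)
{
    char* buffer = reinterpret_cast<char*>(dst);
    std::ifstream stream("/dev/urandom", std::ios_base::binary | std::ios_base::in);
    // Fail loudly instead of returning a buffer that was never filled.
    if (!stream.read(buffer, dstlen)) {
        throw std::runtime_error("Unable to read N bytes from /dev/urandom.");
    }
    return dstlen;
}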
Other
If no CSPRNG is available, you might choose to rely on std::random_device. However, I would avoid this if possible, since various compilers (most notably, MinGW) implement it as a PRNG (in fact, producing the same sequence every time to alert humans that it's not properly random).
Seeding
Now that we have our pieces with minimal overhead, we can generate the desired bits of entropy to seed our PRNG. The example uses (an obviously insufficient) 32 bits to seed the PRNG, and you should increase this value (which is dependent on your CSPRNG).
std::uint_least32_t seed;
sysrandom(&seed, sizeof(seed));
std::mt19937 gen(seed);
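To go beyond 32 bits, one possible sketch is to pull std::mt19937::state_size words from sysrandom and feed them through std::seed_seq. The sysrandom below is a stand-in for the Unix-like variant above (it assumes /dev/urandom exists), and make_seeded_mt19937 is a name of mine. Note that std::seed_seq scrambles its input and is not guaranteed to preserve all of the entropy you give it, so treat this as an improvement rather than a proof of full seeding.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <random>
#include <stdexcept>

// Stand-in for the sysrandom wrapper (assumes /dev/urandom is present).
std::size_t sysrandom(void* dst, std::size_t dstlen)
{
    std::ifstream stream("/dev/urandom", std::ios_base::binary | std::ios_base::in);
    if (!stream.read(static_cast<char*>(dst), static_cast<std::streamsize>(dstlen))) {
        throw std::runtime_error("Unable to read from /dev/urandom.");
    }
    return dstlen;
}

// Hypothetical helper: seed the engine from state_size words of CSPRNG output.
std::mt19937 make_seeded_mt19937()
{
    std::array<std::uint_least32_t, std::mt19937::state_size> seed_data;
    sysrandom(seed_data.data(), seed_data.size() * sizeof(seed_data[0]));
    // seed_seq reduces each element modulo 2^32 before mixing.
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937(seq);
}
```

Usage is then std::mt19937 gen = make_seeded_mt19937(); followed by the usual distribution calls.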
Comparison To Boost
We can see parallels to boost::random_device (a true CSPRNG) after a quick look at the source code. Boost uses MS_DEF_PROV on Windows, which is the default provider for the PROV_RSA_FULL provider type. The only thing missing would be verifying the cryptographic context, which can be done with CRYPT_VERIFYCONTEXT. On *Nix, Boost uses /dev/urandom. That is, this solution is portable, well-tested, and easy to use.
Linux Specialization
If you're willing to sacrifice succinctness for security, getrandom is an excellent choice on Linux 3.17 and above, and on recent Solaris. getrandom behaves identically to /dev/urandom, except it blocks if the kernel hasn't initialized its CSPRNG yet after booting. The following snippet detects if Linux getrandom is available, and if not falls back to /dev/urandom.
#if defined(__linux__) || defined(linux) || defined(__linux)
// Check the kernel headers' version (a compile-time check).
// `getrandom` is only Linux 3.17 and above.
# include <linux/version.h>
# if LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0)
# define HAVE_GETRANDOM
# endif
#endif
// The getrandom() libc wrapper requires glibc 2.25, so invoke the syscall directly.
#if defined(HAVE_GETRANDOM)
# include <sys/syscall.h>
# include <unistd.h>
# include <linux/random.h>
# include <stdexcept>
size_t sysrandom(void* dst, size_t dstlen)
{
    // syscall() returns a long: the byte count on success, -1 on error.
    long bytes = syscall(SYS_getrandom, dst, dstlen, 0);
    if (bytes < 0 || static_cast<size_t>(bytes) != dstlen) {
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }
    return dstlen;
}
#elif defined(_WIN32)
// Windows sysrandom here.
#else
// POSIX sysrandom here.
#endif
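If your libc is new enough (glibc 2.25 or later), the getrandom wrapper declared in <sys/random.h> can stand in for the raw syscall. The following is a minimal sketch under that assumption. The name sysrandom_getrandom is mine, and the retry loop for EINTR and short reads is an addition that the snippet above does not have.

```cpp
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <sys/random.h>  // getrandom(), assumes glibc 2.25 or later
#include <sys/types.h>   // ssize_t

// Hypothetical variant of sysrandom built on the libc wrapper.
std::size_t sysrandom_getrandom(void* dst, std::size_t dstlen)
{
    unsigned char* out = static_cast<unsigned char*>(dst);
    std::size_t done = 0;
    while (done < dstlen) {
        // Flags of 0: read from the urandom pool, block until it is initialized.
        const ssize_t n = getrandom(out + done, dstlen - done, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal, try again
            }
            throw std::runtime_error("getrandom failed.");
        }
        done += static_cast<std::size_t>(n);  // may be a short read
    }
    return done;
}
```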
OpenBSD
There is one final caveat: modern OpenBSD does not have /dev/urandom. You should use getentropy instead.
#if defined(__OpenBSD__)
# define HAVE_GETENTROPY
#endif
#if defined(HAVE_GETENTROPY)
# include <unistd.h>
# include <stdexcept>
size_t sysrandom(void* dst, size_t dstlen)
{
    // getentropy returns 0 on success and -1 on failure, not a byte count.
    // It also fails for requests larger than 256 bytes.
    if (getentropy(dst, dstlen) != 0) {
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }
    return dstlen;
}
#endif
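Because getentropy rejects requests above 256 bytes, filling a whole mt19937 state (624 32-bit words, 2496 bytes) takes several calls. A sketch of one way to chunk the request follows. The name sysrandom_chunked is mine, and I am assuming getentropy is declared in <unistd.h>, as it is on OpenBSD and on glibc 2.25 or later.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <unistd.h>  // getentropy(), assumed available here

// Hypothetical helper: fill dst in pieces no larger than getentropy allows.
std::size_t sysrandom_chunked(void* dst, std::size_t dstlen)
{
    const std::size_t max_chunk = 256;  // documented per-call limit
    unsigned char* out = static_cast<unsigned char*>(dst);
    for (std::size_t done = 0; done < dstlen; ) {
        const std::size_t n = std::min(max_chunk, dstlen - done);
        // 0 means success; -1 means failure with errno set.
        if (getentropy(out + done, n) != 0) {
            throw std::runtime_error("getentropy failed.");
        }
        done += n;
    }
    return dstlen;
}
```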
Other Thoughts
If you need cryptographically secure random bytes, you should probably replace the fstream with POSIX's unbuffered open/read/close. This is because both basic_filebuf and FILE contain an internal buffer, which will be allocated via a standard allocator (and therefore not wiped from memory).
This could easily be done by changing sysrandom to:
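#include <fcntl.h>
#include <unistd.h>
#include <stdexcept>
size_t sysrandom(void* dst, size_t dstlen)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd == -1) {
        throw std::runtime_error("Unable to open /dev/urandom.");
    }
    // read() returns ssize_t: -1 on error, otherwise the bytes actually read.
    ssize_t bytes = read(fd, dst, dstlen);
    if (bytes < 0 || static_cast<size_t>(bytes) != dstlen) {
        close(fd);
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }
    close(fd);
    return dstlen;
}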
Thanks
Special thanks to Ben Voigt for pointing out FILE uses buffered reads, and therefore should not be used.
I would also like to thank Peter Cordes for mentioning getrandom, and OpenBSD's lack of /dev/urandom.
Solution 2:
In a sense, this can't be done portably. That is, one can conceive a valid fully-deterministic platform running C++ (say, a simulator which steps the machine clock deterministically, and with "determinized" I/O) in which there is no source of randomness to seed a PRNG.