Not quite grasping /dev/random and /dev/urandom
I am not quite understanding the difference between /dev/random and /dev/urandom on Linux systems.
What does it mean to run out of “entropy” and how does the system get more?
What does it mean when people say /dev/random “blocks” if there isn’t enough entropy?
Which one should I use for what scenarios?
Randomness means that the next value you get doesn't depend on any previous value, and there is no way for you to predict it.
This is actually hard for a computer to do, since a computer is essentially a very fast calculator: it does math, and the same calculation always produces the same answer. What it can do is approximate randomness with math, called "pseudorandomness" - but an ordinary pseudorandom generator, seeded with a predictable value, is not suitable for cryptography.
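To make the "fast calculator" point concrete, here is a tiny Python sketch (an illustration only, not anything the kernel actually uses) showing that a seeded pseudorandom generator is pure deterministic math: the same seed always reproduces the same "random" sequence.

```python
import random

# Two generators seeded with the same value are just running the same
# deterministic math, so they produce identical "random" sequences.
gen_a = random.Random(42)
gen_b = random.Random(42)

print([gen_a.randint(0, 99) for _ in range(5)])
print([gen_b.randint(0, 99) for _ in range(5)])  # exactly the same five numbers
```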
So Linux collects "randomness" in a pool from various unpredictable sources, such as the timing between input events and other hardware interrupts. The "amount" of randomness in this pool is the entropy. More entropy means fewer regular, repeating, predictable patterns - you want as much entropy as possible. The kernel keeps refilling its pool as these events occur, but how quickly depends on what's happening on the system, since it's precisely the unpredictable timing of hardware events that provides the entropy.
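If you want to see the kernel's own bookkeeping, it exposes its current entropy estimate through procfs. A minimal, Linux-only sketch (the number's exact meaning varies between kernel versions):

```python
# Linux exposes its entropy estimate (in bits) via procfs.
with open("/proc/sys/kernel/random/entropy_avail") as f:
    print(f"Kernel entropy estimate: {f.read().strip()} bits")
```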
If the kernel estimates the pool is empty, /dev/random will block: it stops giving out data until the kernel has gathered enough entropy again. /dev/urandom will keep going, stretching the entropy it already has through a cryptographically secure pseudorandom generator.
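Both are ordinary character devices, so you can watch the difference with plain file reads. A rough, Linux-only sketch (note that on recent kernels /dev/random only blocks early at boot, before the pool has been initialized):

```python
# /dev/urandom never blocks: the CSPRNG keeps producing output.
with open("/dev/urandom", "rb") as f:
    print("urandom:", f.read(16).hex())

# /dev/random may block on older kernels if the entropy estimate hits zero;
# on recent kernels it behaves like /dev/urandom once the pool is initialized.
with open("/dev/random", "rb") as f:
    print("random: ", f.read(16).hex())
```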
Now that you've got the basics down: in practice you can always just use urandom, and here is why. The following excerpt from the "Myths about /dev/urandom" essay explains why the distinction doesn't matter:
But let's assume you've obtained those “true” random numbers. What are you going to do with them?
You print them out, frame them and hang them on your living-room wall, to revel in the beauty of a quantum universe? That's great, and I certainly understand.
Wait, what? You're using them? For cryptographic purposes? Well, that spoils everything, because now things get a bit ugly.
You see, your truly-random, quantum effect blessed random numbers are put into some less respectable, real-world tarnished algorithms.
Because almost all of the cryptographic algorithms we use do not hold up to information-theoretic security. They can “only” offer computational security. The two exceptions that come to my mind are Shamir's Secret Sharing and the One-time pad. And while the first one may be a valid counterpoint (if you actually intend to use it), the latter is utterly impractical.
But all those algorithms you know about, AES, RSA, Diffie-Hellman, elliptic curves, and all those crypto packages you're using, OpenSSL, GnuTLS, Keyczar, your operating system's crypto API, these are only computationally secure.
What's the difference? While information-theoretically secure algorithms are secure, period, those other algorithms cannot guarantee security against an adversary with unlimited computational power who's trying all possibilities for keys. We still use them because it would take all the computers in the world taken together longer than the universe has existed, so far, to break them. That's the level of “insecurity” we're talking about here.
Unless some clever guy breaks the algorithm itself, using much less computational power. Even computational power achievable today. That's the big prize every cryptanalyst dreams about: breaking AES itself, breaking RSA itself and so on.
So now we're at the point where you don't trust the inner building blocks of the random number generator, insisting on “true randomness” instead of “pseudo randomness”. But then you're using those “true” random numbers in algorithms that you so despise that you didn't want them near your random number generator in the first place!
Truth is, when state-of-the-art hash algorithms are broken, or when state-of-the-art block ciphers are broken, it doesn't matter that you get “philosophically insecure” random numbers because of them. You've got nothing left to securely use them for anyway.
So just use those computationally-secure random numbers for your computationally-secure algorithms. In other words: use /dev/urandom.
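In practice you rarely open /dev/urandom by hand: most languages and crypto libraries already read from it (or the equivalent getrandom() system call) for you. A short Python sketch of the usual way to get key material (os.urandom and the secrets module are both backed by the OS CSPRNG):

```python
import os
import secrets

key = os.urandom(32)           # 32 bytes straight from the kernel's CSPRNG
token = secrets.token_hex(16)  # convenience wrapper, also OS-backed

print("key:  ", key.hex())
print("token:", token)
```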