Server suddenly running out of entropy

Solution 1:

With lsof ruled out as a diagnostic tool, would setting something up with audit work? There's no way to deplete the entropy pool without opening /dev/random, so if you audit processes opening /dev/random, the culprit (or at least the set of candidates for further examination) should drop out fairly rapidly.
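A minimal sketch of that audit idea, assuming auditd is installed and running and that you run this as root. The key name entropy_readers and the function names are arbitrary labels made up for this example. Note that a watch like this only sees opens of the device node itself.

```shell
#!/bin/sh
# Sketch only: run as root, with auditd running.
# KEY is an arbitrary label (an assumption of this example) used to tag
# and later look up the matching audit records.
KEY=entropy_readers

# Start logging every read access to /dev/random.
watch_random_start() {
    auditctl -w /dev/random -p r -k "$KEY"
}

# Summarise which executables have triggered the watch so far.
watch_random_report() {
    ausearch -k "$KEY" --raw | aureport -x --summary
}

# Remove the watch again once the culprit is identified.
watch_random_stop() {
    auditctl -W /dev/random -p r -k "$KEY"
}
```

Call watch_random_start, wait until the pool has been drained again, then call watch_random_report to see the candidates and watch_random_stop to clean up.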

Solution 2:

Normally, for a public-facing server that needs 'enough' entropy, I would suggest something like an entropy-key, a USB hardware device that adds random bits to the Linux entropy pool. But your server doesn't talk to the outside world.

Virtual machines can suffer from a lack of external randomness.

Your remark about a 'backup domain controller' does point to a possible consumer of entropy: Windows domains use random numbers in certificates.

Solution 3:

Perhaps lsof (list open files) might help. It shows which process currently holds which files open. In your case it only helps if you catch the process(es) in the act of draining entropy; a process that does not keep the handle open for long is easy to miss.

$ lsof /dev/urandom
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
xfce4-ses  1787   to   15r   CHR    1,9      0t0 8199 /dev/urandom
applet.py  1907   to    9r   CHR    1,9      0t0 8199 /dev/urandom
scp-dbus-  5028   to   10r   CHR    1,9      0t0 8199 /dev/urandom
firefox    6603   to   23r   CHR    1,9      0t0 8199 /dev/urandom
thunderbi 12218   to   23r   CHR    1,9      0t0 8199 /dev/urandom

That is just a sample from my workstation, but digging deeper into lsof might help.
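Since a single lsof run can miss a process that opens the device only briefly, one option is to poll. The following is a rough sketch, not a replacement for lsof: it walks the fd symlinks under /proc and prints the PID and command name of every process that has the target open. The function name find_openers and its second argument (an alternative proc root, handy for trying it on a copy) are assumptions made for this example. Run it as root, otherwise other users' fd directories are not readable.

```shell
#!/bin/sh
# Print "PID command" for each process that currently has TARGET open.
# Usage: find_openers [target] [proc_root]
find_openers() {
    target=${1:-/dev/random}
    proc_root=${2:-/proc}
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # readlink fails for fds that vanished or that we may not inspect;
        # in that case the comparison fails and the entry is skipped.
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] || continue
        pid=${fd#"$proc_root"/}   # strip the proc root prefix
        pid=${pid%%/*}            # keep only the leading PID component
        comm=$(cat "$proc_root/$pid/comm" 2>/dev/null)
        echo "$pid ${comm:-?}"
    done | sort -u
}
```

Something like "while sleep 1; do find_openers /dev/random; done" then logs the holders once per second. A process that opens and closes the device between two polls will still slip through, so shorten the interval if nothing shows up.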