How to diagnose random freezes?

Solution 1:

The logs should always be your first port of call. Check syslog for anything untoward:

less /var/log/syslog

Also check the X server log in case there's any indication of a graphics driver problem (although that sounds less likely given your description):

less /var/log/Xorg.0.log
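If paging through either file by hand is tedious, a rough filter can narrow things down. This is only a sketch: scan_log is a hypothetical helper name and the keyword list is my own guess at common trouble markers, so expect some false positives and misses. The one marker I'm sure of is (EE), which the X server uses to flag errors in its log.

```shell
# Hypothetical helper: scan_log FILE
# Prints the last 50 lines of FILE that match some common trouble markers.
# The keyword list is an assumption; adjust it to whatever you are hunting for.
scan_log() {
    # -i: ignore case, -n: prefix each match with its line number,
    # -E: extended regex so the alternatives can be separated with |
    grep -inE 'error|fail|panic|oom-killer|out of memory|\(EE\)' "$1" | tail -n 50
}
```

You'd run it as scan_log /var/log/syslog or scan_log /var/log/Xorg.0.log, then open the file in less and jump to the reported line numbers for context.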

In your particular case, these steps might not turn up anything interesting. If so, I'd want to see what's happening on your system as the problem develops. To that end, I'd set up a temporary log of top output at short intervals, say every 5 or 10 seconds. This should reveal whether a process is running wild with resources at the time of the issue.

Note that alternatives exist, such as switching to another tty with Ctrl+Alt+F1..F6 (to get back to the GUI, it's Ctrl+Alt+F7) and running commands interactively, or configuring an SSH server and logging in remotely. Both of these might be difficult if your machine is more or less unresponsive, hence my more awkward suggestion of writing a logfile (which could also run into the same problem, but is more likely to succeed).

It would involve something like this:

while true; do top -b -n 1 >> ~/top.log; sleep 10; done

This would write top output to a logfile at ~/top.log every 10 seconds or so. Note that this log would grow quite large if this command is left running for a prolonged period, so keep an eye on it if your machine suddenly starts behaving itself! And remove the log with rm ~/top.log when you're done with it. Note also that executing the above command is a one-time thing; it won't restart itself after a reboot.
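If you'd rather not babysit the file, here is a slightly more careful sketch of the same loop. It stamps each snapshot with the time and stops by itself once the log passes a size cap. The function name log_top, its three arguments and the 50 MiB default are all my own choices for illustration, not anything standard.

```shell
# Hypothetical helper: log_top [LOGFILE] [INTERVAL_SECONDS] [MAX_BYTES]
# Appends timestamped top snapshots to LOGFILE until it reaches MAX_BYTES.
log_top() {
    log=${1:-"$HOME/top.log"}
    interval=${2:-10}
    max_bytes=${3:-52428800}   # 50 MiB cap, an arbitrary default

    : >> "$log"                # create the log if it does not exist yet
    while [ "$(( $(wc -c < "$log") ))" -lt "$max_bytes" ]; do
        # Timestamp header so each snapshot can be matched to the freeze time
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$log"
        top -b -n 1 >> "$log"  # -b: batch mode, -n 1: one snapshot, then exit
        sync                   # ask the kernel to flush buffered writes to disk
        sleep "$interval"
    done
}
```

Running log_top with no arguments logs to ~/top.log every 10 seconds. The sync call is there because a hard freeze can lose whatever was still sitting in memory; it costs a little I/O on each pass, so drop it if that bothers you.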

To read the logs generated after a crash, you'd use

less ~/top.log

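and hit the End key to get to the bottom, which holds the last snapshots written before the freeze. You'd be looking for processes with an unusually high %CPU value, or an unusually high RES value (resident memory, i.e. how much RAM the process is actually occupying).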
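If the log is long, something like the following can pull out the suspicious rows for you. Treat it as a sketch: busy_procs is a made-up name, and it assumes top's default column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and a locale that prints decimals with a dot. If you've customised top's fields, the column numbers will be off.

```shell
# Hypothetical helper: busy_procs LOGFILE [CPU_THRESHOLD]
# Prints PID, %CPU, RES and COMMAND for every logged process row whose
# %CPU is at or above the threshold (default 50).
busy_procs() {
    awk -v min="${2:-50}" '
        # Process rows start with a numeric PID and have at least 12 fields;
        # this skips the summary lines and the column header.
        $1 ~ /^[0-9]+$/ && NF >= 12 && $9 + 0 >= min {
            print $1, $9, $6, $12
        }
    ' "$1"
}
```

For example, busy_procs ~/top.log 80 | tail would show the last few rows above 80% CPU. It only filters on CPU; for memory hogs you'd still want to eyeball the RES column, since its units vary with the size of the value.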

It may or may not help, but it's handy information to have.