Linux: How do I diagnose / isolate what's causing "random" hangs and spontaneous reboots?

Solution 1:

Linux and other Unix-like systems are more sensitive to flaky RAM than Windows. I would boot memtest86 and let it test the RAM for at least one full pass.

Solution 2:

Such problems can indeed be caused by faulty hardware (if you suspect the NVIDIA driver, maybe the graphics card itself has a hardware fault?). Some questions to rule things out:

  • if you have temperature monitoring enabled (with sensors-applet / lm_sensors), are there any high readings?
  • did you do any overclocking?
  • did you have weird crashes/hangs/reboots under Windows as well?
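
For the temperature question, here is a minimal Python sketch that reads the kernel's hwmon sensors directly from sysfs, which is where lm_sensors gets its readings. It assumes the sensor drivers are loaded and expose temp*_input files (values in millidegrees Celsius) under /sys/class/hwmon. The function names and the 85 C warning limit are my own illustrative choices, not anything from lm_sensors; check your hardware's real limits.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {sensor label: degrees Celsius} for each temp*_input file under root."""
    temps = {}
    for path in sorted(Path(root).glob("hwmon*/temp*_input")):
        chip_dir = path.parent
        try:
            # hwmon reports temperatures in millidegrees Celsius
            millideg = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable or not reporting a number; skip it
        try:
            chip = (chip_dir / "name").read_text().strip()
        except OSError:
            chip = "unknown"
        temps[f"{chip_dir.name}:{chip}/{path.name}"] = millideg / 1000.0
    return temps


def hot_sensors(temps, limit_c=85.0):
    """Return only the readings at or above limit_c (an assumed threshold)."""
    return {label: deg for label, deg in temps.items() if deg >= limit_c}


if __name__ == "__main__":
    readings = read_hwmon_temps()
    if not readings:
        print("no hwmon temperature sensors found (drivers not loaded?)")
    for label, deg in readings.items():
        print(f"{label}: {deg:.1f} C")
    for label, deg in hot_sensors(readings).items():
        print(f"WARNING: {label} is at {deg:.1f} C")
```

Running it in a loop while the machine is under load, and logging the output to disk, gives you something to look at after the next hang.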

If the system hangs, some things to check for:

  • are the keyboard LEDs blinking? AFAIK that would indicate a kernel panic (i.e. the kernel crashed)
  • can you reach the system with ping from another machine?
  • use the SysRq key combo (must be enabled beforehand) to see if you can get some response from the system
    • see http://en.wikipedia.org/wiki/Magic_SysRq_key for details
    • you should check beforehand that the key is really enabled and working by pressing Alt+SysRq+h on a virtual terminal (switch there with Ctrl+Alt+F1; switch back with Ctrl+Alt+F7)
  • after the reboot, check the log files (/var/log/syslog, /var/log/Xorg.0.log) for the last messages written before the hang
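
To check ahead of time whether SysRq is enabled, you can read /proc/sys/kernel/sysrq. A rough Python sketch follows; the bit meanings are taken from the kernel's SysRq documentation as I remember it, and the function name and the wording of the descriptions are my own, so verify against the docs for your kernel version.

```python
# Bitmask values for /proc/sys/kernel/sysrq, per the kernel's SysRq documentation.
# 0 disables SysRq entirely, 1 enables every function, larger values are a bitmask.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all real-time tasks",
}


def describe_sysrq(value):
    """Translate the numeric sysrq setting into a list of enabled functions."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [desc for bit, desc in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/sysrq") as handle:
            setting = int(handle.read().strip())
    except (OSError, ValueError):
        print("could not read /proc/sys/kernel/sysrq")
    else:
        print(f"kernel.sysrq = {setting}")
        for item in describe_sysrq(setting):
            print(f"  {item}")
```

If the output shows "disabled", or lacks the function you want (for example sync or reboot), change the kernel.sysrq setting as root before the next hang; the key combo cannot be enabled once the system is already stuck.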