System locks up, but is still accessible via SSH

Solution 1:

You mentioned that when you log in via SSH, the script is no longer running. Did it exit on its own, or did it crash?
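One way to answer that after the fact is to launch the script through a small wrapper that records how it ended. This is a sketch, not a known-good tool. The helper names (`describe_exit`, `run_and_log`) and the log path `/tmp/script_exit.log` are hypothetical. It assumes a POSIX system, where Python's `subprocess` reports "killed by signal N" as a negative return code -N.

```python
import signal
import subprocess
import sys
import time


def describe_exit(returncode):
    """Turn a subprocess return code into a short, human-readable reason."""
    if returncode == 0:
        return "exited normally (status 0)"
    if returncode > 0:
        return f"exited with error status {returncode}"
    # On POSIX, subprocess reports death by signal N as returncode -N.
    try:
        return f"killed by {signal.Signals(-returncode).name}"
    except ValueError:
        return f"killed by signal {-returncode}"


def run_and_log(cmd, log_path="/tmp/script_exit.log"):
    """Run cmd, append a timestamped note about how it ended to log_path, return it."""
    result = subprocess.run(cmd)
    reason = describe_exit(result.returncode)
    with open(log_path, "a") as log:
        log.write(f"{time.ctime()} {' '.join(cmd)}: {reason}\n")
    return reason


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit(f"usage: {sys.argv[0]} /path/script.py [args...]")
    # Launch the script with the same Python interpreter that runs this wrapper.
    print(run_and_log([sys.executable, *sys.argv[1:]]))
```

For example, save it as a (hypothetical) `run_wrapper.py` and start `python3 run_wrapper.py /path/script.py`. This only helps if the wrapper itself survives the lockup. If the log says `killed by SIGKILL`, one common cause on Linux is the kernel's OOM killer, which usually leaves a message in `dmesg`.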

You can also run your script under strace. This captures every system call it makes, including the last ones before it exits or crashes.

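strace -f -o /tmp/output.log /path/script.py

Note that strace's own options, such as -o, must come before the command being traced; anything after the script path is passed to the script as its arguments. If the script is not executable, trace python3 /path/script.py instead.

The trace file can grow large quickly, so make sure /tmp has enough free space.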
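Once the trace is in /tmp/output.log, the interesting part is usually the tail end of each process. The following sketch prints, for every PID, the last traced line and the line recording how it terminated. It assumes strace's usual output with -f and -o: each line prefixed by the PID, and termination marked by lines such as `+++ exited with 0 +++` or `+++ killed by SIGSEGV +++`. The function name `summarize` is hypothetical.

```python
import re
import sys

# Assumption: with -f and -o, strace prefixes each line with the PID,
# e.g. '12345 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3'.
PID_PREFIX = re.compile(r"^(\d+)\s+(.*)$")


def summarize(lines):
    """Map each PID to (its last traced line, its '+++ ... +++' exit line or None)."""
    last, fate = {}, {}
    for raw in lines:
        raw = raw.rstrip("\n")
        match = PID_PREFIX.match(raw)
        pid, text = (match.group(1), match.group(2)) if match else ("?", raw)
        # strace marks termination with '+++ exited with N +++' or '+++ killed by SIG... +++'.
        if text.startswith("+++"):
            fate[pid] = text
        else:
            last[pid] = text
    pids = list(dict.fromkeys([*last, *fate]))  # keep first-seen order, no duplicates
    return {pid: (last.get(pid), fate.get(pid)) for pid in pids}


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/tmp/output.log"
    with open(path, errors="replace") as f:
        for pid, (last_line, exit_line) in summarize(f).items():
            print(f"{pid}: last={last_line!r} exit={exit_line or 'not recorded'}")
```

A PID with no exit line was still running when the trace stopped, which is itself a useful hint if the system locked up around that time.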

Solution 2:

A few ideas and debugging hints:

  • When you log in via SSH, is the system idle, or is some process stuck under heavy load?
  • What exactly does "totally unresponsive" mean? Can you still switch to a virtual terminal by pressing Ctrl-Alt-F1? Does pressing Caps Lock still toggle the status LED on the keyboard?
  • Even if your script uses the GPU only for short periods at a time, what is its peak GPU memory usage?
  • Does stopping the X server and running the script from a virtual terminal reliably prevent the lockups?
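To answer the load and GPU-memory questions without relying on the desktop, a small logger can run alongside the script (for example, started from an SSH session) and write one sample per second to a file you can read after the next lockup. This is a sketch under assumptions: it assumes an NVIDIA GPU with `nvidia-smi` on the PATH, whose `--query-gpu=memory.used --format=csv,noheader,nounits` query prints used memory in MiB, one line per GPU, and a Linux `/proc/loadavg`. The helper names and the log path `/tmp/gpu_monitor.log` are hypothetical.

```python
import os
import subprocess
import sys
import time

QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_used_mib(output):
    """Parse nvidia-smi CSV output (no header, no units) into a list of MiB per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_gpu_used_mib():
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_used_mib(out)


def read_load_average():
    # The first field of /proc/loadavg is the 1-minute load average (Linux only).
    with open("/proc/loadavg") as f:
        return float(f.read().split()[0])


def monitor(duration_s, interval_s=1.0, log_path="/tmp/gpu_monitor.log"):
    """Sample load and GPU memory until duration_s elapses; return the peak MiB on any GPU."""
    peak = 0
    deadline = time.monotonic() + duration_s
    with open(log_path, "a") as log:
        while time.monotonic() < deadline:
            used = read_gpu_used_mib()
            peak = max([peak, *used])
            log.write(f"{time.strftime('%H:%M:%S')} load1={read_load_average():.2f} "
                      f"gpu_used_mib={used} peak={peak}\n")
            # Push each sample to disk so the last ones before a lockup are more likely to survive.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
    return peak


if __name__ == "__main__":
    seconds = float(sys.argv[1]) if len(sys.argv) > 1 else 3600.0
    print(f"peak GPU memory used: {monitor(seconds)} MiB")
```

After a lockup, `tail /tmp/gpu_monitor.log` over SSH shows whether load or GPU memory was climbing just before the system stopped responding.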