How do I quickly stop a process that is causing thrashing (due to excess memory allocation)?
We've all experienced it: some program is asked to do something that requires a huge amount of memory. It dutifully tries to allocate all of it, and the system immediately begins thrashing, swapping endlessly and becoming sluggish or nonresponsive.
I most recently experienced this on my Ubuntu laptop when a Matlab script tried to allocate a ridiculously huge matrix. After more than five minutes of thrashing, I was able to press Ctrl-Alt-F1 to switch to a console and kill Matlab. I would much rather have had some hot-key that gave me control of the system immediately and let me kill the offending process, or, perhaps, have had the system simply refuse to allocate such a large buffer.
What is the quickest way to regain control of a Linux system that has become nonresponsive or extremely sluggish due to excessive swapping?
Is there an effective way to prevent such swapping from occurring in the first place, for instance by limiting the amount of memory a process is allowed to try to allocate?
Solution 1:
Press Alt-SysRq-F to invoke the kernel's OOM killer, which kills the process it judges to be the worst memory hog (usually the one using the most memory):
- The SysRq key is usually mapped to the Print Screen (PrtSc) key.
- If you're using a graphical desktop, you might need to press Ctrl-Alt-SysRq-F instead, in case Alt-SysRq triggers another action (e.g. a screenshot program).
- If you're using a laptop, you might need to hold the Fn key as well.
- The key combination only works if the kernel.sysrq sysctl permits it; some distributions restrict it by default.
- For more information, read the Wikipedia article on the magic SysRq key.
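Whether Alt-SysRq-F does anything depends on the kernel.sysrq setting. The following is a minimal Python sketch, not part of the original answer, that reads /proc/sys/kernel/sysrq and reports whether the OOM-kill function is permitted. It assumes the bitmask layout described in the Linux kernel's SysRq documentation, where 1 enables everything and bit 64 covers signalling of processes (including oom-kill). The function names are made up for illustration.

```python
from pathlib import Path

# Standard location of the setting on Linux.
SYSRQ_PATH = Path("/proc/sys/kernel/sysrq")

# Assumption from the kernel's SysRq documentation: bit 64 enables
# "signalling of processes (term, kill, oom-kill)".
SIGNAL_BIT = 64


def sysrq_allows_oom_kill(value: int) -> bool:
    """Return True if a kernel.sysrq value permits Alt-SysRq-F."""
    if value == 1:  # 1 means "all functions enabled"
        return True
    return bool(value & SIGNAL_BIT)  # 0 or a mask without bit 64 means "no"


def read_sysrq_value(path: Path = SYSRQ_PATH) -> int:
    """Read the current kernel.sysrq value from procfs."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        current = read_sysrq_value()
    except (OSError, ValueError) as exc:
        print(f"could not read {SYSRQ_PATH}: {exc}")
    else:
        state = "enabled" if sysrq_allows_oom_kill(current) else "disabled"
        print(f"kernel.sysrq = {current}: Alt-SysRq-F is {state}")
```

If it reports that the function is disabled, running `sysctl -w kernel.sysrq=1` as root enables all SysRq functions until the next reboot.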
Solution 2:
I've made a script for this purpose - https://github.com/tobixen/thrash-protect
I've had this script running on production servers, workstations and laptops with good success. It does not kill processes but suspends them temporarily. I've since had several situations where I'm quite sure I would have lost control to thrashing had it not been for this simple script. In the "worst" case the offending process is slowed down a lot and eventually killed by the kernel (OOM); in the "best" case it actually completes. Either way, the server or workstation remains relatively responsive, so it's easy to investigate the situation.
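To illustrate what "suspending temporarily" means at the process level, here is a minimal Python sketch using the standard SIGSTOP and SIGCONT signals. It is not thrash-protect's actual code, and it deliberately leaves out the hard part (noticing that the system is thrashing and picking which process to suspend). The function names are hypothetical.

```python
import os
import signal
import time


def suspend(pid: int) -> None:
    """Freeze a process. SIGSTOP cannot be caught or ignored."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a previously suspended process continue running."""
    os.kill(pid, signal.SIGCONT)


def pause_for(pid: int, seconds: float) -> None:
    """Suspend a process for a while, then always resume it."""
    suspend(pid)
    try:
        time.sleep(seconds)
    finally:
        # Resume even if the sleep is interrupted, so the process
        # is never left frozen by accident.
        resume(pid)
```

For example, `pause_for(12345, 5.0)` would freeze the process with the (made-up) PID 12345 for five seconds. A suspended process keeps its memory but stops touching it, which gives the rest of the system a chance to make progress.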
Of course, "buy more memory" and "don't use swap" are two alternative, more traditional answers to the question "how do I avoid thrashing?", but in general they tend not to work out so well: installing more memory may be non-trivial, a rogue process can eat up all memory no matter how much is installed, and one can run into thrashing even without swap when there isn't enough memory left for buffering and caching. I do recommend thrash-protect plus lots of swap space.
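For the second half of the question (capping what a single process may try to allocate), a per-process address-space limit is one further option. The sketch below is an illustration under stated assumptions rather than a recommendation from the answer above: it uses Python's standard resource module to set RLIMIT_AS in a child process before it starts, and run_with_memory_cap is a hypothetical helper name. Note that RLIMIT_AS limits virtual address space, not resident memory, so the cap has to be chosen generously.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(argv, max_bytes, **kwargs):
    """Run a command whose address space is capped at max_bytes."""

    def apply_limit():
        # Runs in the child between fork() and exec(), so the limit
        # applies only to the launched program, not to this script.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit, **kwargs)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # A child that tries to allocate 2 GiB under a 1 GiB cap: the
    # allocation fails right away instead of pushing the system into swap.
    greedy = [sys.executable, "-c", "buf = bytearray(2 * 1024 ** 3)"]
    result = run_with_memory_cap(greedy, one_gib, stderr=subprocess.DEVNULL)
    print("exit status of capped child:", result.returncode)
```

With such a cap in place, an oversized allocation fails inside the offending program (here as a MemoryError in the child) while the rest of the system stays responsive. The shell builtin `ulimit -v` sets the same limit for programs started from that shell.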