Arrange server resources so that ssh is always available

I have a Linux server for one of my web applications. Every once in a while, a process (a long-running script, for example) gets out of hand, consumes too much memory or too many CPU cycles, and blocks all other processes.

In such situations, I can't ssh into the server, and I need to restart the server through a management panel. I'd prefer to log in to the machine and deal with only the problematic process.

Is it possible to arrange the resources on a Linux machine such that, no matter how many resources a process is consuming, there are always enough resources available for an ssh connection?


Solution 1:

You can use 'nice' to give certain processes a higher CPU scheduling priority (a lower nice value means a higher priority).
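As a rough sketch of the 'nice' approach, the helper below renices a list of PIDs. The function name raise_priority and the DRY_RUN variable are made up for illustration; the value -10 in the usage comment is an arbitrary choice. It assumes the Linux (util-linux) renice, where -n sets the priority; on strictly POSIX systems -n is an increment instead.

```shell
#!/bin/sh
# Sketch: raise the CPU priority of the given PIDs (for example, sshd).
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.
raise_priority() {
    prio="$1"
    shift
    for pid in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "renice -n $prio -p $pid"
        else
            # Negative nice values normally require root.
            renice -n "$prio" -p "$pid"
        fi
    done
}

# Example (as root): raise_priority -10 $(pgrep -x sshd)
```

Because child processes inherit the nice value, sessions forked by the reniced sshd should start with the same priority. This only helps with CPU contention, not with memory exhaustion.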

You could also look into installing monit, which you can instruct to restart a service when a certain threshold is exceeded.

A monit config along these lines will automatically restart Apache:

# Sample monit check for Apache; adjust paths and thresholds for your system.
check process apache
   with pidfile "/usr/local/apache/logs/httpd.pid"
   start program = "/etc/init.d/httpd start" with timeout 60 seconds
   stop program = "/etc/init.d/httpd stop"
   # Stop monitoring if restarts keep failing.
   if 2 restarts within 3 cycles then timeout
   if totalmem > 100 Mb then alert
   if children > 255 for 5 cycles then stop
   if cpu usage > 95% for 3 cycles then restart
   if failed port 80 protocol http then restart
   group server
   # httpd.conf and httpd.bin must be defined elsewhere as their own checks.
   depends on httpd.conf, httpd.bin

Solution 2:

The simple answer is no. You can use nice to give sshd the highest priority, but if there is not enough memory to handle a new connection, ssh will still not work (don't forget that after a successful login the server also has to start a shell). You can rely on the OOM killer to automatically kill a process that consumes too much RAM, but it will not help if you have thousands of processes (such as Apache forking out of control) that each consume only a little RAM (1000 x 4 MB = about 4 GB of RAM consumed without any single process standing out to the OOM killer).
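If you want to steer the OOM killer, Linux exposes a per-process oom_score_adj file under /proc that accepts values from -1000 (never kill) to 1000 (kill first). The following is only a sketch: the function name set_oom_score_adj and the PROC_ROOT override are hypothetical, and the scores in the usage comments are arbitrary examples.

```shell
#!/bin/sh
# Sketch: set the OOM killer adjustment for one process.
# PROC_ROOT is a hypothetical override used for testing; it defaults to /proc.
set_oom_score_adj() {
    score="$1"
    pid="$2"
    target="${PROC_ROOT:-/proc}/$pid/oom_score_adj"
    # The kernel accepts values from -1000 to 1000 only.
    if [ "$score" -lt -1000 ] || [ "$score" -gt 1000 ]; then
        echo "score must be between -1000 and 1000" >&2
        return 1
    fi
    printf '%s\n' "$score" > "$target"
}

# Examples (lowering a score normally requires root):
#   set_oom_score_adj -1000 "$(pgrep -o -x sshd)"   # protect the sshd listener
#   set_oom_score_adj 500 12345                     # prefer killing PID 12345
```

As noted above, this does not help when memory is exhausted by many small processes, because none of them looks like an obvious victim.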

A hard restart is the simplest and fastest solution. If you need some services to run 24/7, you need two machines in an HA setup. You can also use Zabbix or another monitoring tool to warn you, so that you have time to fix the problem before the whole server crashes.