Automatically Kill/Restart Process(es) When Memory is Critically Low

I have a Debian Wheezy VPS where I'm running a couple of Django apps in production. Ideally, I would address my current memory-footprint issues by optimizing the apps, adding more RAM, or adding swap. But I doubt there's much memory I could reclaim by optimizing the Django apps (the stack is open-source and robust), adding RAM is a cost constraint for me (this is a remote VPS), and the host doesn't offer the option of using swap!

So, in the meantime (while I wait to secure the resources to afford more RAM), I want to mitigate the scenarios where the server runs out of memory entirely and I'm left having to request a VPS restart (at that point, I can't even SSH into the box!).

What I would love in a solution is the ability to detect when a process (or, more generally, total system memory usage) exceeds a certain critical threshold, for example when free RAM falls below, say, 10%. I've noticed this happens after the VPS has been up for a long time, and when traffic to some of the heavier apps suddenly spikes (most are just staging apps anyway).

At that point I want to kill/restart the offending process(es), most likely Apache. Doing this manually in these situations has restored sane memory usage levels, which hints that one or more of the Django apps may have a memory leak.


In brief:

  1. Monitor overall system RAM usage
  2. When free RAM falls below a given critical threshold (say, below 10%), kill/restart the offending process(es); or, more simply, since my current log analysis (using linux-dash) suggests Apache is usually the offender, just kill/restart Apache.
  3. Rinse and repeat... (a rough sketch of this loop follows below)
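
To make it concrete, something along the lines of this rough, untested sketch is what I have in mind; the 10% threshold and the apache2 service name are just placeholders:

    #!/bin/sh
    # Rough sketch: restart Apache when "available" memory drops below a threshold.
    # Assumes Debian Wheezy's sysvinit apache2 service; the 10% figure is arbitrary.
    THRESHOLD=10

    # Treat MemFree + Buffers + Cached as available, as a percentage of MemTotal
    # (Wheezy's 3.2 kernel has no MemAvailable field in /proc/meminfo).
    available_pct=$(awk '
        /^MemTotal:/ {total=$2}
        /^MemFree:/  {avail=$2}
        /^Buffers:/  {avail+=$2}
        /^Cached:/   {avail+=$2}
        END {printf "%d", avail * 100 / total}' /proc/meminfo)

    if [ "$available_pct" -lt "$THRESHOLD" ]; then
        logger -t memwatch "free memory at ${available_pct}%, restarting apache2"
        service apache2 restart
    fi

Saved as, say, /usr/local/sbin/memwatch.sh and run every minute or so via a cron line like * * * * * root /usr/local/sbin/memwatch.sh in /etc/cron.d/memwatch.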

Solution 1:

The Linux kernel has a so-called OOM killer ("out-of-memory killer") built in. When your box has exhausted its RAM and swap, the kernel will start killing processes to keep the server accessible.

You can tweak process priorities to determine the "likelihood" of a given process being killed. Read more at this link, in the section "Configuring the OOM Killer".

Basically, you adjust the likelihood via the /proc/*/oom_adj file. E.g., to raise the likelihood that any of the currently running Apache instances will be killed:

pgrep apache2 |sudo xargs -I %PID sh -c 'echo 10 > /proc/%PID/oom_adj'

Or lower the likelihood that SSH will get killed:

pgrep sshd |sudo xargs -I %PID sh -c 'echo -17 > /proc/%PID/oom_adj'
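
Note that these writes only affect processes that already exist; after an Apache restart, the new PIDs start from the default value again. A simple (if blunt) workaround is to reapply the adjustments periodically from a small root cron script, e.g. this sketch (the script name and interval are arbitrary):

    #!/bin/sh
    # /usr/local/sbin/oom-priorities.sh (sketch): reapply OOM-killer priorities.
    # Note: oom_adj is deprecated on newer kernels in favour of oom_score_adj
    # (range -1000..1000), but it still works on Wheezy's 3.2 kernel.
    pgrep apache2 | xargs -I %PID sh -c 'echo 10 > /proc/%PID/oom_adj' 2>/dev/null
    pgrep sshd    | xargs -I %PID sh -c 'echo -17 > /proc/%PID/oom_adj' 2>/dev/null

called from a line like */5 * * * * root /usr/local/sbin/oom-priorities.sh in /etc/cron.d/oom-priorities.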

Also, I recommend completely disabling swap on a server where you have this issue, because swap is so slow that it can grind the server to a virtual standstill while there's still swap space left, so the OOM killer never triggers.

Solution 2:

If these apps are running inside an apache2 server, you can tune the server. Consider:

  • Limit MaxRequestWorkers (this caps the number of workers consuming memory at any one time).
  • Limit MaxConnectionsPerChild (this recycles worker processes so that they don't consume too much memory, which is useful if the applications are leaking memory). See the example snippet after this list.
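
As a sketch (the numbers are illustrative, not recommendations): on Wheezy's Apache 2.2 these directives go by their older names, MaxClients and MaxRequestsPerChild, so with the prefork MPM the relevant block in /etc/apache2/apache2.conf might look like:

    <IfModule mpm_prefork_module>
        StartServers          2
        MinSpareServers       2
        MaxSpareServers       4
        # MaxClients is called MaxRequestWorkers in Apache 2.4
        MaxClients           10
        # MaxRequestsPerChild is called MaxConnectionsPerChild in Apache 2.4;
        # a non-zero value recycles children, which contains slow leaks
        MaxRequestsPerChild 500
    </IfModule>

A common rule of thumb is to set MaxClients to roughly the RAM you can spare for Apache divided by the resident size of one worker (check with ps or top).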

If your processes are leaking memory, you can use /etc/security/limits.conf to limit the amount of memory a process may use. This will prevent individual server processes from growing too large. The same effect can be achieved on a temporary basis with the ulimit command. It may be best to use ulimit to discover an appropriate size, and then set those values in limits.conf. If your system supports it, drop a file into /etc/security/limits.d rather than editing /etc/security/limits.conf directly.
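
A minimal sketch of both, assuming Apache runs as www-data and picking an arbitrary 512 MB cap. To experiment first (ulimit -v takes a value in KB):

    # Run in the shell you start Apache from, purely for experimentation
    ulimit -v 524288

and, once you've settled on a value, persist it with a line like this in, e.g., /etc/security/limits.d/apache-mem.conf:

    # <domain>  <type>  <item>  <value>   ("as" = address space, in KB)
    www-data    hard    as      524288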