Effects of configuring vm.overcommit_memory
Setting `overcommit_ratio` to 80 is likely not the right action. Setting the value to anything less than 100 is almost always incorrect.
The reason for this is that Linux applications routinely allocate more memory than they actually need. Say a program allocates 8 KB to store a string a few characters long; that's several KB unused right there. Applications do this a lot, and it is exactly what overcommit is designed for.
So basically, with the ratio at 100 (under strict accounting, `vm.overcommit_memory = 2`), the kernel will not allow applications to commit more memory than you have (swap + RAM). Setting it to less than 100 means you will never be able to use all your memory. If you are going to set this value at all, you should set it higher than 100 because of the aforementioned scenario, which is quite common.
However, while setting it greater than 100 is almost always the right choice, there are some use cases where a value below 100 is correct. As mentioned, doing so means user processes won't be able to commit all your memory; however, the kernel still can. So you can effectively use this to reserve some memory for the kernel (e.g. for the page cache).
Now, as for your issue with the OOM killer triggering, manually setting overcommit is unlikely to fix it. The default setting (heuristic determination) is fairly intelligent.
If you wish to see whether this really is the cause of the issue, look at `/proc/meminfo` when the OOM killer runs. If you see that `Committed_AS` is close to `CommitLimit`, but `free` still shows free memory available, then yes, you can manually adjust the overcommit for your scenario. Setting this value too low will cause the OOM killer to start killing applications when you still have plenty of memory free. Setting it too high can cause random applications to die when they try to use memory they were allocated but which isn't actually available (when all the memory does actually get used up).
Section 9.6 "Overcommit and OOM" in the doc that @dunxd mentions is particularly graphic about the dangers of allowing overcommit. However, the value 80 looked interesting to me as well, so I conducted a few tests.
What I found is that the `overcommit_ratio` affects the total RAM available to ALL processes. Root processes don't seem to be treated differently from normal user processes.
Setting the ratio to 100 or less should provide the classic semantics, where return values from `malloc`/`sbrk` are reliable. Setting ratios lower than 100 might be a way to reserve more RAM for non-process activities like caching and so forth.
So, on my computer with 24 GiB of RAM, with swap disabled and 9 GiB in use, `top` showed:

```
Mem:  24683652k total,  9207532k used, 15476120k free,    19668k buffers
Swap:        0k total,        0k used,        0k free,   241804k cached
```
Here are some `overcommit_ratio` settings and how much RAM my ram-consumer program could grab (touching each page); in each case the program exited cleanly once `malloc` failed:

```
 50   ~680 MiB
 60  ~2900 MiB
 70  ~5200 MiB
100 ~12000 MiB
```
Running several at once, even with some as the root user, didn't change the total amount they consumed together. It's interesting that it was unable to consume the last 3+ GiB or so; `free` didn't drop much below what's shown here:

```
Mem:  24683652k total, 20968212k used,  3715440k free,    20828k buffers
```
The experiments were messy. Anything that calls `malloc` at the moment all RAM is in use tends to crash, since many programmers are terrible about checking for `malloc` failures in C, some popular collection libraries ignore them entirely, and C++ and various other languages are even worse.
Most of the early implementations of imaginary RAM I saw were there to handle a very specific case: a single large process, say using 51%+ of available memory, needed to `fork()` in order to `exec()` some support program, usually a much, much smaller one. OSes with copy-on-write semantics would allow the `fork()`, but with the proviso that if the forked process actually tried to modify too many memory pages (each of which would then have to be instantiated as a new page independent of the initial huge process), it would end up getting killed. The parent process was only in danger if it allocated more memory, and it could handle running out, in some cases just by waiting a bit for some other process to die and then continuing. The child process usually just replaced itself with a (typically smaller) program via `exec()` and was then free of the proviso.
Linux's overcommit concept is an extreme approach to allowing both the `fork()` to occur and single processes to massively overallocate. OOM-killer-caused deaths happen asynchronously, even to programs that do handle memory allocation responsibly. I personally hate system-wide overcommit in general and the OOM killer in particular; they foster a devil-may-care approach to memory management that infects libraries and, through them, every app that uses them.
I'd suggest setting the ratio to 100 and also having a swap partition, one that would generally only end up being used by huge processes, which often use only a tiny fraction of the part of themselves that gets stuffed into swap. This protects the vast majority of processes from the OOM killer misfeature. It should keep your web server safe from random death, and, if it was written to handle `malloc` responsibly, even safe from killing itself (but don't bet on the latter).
That means I'm using this in `/etc/sysctl.d/10-no-overcommit.conf`:

```
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
```