Forensic Analysis of the OOM-Killer

I'm new to ServerFault and just saw this post. It seems to have resurfaced near the front of the queue even though it is old, so let's try to put this scary one to bed.

First of all, I have an interest in this topic as I am optimizing systems with limited RAM to run many user processes in a secure way.

In my opinion, the error messages in this log refer to OpenVZ Linux containers.

A "ve" is a virtual environment, which OpenVZ also calls a container. Each container is given an ID, and the number you are seeing is that ID. More on this here:

https://openvz.org/Container

The term "free" refers to the free memory, in bytes, at that moment. You can see the free memory gradually increasing as processes are killed.

I am a little unsure of the term "gen". I believe it refers to generation: it starts out at 1 and increases by one for every generation of a process in a container. So, for your system, it seems 24K+ processes had been executed since boot. Please correct me if I'm wrong; that should be easy to test.
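If you want to check the "free" and "gen" readings yourself, a few lines of Python can pull the fields out of the log. This is only a rough sketch: I'm assuming the fields show up as key=value pairs (ve=, free=, gen=), and the sample lines are made up for illustration, not taken from your log.

```python
import re

# Assumption: the log carries fields as key=value pairs named ve, free and gen.
FIELD_RE = re.compile(r"\b(ve|free|gen)=(\d+)")


def parse_oom_fields(line):
    """Return the numeric ve/free/gen fields found in one log line."""
    return {key: int(value) for key, value in FIELD_RE.findall(line)}


# Hypothetical sample lines; replace them with lines from your own log.
sample_lines = [
    "OOM killed process a (pid=100, ve=101, free=1000, gen=1)",
    "OOM killed process b (pid=101, ve=101, free=3000, gen=2)",
]

previous = None
for line in sample_lines:
    fields = parse_oom_fields(line)
    if previous is not None:
        # Differences between consecutive kills: did free grow, did gen step by one?
        print("free delta:", fields["free"] - previous["free"],
              "gen delta:", fields["gen"] - previous["gen"])
    previous = fields
```

If "gen" always steps by one between consecutive kills while "free" goes up, that would support the reading above.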

As to why it killed processes, that's because of your OOM killer configuration. It's trying to bring the free memory back to the expected amount (which looks to be 128 KB). Oracle has a good write-up on how to configure this to something you might like better:

http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
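To see how the kernel currently ranks a process for the OOM killer, you can read its oom_score and oom_score_adj files under /proc. Here is a minimal sketch; the function name and the proc_root parameter are my own inventions for illustration, and on older kernels oom_score_adj may be missing (only the legacy oom_adj file exists there), in which case the value comes back as None.

```python
from pathlib import Path


def read_oom_values(pid="self", proc_root="/proc"):
    """Read oom_score and oom_score_adj for a process; None if unavailable."""
    base = Path(proc_root) / str(pid)
    values = {}
    for name in ("oom_score", "oom_score_adj"):
        try:
            values[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            # File missing (older kernel, non-Linux host) or unreadable.
            values[name] = None
    return values


# Shows the values for the Python process itself.
print(read_oom_values())
```

A higher oom_score means the process is more likely to be picked. Writing a value between -1000 and 1000 to oom_score_adj shifts that score, with -1000 exempting the process entirely.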

Additionally, if you'd like to see the memory configuration for each of your containers, check this out:

https://openvz.org/Setting_UBC_parameters
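On an OpenVZ host the current values of those parameters are exposed in /proc/user_beancounters, and a non-zero failcnt column tells you which limit a container has been hitting. Below is a small sketch of a parser for that file. The sample text is illustrative only (not from your system), and I'm assuming the usual column layout of uid, resource, held, maxheld, barrier, limit, failcnt.

```python
def parse_beancounters(text):
    """Parse /proc/user_beancounters-style text into a list of row dicts."""
    rows = []
    uid = None
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not parts or parts[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<uid>:".
        if parts[0].endswith(":"):
            uid = int(parts[0][:-1])
            parts = parts[1:]
        if len(parts) != 6:
            continue
        held, maxheld, barrier, limit, failcnt = map(int, parts[1:])
        rows.append({"uid": uid, "resource": parts[0], "held": held,
                     "maxheld": maxheld, "barrier": barrier,
                     "limit": limit, "failcnt": failcnt})
    return rows


# Illustrative sample; on a real host read /proc/user_beancounters (needs root).
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2718244    2996148   14372700   14790164          0
            privvmpages       65536      65600      65536      69632         12
"""

for row in parse_beancounters(SAMPLE):
    if row["failcnt"] > 0:
        print(row["uid"], row["resource"], "failcnt:", row["failcnt"])
```

Any resource that shows up here is a good candidate for raising its barrier and limit, or for trimming what runs inside that container.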