How to read oom-killer syslog messages?

The OOM killer firing indicates that you have, in fact, run out of memory.

If you're sure the machine has more memory than it needs, then perhaps some process or system event is leaking memory somewhere. The OOM killer won't tell you why memory ran out, only that it did; it then tries to kill the least important processes (ranked by oom_score).
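To see which processes the OOM killer would go after first, you can read each process's current badness score from /proc/<pid>/oom_score. Below is a minimal Python sketch, assuming a Linux /proc layout; the function name and the `proc_root` parameter are hypothetical conveniences for illustration, not part of any kernel interface.

```python
from pathlib import Path


def oom_scores(proc_root="/proc"):
    """Return (oom_score, pid, comm) tuples, highest score first.

    /proc/<pid>/oom_score holds the kernel's current badness score for
    the process; the OOM killer normally targets the highest one.
    proc_root is a hypothetical parameter so a copy of /proc can be used.
    """
    results = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            score = int((entry / "oom_score").read_text())  # int() ignores the trailing newline
            comm = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited mid-scan or the file was unreadable
        results.append((score, int(entry.name), comm))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Print the ten most likely OOM victims right now.
    for score, pid, comm in oom_scores()[:10]:
        print(f"{score:6d} {pid:7d} {comm}")
```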

And if there is a memory leak, the OOM killer may just keep killing other processes, which only frees room for the rogue one to allocate more and more memory.
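One rough way to spot such a rogue process before the OOM killer fires is to compare resident memory (the VmRSS line in /proc/<pid>/status) across two snapshots. The sketch below assumes a Linux /proc layout; the helper names and the one-minute interval are hypothetical choices, and a slow leak may need far longer sampling.

```python
import time
from pathlib import Path


def read_rss_kb(proc_root="/proc"):
    """Map pid -> resident set size in kB, taken from /proc/<pid>/status."""
    rss = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            for line in (entry / "status").read_text().splitlines():
                if line.startswith("VmRSS:"):
                    # The line looks like "VmRSS:\t   123456 kB"
                    rss[int(entry.name)] = int(line.split()[1])
                    break
            # Kernel threads have no VmRSS line and are simply skipped.
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was malformed
    return rss


def rss_growth(before, after):
    """Return (growth_kb, pid) for pids present in both snapshots, largest first.

    A pid reused by a new process between snapshots shows up as a spurious delta.
    """
    common = before.keys() & after.keys()
    return sorted(((after[pid] - before[pid], pid) for pid in common), reverse=True)


if __name__ == "__main__":
    first = read_rss_kb()
    time.sleep(60)  # hypothetical interval; slow leaks need much longer
    for delta, pid in rss_growth(first, read_rss_kb())[:5]:
        print(f"pid {pid}: {delta:+d} kB")
```

A process that keeps climbing to the top of this list across repeated runs is the first leak suspect.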

So in that case, here is what I would do:

  1. Configure kdump, which will create a crash dump (vmcore) after a kernel panic (it's described in more detail here).
  2. Set the vm.panic_on_oom=1 kernel parameter. This will cause a kernel panic should the machine run out of memory.
  3. The next time you get a panic, open the vmcore file created by kdump and look at the process table; it should reveal the culprit.
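Before waiting for the next incident, it's worth confirming that steps 1 and 2 are actually in effect. A minimal sketch, assuming a Linux host where /proc/sys/vm/panic_on_oom and /sys/kernel/kexec_crash_loaded are readable (the latter reads 1 when a crash kernel is loaded for kdump). The value meanings follow the kernel's vm sysctl documentation; the function names are hypothetical. The parameter itself is set as root, for example with `sysctl -w vm.panic_on_oom=1` or a `vm.panic_on_oom = 1` line in /etc/sysctl.conf.

```python
from pathlib import Path

# Meanings as described for vm.panic_on_oom in the kernel's vm sysctl docs.
PANIC_ON_OOM_MEANING = {
    0: "no panic; the OOM killer picks a process to kill",
    1: "panic on system-wide OOM; cpuset/mempolicy/memcg-constrained OOM still kills a process",
    2: "always panic on OOM, even when it is constrained",
}


def panic_on_oom_status(path="/proc/sys/vm/panic_on_oom"):
    """Return (value, meaning) for the current vm.panic_on_oom setting."""
    value = int(Path(path).read_text())  # int() ignores the trailing newline
    return value, PANIC_ON_OOM_MEANING.get(value, "unrecognized value")


def crash_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """True if a kdump crash kernel is loaded (the file reads "1")."""
    return Path(path).read_text().strip() == "1"


if __name__ == "__main__":
    value, meaning = panic_on_oom_status()
    print(f"vm.panic_on_oom = {value}: {meaning}")
    print("kdump crash kernel loaded:", crash_kernel_loaded())
```

Once a vmcore has been captured, it is usually opened with the `crash` utility, whose `ps` command prints the process table mentioned in step 3.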

The question is pretty old, but reading the log suggests that this is a VFS or other filesystem bug. The system still has 4 GB of free swap, and yet the OOM killer was activated!

The interesting part is at the start:

Oct 25 07:28:04 nldedip4k031 kernel: [87946.529519] Call Trace:
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529525]  [] dump_header.isra.6+0x85/0xc0
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529528]  [] oom_kill_process+0x5c/0x80
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529530]  [] out_of_memory+0xc5/0x1c0
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529532]  [] __alloc_pages_nodemask+0x72c/0x740
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529535]  [] __get_free_pages+0x1c/0x30
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529537]  [] get_zeroed_page+0x12/0x20
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529541]  [] fill_read_buffer.isra.8+0xaa/0xd0
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529543]  [] sysfs_read_file+0x7d/0x90
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529546]  [] vfs_read+0x8c/0x160
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529548]  [] ? fill_read_buffer.isra.8+0xd0/0xd0
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529550]  [] sys_read+0x3d/0x70
Oct 25 07:28:04 nldedip4k031 kernel: [87946.529554]  [] sysenter_do_call+0x12/0x28

So some process was trying to read a file (from sysfs, judging by sysfs_read_file in the trace), and the system ran out of memory while doing that.
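Each trace frame reads function+offset/size in hex, innermost call first, and a leading "?" marks an address the unwinder found on the stack but could not confirm as part of the call chain. The Python sketch below pulls those fields out and prints the reliable frames outermost-first, which makes the path from the read() syscall through sysfs_read_file and get_zeroed_page down to out_of_memory easy to follow. The regex is an assumption fitted to the lines quoted here, not a grammar for every kernel's trace format.

```python
import re
import sys

# Matches the symbol part of a frame such as
#   "[<address>] ? fill_read_buffer.isra.8+0xd0/0xd0"
# The addresses inside [] are missing in the quoted log, which is fine
# because the pattern only looks at what follows them.
FRAME_RE = re.compile(
    r"(?:(?P<unreliable>\?)\s+)?"  # optional "?" = unreliable frame
    r"(?P<func>[A-Za-z_][\w.]*)"   # symbol, e.g. dump_header.isra.6
    r"\+0x(?P<off>[0-9a-f]+)"      # offset into the function
    r"/0x(?P<size>[0-9a-f]+)"      # total size of the function
)


def parse_call_trace(lines):
    """Return one dict per frame, innermost call first (log order)."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "func": m.group("func"),
                "offset": int(m.group("off"), 16),
                "size": int(m.group("size"), 16),
                "reliable": m.group("unreliable") is None,
            })
    return frames


if __name__ == "__main__":
    # Usage: python3 trace.py < oom-excerpt.log
    frames = parse_call_trace(sys.stdin)
    path = [f["func"] for f in reversed(frames) if f["reliable"]]
    print(" -> ".join(path))
```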

Note that just before the OOM killer lines, the system also reports free:1304125, so it makes no sense to start killing processes to satisfy a filesystem read.
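The free: figure in the Mem-Info block is a system-wide count of free pages, not bytes. A small conversion sketch, assuming 4 KiB pages (the usual size on x86): 1304125 pages works out to about 4.97 GiB. The regex and helper names are hypothetical; the pattern deliberately skips per-zone entries written with a kB suffix, since those are already in kilobytes.

```python
import re
import sys

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86 default

# name:number pairs with no unit suffix, e.g. "free:1304125"; the trailing
# \b rejects per-zone values such as "free:<n>kB".
COUNTER_RE = re.compile(r"\b([a-z_]+):(\d+)\b")


def meminfo_counters(text):
    """Return the page counters found in an OOM report's Mem-Info text."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


if __name__ == "__main__":
    # Usage: python3 meminfo.py < oom-excerpt.log
    counters = meminfo_counters(sys.stdin.read())
    if "free" in counters:
        print(f"free: {counters['free']} pages = {pages_to_gib(counters['free']):.2f} GiB")
```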

If you're not using any rare filesystems or an unstable kernel, I'd guess the hardware has memory corruption. Otherwise, stick to something stable (e.g. the ext4 filesystem) and use the latest stable kernel available for your distribution.