Why OOM-killer invoked even when huge amount of RAM is free

On EC2 ebs-backed xlarge ubuntu instance, oom-killer is getting invoked. From /var/log/syslog output below, it appears that ZONE_NORMAL is running out of memory:

Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB

but why is ZONE_NORMAL allocated only 11MB of 15GB total RAM? Or is there some other reason for running out of memory?

RAM available on machine (xlarge instance) is 15GB. Sum of rss column in the log below is 3.7GB and that of total_vm is 11.4GB.

Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456146] node invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456153] node cpuset=/ mems_allowed=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456157] Pid: 639, comm: node Not tainted 2.6.38-1-virtual #28-Ubuntu
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456160] Call Trace:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456170]  [<ffffffff810b259d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456189]  [<ffffffff81106a01>] ? dump_header+0x91/0x1e0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456196]  [<ffffffff81007e7f>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456203]  [<ffffffff815c92de>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456208]  [<ffffffff8110705d>] ? oom_kill_process+0x8d/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456213]  [<ffffffff81107492>] ? out_of_memory+0x102/0x240
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456218]  [<ffffffff8110c209>] ? __alloc_pages_nodemask+0x709/0x7f0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456224]  [<ffffffff81105c78>] ? filemap_fault+0x398/0x490
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456230]  [<ffffffff811414bd>] ? alloc_pages_current+0x9d/0x110
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456235]  [<ffffffff8104338b>] ? pte_alloc_one+0x1b/0x50
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456239]  [<ffffffff8100530e>] ? xen_pud_val+0xe/0x10
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456245]  [<ffffffff811255c5>] ? __pte_alloc+0x35/0x100
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456250]  [<ffffffff81129559>] ? handle_mm_fault+0x129/0x250
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456257]  [<ffffffff815ccf82>] ? do_page_fault+0x1a2/0x540
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456262]  [<ffffffff8109aacb>] ? do_futex+0xbb/0x210
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456266]  [<ffffffff8109ac9b>] ? sys_futex+0x7b/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456274]  [<ffffffff814c2975>] ? sys_recvmsg+0x75/0x90
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456278]  [<ffffffff815c9a15>] ? page_fault+0x25/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456282] Mem-Info:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456284] Node 0 DMA per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456288] CPU    0: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456290] CPU    1: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456293] CPU    2: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456295] CPU    3: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456299] Node 0 DMA32 per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456303] CPU    0: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456307] CPU    1: hi:  186, btch:  31 usd:  81
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456310] CPU    2: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456314] CPU    3: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456317] Node 0 Normal per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456321] CPU    0: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456323] CPU    1: hi:  186, btch:  31 usd: 175
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456326] CPU    2: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456329] CPU    3: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456334] active_anon:3726183 inactive_anon:44 isolated_anon:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456335]  active_file:214 inactive_file:237 isolated_file:64
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456336]  unevictable:0 dirty:1 writeback:0 unstable:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456337]  free:18955 slab_reclaimable:2811 slab_unreclaimable:7277
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456338]  mapped:256 shmem:101 pagetables:12634 bounce:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456341] Node 0 DMA free:15880kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(
anon):0kB isolated(file):0kB present:15688kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456353] lowmem_reserve[]: 0 4024 15142 15142
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456360] Node 0 DMA32 free:48596kB min:4180kB low:5224kB high:6268kB active_anon:3968448kB inactive_anon:32kB active_file:168kB inactive_file:208kB unevic
table:0kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:0kB writeback:0kB mapped:328kB shmem:80kB slab_reclaimable:2208kB slab_unreclaimable:8604kB kernel_stack:48kB pagetabl
es:9724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:601 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456371] lowmem_reserve[]: 0 0 11117 11117
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456377] Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB
unevictable:0kB isolated(anon):0kB isolated(file):256kB present:11384720kB mlocked:0kB dirty:4kB writeback:0kB mapped:696kB shmem:324kB slab_reclaimable:9036kB slab_unreclaimable:20504kB kernel_stack:20
16kB pagetables:40812kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2190 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456388] lowmem_reserve[]: 0 0 0 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456393] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15880kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456409] Node 0 DMA32: 715*4kB 525*8kB 343*16kB 245*32kB 147*64kB 50*128kB 16*256kB 5*512kB 2*1024kB 0*2048kB 1*4096kB = 48996kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456423] Node 0 Normal: 1919*4kB 5*8kB 2*16kB 2*32kB 1*64kB 1*128kB 2*256kB 2*512kB 0*1024kB 1*2048kB 0*4096kB = 11588kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456438] 768 total pagecache pages
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456440] 0 pages in swap cache
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456443] Swap cache stats: add 0, delete 0, find 0/0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456445] Free swap  = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456447] Total swap = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495285] 3934192 pages RAM
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495290] 91275 pages reserved
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495291] 1661 pages shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495293] 3822220 pages non-shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495296] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495310] [  228]     0   228     4297       71   2       0             0 upstart-udev-br
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495315] [  242]     0   242     5353      123   2     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495319] [  319]     0   319     5352      126   2     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495323] [  320]     0   320     5352      126   0     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495327] [  392]     0   392     1771      124   2       0             0 dhclient
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495332] [  467]     0   467    12366      140   3     -17         -1000 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495336] [  474]   101   474    47476      160   2       0             0 rsyslogd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495340] [  502]   102   502     5927      108   2       0             0 dbus-daemon
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495344] [  534]     0   534     1035       27   3       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495348] [  539]     0   539     1035       27   0       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495352] [  543]     0   543    13589     2510   1       0             0 supervisord
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495356] [  544]     0   544     1035       25   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495360] [  545]     0   545     1035       27   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495364] [  547]     0   547     1035       27   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495368] [  552]     0   552     5302       65   1       0             0 cron
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495372] [  553]     0   553     4753       44   1       0             0 atd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495376] [  565]     0   565     3946       46   2       0             0 irqbalance
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495380] [  571]   106   571    35760     3859   1       0             0 memcached
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495384] [  637]  1000   637   222128    36427   1       0             0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495388] [  639]  1000   639   189634     4396   1       0             0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495392] [  696]     0   696     1035       27   3       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495396] [  755]     0   755    31312      372   2       0             0 console-kit-dae
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495401] [ 2404]     0  2404    35058     3644   1       0             0 carbon-cache.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495405] [ 2601]  1000  2601     5982      199   1       0             0 screen
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495409] [ 2602]  1000  2602     5576      766   0       0             0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495413] [ 2661]     0  2661     6801       67   1       0             0 sudo
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495417] [ 2663]     0  2663     7133      851   1       0             0 run-graphite-de
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495421] [ 2668]     0  2668     1056       25   1       0             0 sh
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495425] [ 2669]     0  2669    12906     3942   1       0             0 django-admin.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495429] [ 2670]     0  2670    50313    11546   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495433] [ 2785]     0  2785    15953      489   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495437] [ 3123]  1000  3123  1184911   266122   1       0             0 java
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495443] [18135]     0 18135  5884532     9128   1       0             0 mongod
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495449] [12574]  1000 12574   966781   932357   2       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495455] [30139]  1000 30139   759010   741642   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495461] [14186]    33 14186    19905     4657   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495465] [14187]    33 14187    19726     4527   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495470] [14190]  1000 14190    23767     8459   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495474] [14201]  1000 14201  1706261  1687237   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495478] [24553]     0 24553    19831      200   1       0             0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495482] [24584]  1000 24584    19831      200   1       0             0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495486] [24585]  1000 24585     6464     1655   1       0             0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495490] Out of memory: Kill process 14201 (python) score 439 or sacrifice child
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495509] Killed process 14201 (python) total-vm:6825044kB, anon-rss:6748848kB, file-rss:100kB

Thanks for your help!


Looks like pretty legitimate OOM for me:

  1. You have one NUMA node(namely Node 0) which has three zones:
    1. DMA which has 15880kB free
    2. DMA32 which has 48996kB free
    3. Normal which has 11588kB free
  2. Almost all of your memory 3968448kB + 10936284kB is in active_anon LRU list.
  3. Memory sizes in table are in pages (which on your architecture most likely are 4kb) so they sum up to 14.8Gb, not 3.7.