Why OOM-killer invoked even when huge amount of RAM is free
On EC2 ebs-backed xlarge ubuntu instance, oom-killer is getting invoked. From /var/log/syslog output below, it appears that ZONE_NORMAL is running out of memory:
Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB
but why is ZONE_NORMAL allocated only 11MB of 15GB total RAM? Or is there some other reason for running out of memory?
RAM available on machine (xlarge instance) is 15GB. Sum of rss column in the log below is 3.7GB and that of total_vm is 11.4GB.
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456146] node invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456153] node cpuset=/ mems_allowed=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456157] Pid: 639, comm: node Not tainted 2.6.38-1-virtual #28-Ubuntu
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456160] Call Trace:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456170] [<ffffffff810b259d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456189] [<ffffffff81106a01>] ? dump_header+0x91/0x1e0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456196] [<ffffffff81007e7f>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456203] [<ffffffff815c92de>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456208] [<ffffffff8110705d>] ? oom_kill_process+0x8d/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456213] [<ffffffff81107492>] ? out_of_memory+0x102/0x240
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456218] [<ffffffff8110c209>] ? __alloc_pages_nodemask+0x709/0x7f0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456224] [<ffffffff81105c78>] ? filemap_fault+0x398/0x490
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456230] [<ffffffff811414bd>] ? alloc_pages_current+0x9d/0x110
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456235] [<ffffffff8104338b>] ? pte_alloc_one+0x1b/0x50
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456239] [<ffffffff8100530e>] ? xen_pud_val+0xe/0x10
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456245] [<ffffffff811255c5>] ? __pte_alloc+0x35/0x100
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456250] [<ffffffff81129559>] ? handle_mm_fault+0x129/0x250
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456257] [<ffffffff815ccf82>] ? do_page_fault+0x1a2/0x540
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456262] [<ffffffff8109aacb>] ? do_futex+0xbb/0x210
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456266] [<ffffffff8109ac9b>] ? sys_futex+0x7b/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456274] [<ffffffff814c2975>] ? sys_recvmsg+0x75/0x90
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456278] [<ffffffff815c9a15>] ? page_fault+0x25/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456282] Mem-Info:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456284] Node 0 DMA per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456288] CPU 0: hi: 0, btch: 1 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456290] CPU 1: hi: 0, btch: 1 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456293] CPU 2: hi: 0, btch: 1 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456295] CPU 3: hi: 0, btch: 1 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456299] Node 0 DMA32 per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456303] CPU 0: hi: 186, btch: 31 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456307] CPU 1: hi: 186, btch: 31 usd: 81
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456310] CPU 2: hi: 186, btch: 31 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456314] CPU 3: hi: 186, btch: 31 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456317] Node 0 Normal per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456321] CPU 0: hi: 186, btch: 31 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456323] CPU 1: hi: 186, btch: 31 usd: 175
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456326] CPU 2: hi: 186, btch: 31 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456329] CPU 3: hi: 186, btch: 31 usd: 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456334] active_anon:3726183 inactive_anon:44 isolated_anon:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456335] active_file:214 inactive_file:237 isolated_file:64
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456336] unevictable:0 dirty:1 writeback:0 unstable:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456337] free:18955 slab_reclaimable:2811 slab_unreclaimable:7277
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456338] mapped:256 shmem:101 pagetables:12634 bounce:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456341] Node 0 DMA free:15880kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(
anon):0kB isolated(file):0kB present:15688kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456353] lowmem_reserve[]: 0 4024 15142 15142
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456360] Node 0 DMA32 free:48596kB min:4180kB low:5224kB high:6268kB active_anon:3968448kB inactive_anon:32kB active_file:168kB inactive_file:208kB unevic
table:0kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:0kB writeback:0kB mapped:328kB shmem:80kB slab_reclaimable:2208kB slab_unreclaimable:8604kB kernel_stack:48kB pagetabl
es:9724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:601 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456371] lowmem_reserve[]: 0 0 11117 11117
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456377] Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB
unevictable:0kB isolated(anon):0kB isolated(file):256kB present:11384720kB mlocked:0kB dirty:4kB writeback:0kB mapped:696kB shmem:324kB slab_reclaimable:9036kB slab_unreclaimable:20504kB kernel_stack:20
16kB pagetables:40812kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2190 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456388] lowmem_reserve[]: 0 0 0 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456393] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15880kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456409] Node 0 DMA32: 715*4kB 525*8kB 343*16kB 245*32kB 147*64kB 50*128kB 16*256kB 5*512kB 2*1024kB 0*2048kB 1*4096kB = 48996kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456423] Node 0 Normal: 1919*4kB 5*8kB 2*16kB 2*32kB 1*64kB 1*128kB 2*256kB 2*512kB 0*1024kB 1*2048kB 0*4096kB = 11588kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456438] 768 total pagecache pages
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456440] 0 pages in swap cache
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456443] Swap cache stats: add 0, delete 0, find 0/0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456445] Free swap = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456447] Total swap = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495285] 3934192 pages RAM
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495290] 91275 pages reserved
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495291] 1661 pages shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495293] 3822220 pages non-shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495296] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495310] [ 228] 0 228 4297 71 2 0 0 upstart-udev-br
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495315] [ 242] 0 242 5353 123 2 -17 -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495319] [ 319] 0 319 5352 126 2 -17 -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495323] [ 320] 0 320 5352 126 0 -17 -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495327] [ 392] 0 392 1771 124 2 0 0 dhclient
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495332] [ 467] 0 467 12366 140 3 -17 -1000 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495336] [ 474] 101 474 47476 160 2 0 0 rsyslogd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495340] [ 502] 102 502 5927 108 2 0 0 dbus-daemon
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495344] [ 534] 0 534 1035 27 3 0 0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495348] [ 539] 0 539 1035 27 0 0 0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495352] [ 543] 0 543 13589 2510 1 0 0 supervisord
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495356] [ 544] 0 544 1035 25 2 0 0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495360] [ 545] 0 545 1035 27 2 0 0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495364] [ 547] 0 547 1035 27 2 0 0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495368] [ 552] 0 552 5302 65 1 0 0 cron
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495372] [ 553] 0 553 4753 44 1 0 0 atd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495376] [ 565] 0 565 3946 46 2 0 0 irqbalance
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495380] [ 571] 106 571 35760 3859 1 0 0 memcached
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495384] [ 637] 1000 637 222128 36427 1 0 0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495388] [ 639] 1000 639 189634 4396 1 0 0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495392] [ 696] 0 696 1035 27 3 0 0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495396] [ 755] 0 755 31312 372 2 0 0 console-kit-dae
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495401] [ 2404] 0 2404 35058 3644 1 0 0 carbon-cache.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495405] [ 2601] 1000 2601 5982 199 1 0 0 screen
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495409] [ 2602] 1000 2602 5576 766 0 0 0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495413] [ 2661] 0 2661 6801 67 1 0 0 sudo
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495417] [ 2663] 0 2663 7133 851 1 0 0 run-graphite-de
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495421] [ 2668] 0 2668 1056 25 1 0 0 sh
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495425] [ 2669] 0 2669 12906 3942 1 0 0 django-admin.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495429] [ 2670] 0 2670 50313 11546 1 0 0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495433] [ 2785] 0 2785 15953 489 1 0 0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495437] [ 3123] 1000 3123 1184911 266122 1 0 0 java
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495443] [18135] 0 18135 5884532 9128 1 0 0 mongod
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495449] [12574] 1000 12574 966781 932357 2 0 0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495455] [30139] 1000 30139 759010 741642 1 0 0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495461] [14186] 33 14186 19905 4657 1 0 0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495465] [14187] 33 14187 19726 4527 1 0 0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495470] [14190] 1000 14190 23767 8459 1 0 0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495474] [14201] 1000 14201 1706261 1687237 1 0 0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495478] [24553] 0 24553 19831 200 1 0 0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495482] [24584] 1000 24584 19831 200 1 0 0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495486] [24585] 1000 24585 6464 1655 1 0 0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495490] Out of memory: Kill process 14201 (python) score 439 or sacrifice child
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495509] Killed process 14201 (python) total-vm:6825044kB, anon-rss:6748848kB, file-rss:100kB
Thanks for your help!
Looks like pretty legitimate OOM for me:
- You have one NUMA node(namely Node 0) which has three zones:
-
DMA
which has 15880kB free -
DMA32
which has 48996kB free -
Normal
which has 11588kB free
-
- Almost all of your memory
3968448kB + 10936284kB
is inactive_anon
LRU list. - Memory sizes in table are in pages (which on your architecture most likely are 4kb) so they sum up to 14.8Gb, not 3.7.