Computer freezing on almost full RAM, possibly disk cache problem

I think the problem is somewhat similar to this thread.

It doesn't matter whether I have swap enabled or disabled: whenever the amount of RAM actually in use gets close to the maximum and there is almost no space left for disk cache, the system becomes totally unresponsive.

The disk spins wildly, and sometimes after a long wait (10-30 minutes) it will unfreeze, and sometimes not (or I run out of patience). Sometimes, if I act quickly, I can manage to slowly open a console and kill some of the RAM-eating applications such as the browser, and the system unfreezes almost instantly.

Because of this problem I almost never see anything in swap; only occasionally there are a few MB there, and soon afterwards the problem appears. My not-so-educated guess is that it is somehow connected to the disk cache being too greedy, or memory management being too lenient, so when memory is needed it is not freed quickly enough and the system starves.

The problem can be reproduced very quickly when working with large files (500MB+), which are loaded into the disk cache; afterwards the system is apparently unable to unload them fast enough.

Any help or ideas will be greatly appreciated.

For now I have to live in constant fear that the computer will freeze while I am doing something, and I usually have to restart it. If it really is running out of RAM, I would much rather it just killed some userspace applications, such as the browser (preferably I could somehow mark which to kill first).

The mystery, though, is why swap doesn't save me in this situation.

UPDATE: It didn't hang for some time, but now it has happened several times again. I now keep a RAM monitor on my screen at all times, and when the hang happened it still showed ~30% free (probably used by disk cache). Additional symptoms: if I am watching a video (VLC player) at the time, the sound stops first, and a few seconds later the image stops. While only the sound has stopped I still have some control over the PC, but once the image stops I cannot even move the mouse anymore, so I restarted it after some waiting. By the way, this didn't happen when I started watching the video but some time in (about 20 minutes), and I wasn't actively doing anything else at the time, although the browser and oowrite were open on the second screen the whole time. Basically, something just decides to happen at one point and hangs the system.

As requested in the comments, I ran dmesg right after the hang. I didn't notice anything weird, but I didn't know what to look for, so here it is: https://docs.google.com/document/d/1iQih0Ee2DwsGd3VuQZu0bPbg0JGjSOCRZhu0B05CMYs/edit?hl=en_US&authkey=CPzF7bcC

Solution 1:

To fix this problem I have found that you need to set the following setting to something around 5%-6% of your total physical RAM, divided by the number of cores in the computer:

sysctl -w vm.min_free_kbytes=65536

Keep in mind that I treated this as a per-core setting, so with 2GB of RAM and two cores I calculated 6% of only 1GB and added a little extra just to be safe.
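The arithmetic behind that number can be sketched in a few lines of Python. This only reproduces the rule of thumb described in this answer (a percentage of total RAM divided by the number of cores); the function name `suggested_min_free_kbytes` and its parameters are made up for illustration, and the per-core interpretation is this answer's assumption rather than something taken from kernel documentation.

```python
def suggested_min_free_kbytes(total_ram_kb, cores, percent=6):
    """Rule of thumb from this answer: `percent` of (total RAM / cores), in kB.

    Hypothetical helper: the name, the parameters and the per-core split
    are illustrative assumptions, not a kernel formula.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    per_core_kb = total_ram_kb // cores
    # Integer arithmetic keeps the result a whole number of kilobytes
    return per_core_kb * percent // 100


if __name__ == "__main__":
    two_gb_in_kb = 2 * 1024 * 1024
    # 6% of 1 GB, before adding "a little extra" for safety
    print(suggested_min_free_kbytes(two_gb_in_kb, cores=2))
```

For 2GB and two cores this gives a little under 63000 kB, which is why rounding up to 65536 (64MB) matches the "little extra" mentioned above.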

This forces the computer to try to keep this amount of RAM free, and in doing so limits the ability to cache disk files. Of course it still tries to cache them and immediately swap them out, so you should probably limit your swapping as well:

sysctl -w vm.swappiness=5

(100 = swap as often as possible, 0 = swap only when absolutely necessary)

The result is that Linux no longer randomly decides to load a whole movie file of approx 1GB into RAM while I am watching it, killing the machine in the process.

Now there is enough reserved space to avoid memory starvation, which apparently was the problem (seeing as there are no more freezes like before).

After testing for a day: the lockups are gone. Sometimes there are minor slowdowns because stuff gets cached more often, but I can live with that if I don't have to restart the computer every few hours.

The lesson here is that the default memory management covers just one use case and is not always the best, even though some people try to suggest otherwise. A home entertainment Ubuntu system should be configured differently from a server.

You probably want to make these settings permanent by adding them to your /etc/sysctl.conf like this:

vm.swappiness=5
vm.min_free_kbytes=65536
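To double-check that both lines really ended up in the file, something like the following Python sketch could be used. It is only an illustration: `parse_sysctl_conf` is a made-up helper, and it handles just the simple `key = value` lines and `#` / `;` comments shown here, not every detail of the file format.

```python
def parse_sysctl_conf(text):
    """Return {key: value} for the `key = value` lines in sysctl.conf-style text.

    Hypothetical helper for illustration; only simple lines are handled.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (lines starting with '#' or ';')
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    sample = "# tuning from this answer\nvm.swappiness=5\nvm.min_free_kbytes = 65536\n"
    found = parse_sysctl_conf(sample)
    for key in ("vm.swappiness", "vm.min_free_kbytes"):
        print(key, "->", found.get(key, "missing"))
```

On a real system you would pass in the contents of /etc/sysctl.conf instead of the sample string. Running `sudo sysctl -p` afterwards loads the file without a reboot.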

Solution 2:

This happened for me in a new install of Ubuntu 14.04.

In my case, it had nothing to do with sysctl issues mentioned.

Instead, the problem was that the swap partition's UUID was different during installation than it was after installation. So my swap was never enabled, and my machine would lock up after a few hours use.

The solution was to check the current UUID of the swap partition with

sudo blkid

and then sudo nano /etc/fstab to replace the incorrect swap's UUID value with the one reported by blkid.

A simple reboot to apply the changes, and voila.
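A quick way to confirm that swap is actually active after the reboot is to look at the SwapTotal line in /proc/meminfo. Here is a minimal Python sketch of that check; the helper name `swap_total_kb` is made up for this example, and it assumes the usual `SwapTotal:  <number> kB` layout of that file.

```python
import os


def swap_total_kb(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text, or None.

    Hypothetical helper; assumes lines of the form "SwapTotal:   2097148 kB".
    """
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return None


if __name__ == "__main__":
    # /proc/meminfo only exists on Linux, so guard the real check
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            total = swap_total_kb(f.read())
        if total:
            print("Swap is enabled: {} kB".format(total))
        else:
            print("No active swap; compare the swap UUID in /etc/fstab with blkid")
```

If this reports no active swap on a machine that has a swap partition, the UUID mismatch described above is one possible cause.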

Solution 3:

Nothing worked for me!!

So I wrote a script to monitor memory usage. It first tries to clear the RAM cache if memory consumption exceeds a threshold. If memory consumption still doesn't drop below the threshold, it starts killing processes one by one, in decreasing order of memory consumption, until usage is below the threshold. The threshold is set to 96% by default; you can configure it by changing the value of the variable RAM_USAGE_THRESHOLD in the script.

I agree that killing the processes that consume the most memory is not a perfect solution, but it's better to kill ONE application than to lose ALL your work!! The script sends you a desktop notification if RAM usage exceeds the threshold. It also notifies you if it kills any process.

#!/usr/bin/env python3
import time
import tkinter
from subprocess import Popen, PIPE
from tkinter import messagebox

import psutil

# Hidden Tk root window, needed only so that messagebox can be shown
root = tkinter.Tk()
root.withdraw()

RAM_USAGE_THRESHOLD = 96
MAX_NUM_PROCESS_KILL = 100

def main():
    if psutil.virtual_memory().percent >= RAM_USAGE_THRESHOLD:
        # Clear RAM cache
        mem_warn = "Memory usage critical: {}%\nClearing RAM Cache".\
            format(psutil.virtual_memory().percent)
        print(mem_warn)
        Popen("notify-send \"{}\"".format(mem_warn), shell=True)
        print("Clearing RAM Cache")
        print(Popen('echo 1 > /proc/sys/vm/drop_caches',
                    stdout=PIPE, stderr=PIPE,
                    shell=True).communicate())
        post_cache_mssg = "Memory usage after clearing RAM cache: {}%".format(
                            psutil.virtual_memory().percent)
        Popen("notify-send \"{}\"".format(post_cache_mssg), shell=True)
        print(post_cache_mssg)

        if psutil.virtual_memory().percent < RAM_USAGE_THRESHOLD:
            print("Clearing RAM cache saved the day")
            return
        # Kill up to MAX_NUM_PROCESS_KILL of the highest memory-consuming processes.
        ps_killed_notify = ""
        for i, ps in enumerate(sorted(psutil.process_iter(),
                                      key=lambda x: x.memory_percent(),
                                      reverse=True)):
            # Never kill init (PID 1)
            if ps.pid == 1:
                continue
            elif (i > MAX_NUM_PROCESS_KILL) or \
                    (psutil.virtual_memory().percent < RAM_USAGE_THRESHOLD):
                messagebox.showwarning('Killed process - save_hang',
                                       ps_killed_notify)
                Popen("notify-send \"{}\"".format(ps_killed_notify), shell=True)
                return
            else:
                try:
                    ps_killed_mssg = "Killed {} {} ({}) which was consuming {" \
                                     "} % memory (memory usage={})". \
                        format(i, ps.name(), ps.pid, ps.memory_percent(),
                               psutil.virtual_memory().percent)
                    ps.kill()
                    time.sleep(1)
                    ps_killed_mssg += ". Current memory usage={}".\
                        format(psutil.virtual_memory().percent)
                    print(ps_killed_mssg)
                    ps_killed_notify += ps_killed_mssg + "\n"
                except Exception as err:
                    print("Error while killing {}: {}".format(ps.pid, err))
    else:
        print("Memory usage = " + str(psutil.virtual_memory().percent))
    root.update()


if __name__ == "__main__":
    while True:
        try:
            main()
        except Exception as err:
            print(err)
        time.sleep(1)

Save the code in a file, say save_hang.py. Run the script as:

sudo python3 save_hang.py

Please note that this script is compatible with Python 3 only and requires the tkinter and psutil packages. You can install them with:

sudo apt-get install python3-tk python3-psutil

Hope this helps...

Solution 4:

I know this question is old, but I had this problem in Ubuntu (Chrubuntu) 14.04 on an Acer C720 Chromebook. I tried Krišjānis Nesenbergs' solution, and it worked somewhat, but the system still crashed sometimes.

I finally found a solution that worked by installing zram instead of using physical swap on the SSD. To install it I just followed the instructions here, like this:

sudo apt-get install zram-config

Afterwards I was able to configure the size of the zram swap by modifying /etc/init/zram-config.conf on line 21.

20: # Calculate the memory to user for zram (1/2 of ram)
21: mem=$(((totalmem / 2 / ${NRDEVICES}) * 1024))

I replaced the 2 with a 1 in order to make the zram swap the same size as the amount of RAM I have. Since doing so, I have had no more freezes or system unresponsiveness.
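To see what that change does to the numbers, here is a small Python sketch that mirrors the shell arithmetic on line 21. It assumes that `totalmem` is the total RAM in kB and that `NRDEVICES` is the number of zram devices, which is how I read the script; the function name `zram_bytes_per_device` and the `divisor` parameter are made up for illustration.

```python
def zram_bytes_per_device(totalmem_kb, nr_devices, divisor=2):
    """Mirror `mem=$(((totalmem / divisor / NRDEVICES) * 1024))`.

    Assumptions: totalmem is in kB, the result is bytes per zram device,
    and shell integer division is modelled with Python's `//`.
    """
    return (totalmem_kb // divisor // nr_devices) * 1024


if __name__ == "__main__":
    four_gb_in_kb = 4 * 1024 * 1024
    devices = 2
    default_size = zram_bytes_per_device(four_gb_in_kb, devices)            # divisor 2
    full_size = zram_bytes_per_device(four_gb_in_kb, devices, divisor=1)    # divisor 1
    print("default total:", default_size * devices, "bytes")
    print("modified total:", full_size * devices, "bytes")
```

With the default divisor of 2 the devices add up to half of RAM; with 1 they add up to all of it, which is the change described above.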

Solution 5:

My guess is that you've set your vm.swappiness to a very low value, which causes the kernel to swap too late, leaving too little RAM for the system to work with.

You can show your current swappiness setting by executing:

sysctl vm.swappiness

By default, this is set to 60. The Ubuntu Wiki recommends setting it to 10, but feel free to set it to a higher value. You can change it by running:

sudo sysctl vm.swappiness=10

This will change it for the current session only. To make it persistent, you need to add vm.swappiness = 10 to the /etc/sysctl.conf file.
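If you would rather check the value from a script, the sysctl name maps to a file under /proc/sys with the dots replaced by slashes. Here is a minimal Python sketch of that mapping; `sysctl_path` and `read_sysctl` are made-up helper names, and the simple dot-to-slash rule is only assumed to be good enough for names like vm.swappiness.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as 'vm.swappiness' to its file under /proc/sys.

    Hypothetical helper; assumes every dot in the name is a path separator.
    """
    return "/".join([root] + name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Read the current value of a sysctl setting as a string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


if __name__ == "__main__":
    path = sysctl_path("vm.swappiness")
    # /proc/sys only exists on Linux, so guard the real read
    if os.path.exists(path):
        print("vm.swappiness =", read_sysctl("vm.swappiness"))
```

Reading the file does not need root; changing the value still does, as with the sudo sysctl command above.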

If your disk is slow, consider buying a new one.