How to prevent so many instances of apt-check running?
I have an Ubuntu 12.04 server that just crashed for a very obvious reason: 30+ apt-check processes consumed all memory, the OOM killer kicked in and killed vital services. I'm not sure where the apt-check processes come from, but I guess my Nagios/Icinga check_apt plugin might use it, and the byobu status line may want to display its output. I guess something locked up and all of the processes were just waiting while still holding memory.
How can I prevent having so many instances of apt-check on the system? It doesn't make sense to me; it should just quit as soon as it can't get a read lock on the dpkg database.
It seems that I'm not the only one running into trouble here. All search suggestions for apt-check are pretty negative (clean browser, not logged in, no personalised search).
Some digging into apt-check gave me these clues that it is a very blunt script that needs fixing. With all due respect to its authors, it is failing on my servers. Here are my thoughts:
- apt-check == /usr/lib/update-notifier/apt_check.py
- forces nicelevel 19 for itself
- no timeouts set on actions
The combination of the last two allows it to pile up endlessly in a downward spiral. If the system is busy with other, higher-priority work, the number of processes will just keep increasing with no end to it, as apt-check will never get priority over that work. The trouble only gets worse once the OOM killer decides to kill your vital system processes.
If either of these two behaviours were different, I assume the system could not end up in such a broken state.
While strings is right that the parent processes are responsible for this too, I believe the points below are flaws in apt-check that have to be reported as a bug to get addressed properly:
- it should hint the OOM killer to have itself killed first
- it should not set the nicelevel hardcoded
- it should exit if it takes an unreasonable amount of time to get pieces of information
Actually, it seems that the Linux OOM killer already applies some heuristics here: niced processes get an increased score, and long-running processes get a decreased one. (source - thanks to Ulrich Dangel for pointing it out)
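Independently of whatever heuristics the running kernel applies, a caller can volunteer a process as an early OOM victim itself. A minimal sketch, assuming a kernel that exposes /proc/<pid>/oom_score_adj (2.6.36 or later); the function name run_as_oom_victim and the value 500 are only illustrative:

```shell
#!/bin/sh
# Sketch only: run a command with a raised OOM adjustment so the kernel
# prefers to kill it before more important services.
# Assumption: /proc/self/oom_score_adj exists and accepts -1000..1000.

# run_as_oom_victim ADJ COMMAND [ARGS...]
run_as_oom_victim() {
    adj=$1
    shift
    (
        # Inside this subshell /proc/self is the subshell itself; the
        # adjustment is inherited by the command started with exec below.
        { echo "$adj" > /proc/self/oom_score_adj; } 2>/dev/null ||
            echo "warning: could not set oom_score_adj" >&2
        exec "$@"
    )
}
```

For example, run_as_oom_victim 500 /usr/lib/update-notifier/apt-check would start apt-check with a raised score. Raising the value needs no special privileges, while lowering it again generally requires root.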
Possible solutions I would propose:
- cache results after processing
- output the cache if it is less than N seconds old, without loading all Python-APT libraries for every simple (even --help) invocation
- make the nicelevel configurable. Allow me to change/disable this, please! I believe that setting it to 0 will actually help
- have it increase the OOM killer score
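Until apt-check does this itself, the caching and timeout ideas above can be approximated from the outside with a wrapper. This is only a sketch under assumptions: cached_run and the cache path in the usage note are made-up names, it relies on flock (util-linux) plus timeout, stat -c and mktemp (GNU coreutils), and it captures stderr as well because apt-check appears to print its counts there.

```shell
#!/bin/sh
# Sketch only: serve a cached result when it is fresh, otherwise refresh it
# with a time limit and allow just one refresher at a time.

# cached_run CACHE_FILE MAX_AGE_SECONDS TIMEOUT_SECONDS COMMAND [ARGS...]
cached_run() {
    cache=$1 max_age=$2 limit=$3
    shift 3

    # Serve the cache if it is younger than MAX_AGE_SECONDS.
    if [ -f "$cache" ]; then
        now=$(date +%s)
        mtime=$(stat -c %Y "$cache")
        if [ $((now - mtime)) -lt "$max_age" ]; then
            cat "$cache"
            return 0
        fi
    fi

    (
        # Non-blocking lock: quit at once instead of queueing up behind
        # another refresh that is already running.
        flock -n 9 || exit 75
        tmp=$(mktemp "$cache.XXXXXX") || exit 1
        # timeout exits with 124 if the command exceeds the limit.
        if timeout "$limit" "$@" > "$tmp" 2>&1; then
            mv "$tmp" "$cache"
        else
            rc=$?
            rm -f "$tmp"
            exit "$rc"
        fi
    ) 9> "$cache.lock" || return $?

    cat "$cache"
}
```

A monitoring plugin or status line could then call something like cached_run /var/cache/apt-check.out 300 30 /usr/lib/update-notifier/apt-check. Concurrent callers get exit code 75 immediately while a refresh is running, and a hung run is cut off after 30 seconds, so instances cannot pile up.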
You need to find out which process is spawning apt-check. You can use something like ps to get the process tree:
ps -A --forest
If apt-check has no parent process, then it might be an issue with apt-check itself rather than with one particular program. If that is the case, I would try to debug apt-check.
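To see at a glance which parent is responsible, the ps output can also be grouped by parent PID. A rough sketch, assuming a procps-style ps; count_by_parent is a hypothetical helper name, and the bracket in the pattern keeps the awk process from matching its own command line:

```shell
#!/bin/sh
# Sketch only: reads "PID PPID ARGS..." lines on stdin and prints
# "COUNT PPID" for every parent with children whose line matches the
# given regex, busiest parent first.
count_by_parent() {
    awk -v pat="$1" '$0 ~ pat { n[$2]++ } END { for (p in n) print n[p], p }' |
        sort -rn
}

# Usage (not run here):
#   ps -eo pid=,ppid=,args= | count_by_parent 'apt[-_]check'
#   ps -o pid,etime,args -p PPID    # then inspect the busiest parent
```

If one PPID accounts for most of the instances, that parent (for example a monitoring agent or a status-line script) is the place to start looking.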