Systemd becomes unresponsive
Within two months, systemd has suddenly become unresponsive on two of my Ubuntu 16.04 LTS servers.
Symptoms:
- All systemctl commands for controlling services or accessing logs fail with error messages:
  Failed to retrieve unit state: Connection timed out
  Failed to get properties: Connection timed out
- systemd does not heed the signal from logrotate to reopen its log, continuing to write to the renamed log file /var/log/syslog.1 while the newly created /var/log/syslog remains empty.
- Lots of zombie processes accumulate from cron jobs and system management tasks (one way to enumerate them is sketched after this list).
- Running services continue to run normally, but starting or stopping services is no longer possible, as even the legacy scripts in /etc/init.d redirect to the non-functional systemctl.
- Nothing unusual in the logs except the "Connection timed out" messages from attempted interactions with systemd.
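For reference, one way to confirm the zombie build-up and see which parent processes are failing to reap their children (standard procps ps plus awk, purely illustrative):

  ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'   # list defunct processes with their parent PIDs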
The commonly proposed corrective measures:
- systemctl daemon-reexec
- kill -TERM 1
- removing /run/systemd/system/session-*.scope.d
do not fix the problem. The only remedy is to reboot the entire system, which is of course both disruptive and problematic for a server on the other side of the globe.
Questions:
- What are possible causes for this sort of systemd malfunction?
- How can I diagnose this further? (A few non-destructive starting points are sketched below.)
- Is there a less disruptive way to recover from an unresponsive systemd than rebooting?
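For context on the second question: systemctl talks to PID 1 over D-Bus (or its private socket when run as root), so the timeouts suggest PID 1 is no longer servicing that interface. A few read-only checks that might narrow it down (a sketch, not a definitive procedure):

  systemctl is-system-running             # the manager's own view of its state, if it answers at all
  busctl status org.freedesktop.systemd1  # asks dbus-daemon who owns the name; helps distinguish a dead bus from a hung PID 1
  grep ^State /proc/1/status              # the kernel's view of PID 1 (normally S, sleeping)
  ls /proc/1/fd | wc -l                   # a very large count could hint at file-descriptor exhaustion in PID 1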
Solution 1:
This is a very old question, but I hope it can save someone else some time.
I had an identical problem: zombie processes accumulating, and systemctl answering every request with a timeout. As expected, the problem was getting rid of the defunct daemons without a reboot. At least in our case, the solution was:
telinit u                 # SysV-compat request for PID 1 to re-execute itself
systemctl daemon-reexec   # re-execute the systemd manager, serializing and restoring its state
systemctl daemon-reload   # reload unit files and re-run all generators
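If this works, the manager should start answering again. Something along these lines (assuming a systemd recent enough for is-system-running, which 16.04's is) can confirm the recovery before touching any services; cron is just an arbitrary example unit:

  systemctl is-system-running   # should return a state instead of timing out
  systemctl status cron         # any ordinary unit query should now respond promptly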