Systemd becomes unresponsive
Within two months, systemd has suddenly become unresponsive on two of my Ubuntu 16.04 LTS servers.
Symptoms:
- All systemctl commands for controlling services or accessing logs fail with error messages:
  Failed to retrieve unit state: Connection timed out
  Failed to get properties: Connection timed out
- systemd does not heed the signal from logrotate to reopen its log, continuing to write to the renamed log file /var/log/syslog.1 while the newly created /var/log/syslog remains empty.
- Lots of zombie processes accumulate from cron jobs and system management tasks (one way to enumerate them is sketched after this list).
- Running services continue to run normally, but starting or stopping services is no longer possible, as even the legacy scripts in /etc/init.d redirect to the non-functional systemctl.
- Nothing unusual in the logs except the "Connection timed out" messages from attempted interactions with systemd.
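For reference, one way to confirm the zombie build-up and see which parent processes are failing to reap their children (standard procps ps plus awk, purely illustrative):

  ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'   # list defunct processes with their parent PIDs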
The commonly proposed corrective measures:
- systemctl daemon-reexec
- kill -TERM 1
- removing /run/systemd/system/session-*.scope.d
do not fix the problem. The only remedy is to reboot the entire system, which is of course both disruptive and problematic for a server on the other side of the globe.
Questions:
- What are possible causes for this sort of systemd malfunction?
- How can I diagnose this further? (A few non-destructive starting points are sketched below.)
- Is there a less disruptive way to recover from an unresponsive systemd than rebooting?
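For context on the second question: systemctl talks to PID 1 over D-Bus (or its private socket when run as root), so the timeouts suggest PID 1 is no longer servicing that interface. A few read-only checks that might narrow it down (a sketch, not a definitive procedure):

  systemctl is-system-running             # the manager's own view of its state, if it answers at all
  busctl status org.freedesktop.systemd1  # asks dbus-daemon who owns the name; helps distinguish a dead bus from a hung PID 1
  grep ^State /proc/1/status              # the kernel's view of PID 1 (normally S, sleeping)
  ls /proc/1/fd | wc -l                   # a very large count could hint at file-descriptor exhaustion in PID 1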
Solution 1:
This is a very old question, but I hope it can save someone else some time.
I had an identical problem: zombie processes accumulating, and systemctl answering every request with a timeout. As expected, the problem was getting rid of the defunct daemons without a reboot. At least in our case, the solution was:
telinit u                 # SysV-compat request for PID 1 to re-execute itself
systemctl daemon-reexec   # re-execute the systemd manager, serializing and restoring its state
systemctl daemon-reload   # reload unit files and re-run all generators
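If this works, the manager should start answering again. Something along these lines (assuming a systemd recent enough for is-system-running, which 16.04's is) can confirm the recovery before touching any services; cron is just an arbitrary example unit:

  systemctl is-system-running   # should return a state instead of timing out
  systemctl status cron         # any ordinary unit query should now respond promptly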