Why does systemd hang during reboot?
About one reboot in ten, systemd hangs. I don't understand why. What should I look at to fix the problem? I am using systemd v196 and cannot upgrade to v198 or later, because that requires a more recent kernel (with cgroups support), and the customer's requirements do not allow a kernel update. Is there a reasonable way to find the cause of this behaviour and to make systemd reboot the system unconditionally?
Note that this link does not help: http://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1
As you can read there:
Shutdown Never Finishes
If normal reboot or poweroff never finish even after waiting a few minutes, the above method to create the shutdown log will not help and the log must be obtained using other methods. Two options that are useful for debugging boot problems can be used also for shutdown problems:
use a serial console
use a debug shell - not only is it available from early boot, it also stays active until late shutdown.
I am using the serial console, and for some reason I can even log in, because the eth interface is up or has been brought back up (after a disconnection during the reboot steps).
I don't see the reason.
# cat /etc/systemd/system/
basic.target.wants/ getty.target.wants/ multi-user.target.wants/ sysinit.target.wants/
dbus-org.freedesktop.NetworkManager.service local-fs-pre.target.wants/ sockets.target.wants/ syslog.service
display-manager.service local-fs.target.wants/ swap.target
Note the swap.target. It is there, but we don't use swap partitions at all. I tried masking swap, but the hang remains. The last line on the console is:
[OK] Stopped target shutdown.
EDIT: As I said, I can re-login via ssh over eth.
Now I will show you two logs. The first is from a reboot/shutdown that hangs, the second from a reboot that succeeds:
Hang case, the output is always like this (full log):
[ OK ] Stopped Network Time Service (one-shot ntpdate mode).
Stopping Modem and VPN connections autoconnect...
Stopping Login Service...
Stopping LSB: Avahi mDNS/DNS-SD Daemon...
[ OK ] Stopped Monitoring free system resources.
[ OK ] Stopped Monitoring dropbear socket.
[ OK ] Stopped Login Service.
[ OK ] Stopped Modem and VPN c[ OK ] Stopped Getty on tty1.
[ OK ] Stopped Serial Getty on ttyO0.
[ OK ] Unmounted /var/lib/opkg.
[ OK ] Stopped Network Manager.
[ OK ] Stopped LSB: Avahi mDNS/DNS-SD Daemon.
Stopping D-Bus System Message Bus...
[ OK ] Stopped target Remote File Systems.
[ OK ] Stopped Suspend manager.
Stopping X Server...
[ OK ] Stopped X Server.
Stopping System Logging Service...
[ OK ] Stopped System Logging Service.
[ 77.580000] g_ether gadget: using random self ethernet address
[ 77.580000] g_ether gadget: using random host ethernet address
[ 77.590000] usb0: MAC 6e:0d:de:b0:33:4f
[ 77.590000] usb0: HOST MAC 62:7a:81:02:f3:ff
[ 77.600000] g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
[ 77.600000] g_ether gadget: g_ether ready
[ 77.610000] musb-hdrc musb-hdrc.0: MUSB HDRC host driver
[ 77.610000] musb-hdrc musb-hdrc.0: new USB bus registered, assigned bus number 2
[ 77.620000] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[ 77.630000] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 77.640000] usb usb2: Product: MUSB HDRC host driver
[ 77.640000] usb usb2: Manufacturer: Linux 2.6.37 musb-hcd
[ 77.650000] usb usb2: SerialNumber: musb-hdrc.0
[ 77.650000] hub 2-0:1.0: USB hub found
[ 77.660000] hub 2-0:1.0: 1 port detected
[ 77.690000] ADDRCONF(NETDEV_UP): usb0: link is not ready
[ OK ] Stopped target Reboot.
[ OK ] Stopped Reboot.
[ OK ] Stopped target Unmount All Filesystems.
[ OK ] Stopped target Shutdown.
[ 78.330000] <46>systemd-journald[328]: Received SIGUSR1
<hang>
Normal reboot:
Unmounting /var/lib/opkg...
[ OK ] Stopped target Network.
Stopping SSH Per-Connection Server...
[ OK ] Stopped target Graphical Interface.
[ OK ] Stopped target Multi-User.
Stopping Monitoring free system resources...
Stopping Monitoring dropbear socket...
Stopping Network Time Service (one-shot ntpdate mode)...
[ OK ] Stopped Network Time Service (one-shot ntpdate mode).
Stopping Modem and VPN connections autoconnect...
Stopping Login Service...
Stopping LSB: Avahi mDNS/DNS-SD Daemon...
[ OK ] Stopped Monitoring free system resources.
[ OK ] Stopped Monitoring dropbear socket.
[ OK ] Stopped Login Service.
[ OK ] Unmounted /var/lib/opkg.
Stopping Network Manager...
[ OK ] Stopped Getty on tty1.
[ OK ] Stopped Network Manager.
[ OK ] Stopped Serial Getty on ttyO0.
[ OK ] Stopped Suspend manager.
[ OK ] Stopped LSB: Avahi mDNS/DNS-SD Daemon.
Stopping D-Bus System Message Bus...
Stopping X Server...
Stopping Permit User Sessions...
[ OK ] Stopped Permit User Sessions.
[ OK ] Stopped target Remote File Systems.
[ OK ] Stopped X Server.
[ OK ] Stopped D-Bus System Message Bus.
Stopping System Logging Service...
[ OK ] Stopped System Logging Service.
[ OK ] Stopped target Basic System.
[ OK ] Stopped target Sockets.
[ OK ] Closed dropbear.socket.
[ OK ] Closed D-Bus System Message Bus Socket.
[ OK ] Stopped target System Initialization.
Stopping Import configuration from SD card...
[ OK ] Stopped Import configuration from SD card.
Stopping Load Kernel Modules...
Stopping Apply Kernel Variables...
[ OK ] Stopped Apply Kernel Variables.
[ OK ] Stopped target Local File Systems.
Unmounting /var...
Unmounting /tmp...
[ OK ] Closed Syslog Socket.
[ OK ] Failed unmounting /var.
[ OK ] Unmounted /tmp.
[ OK ] Stopped Load Kernel Modules.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped target Local File Systems (Pre).
Stopping Remount Root and Kernel File Systems...
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Reached target Shutdown.
[ 52.340000] omap_wdt: Unexpected close, not stopping!
Sending SIGTERM to remaining processes...
[ 52.490000] <46>systemd-journald[335]: Received SIGTERM
Sending SIGKILL to remaining processes...
Unmounting file systems.
Unmounting /sys/fs/fuse/connections.
Unmounting /var.
All filesystems unmounted.
Deactivating swaps.
All swaps deactivated.
UPDATE:
After some investigation and debugging, I found the reason for the shutdown interruption, although I still cannot solve it. For some reason, one of the custom services is started before the shutdown completes, which makes the shutdown procedure hang. That is one kind of hang. Another kind is when the shutdown is not interrupted but simply stops at some point.
For this reason, before resolving all the conflicts and other possible hangs one at a time, I want to activate the hardware watchdog unconditionally. To do this via systemd, I enabled and tested RuntimeWatchdogSec and ShutdownWatchdogSec, both separately and together. Unfortunately, they did not help. Looking at the source code, it seems that systemd enters a loop in which it keeps waiting for all filesystems to be unmounted and other cleanup to be performed before it lets the watchdog actually take effect (that is, before it stops keeping it alive).
I am stuck. What I am asking for is a way to either:
1. have the watchdog enabled unconditionally, at least from the point where the shutdown begins, or
2. detect and resolve all the conflicts in an easy way.
The first solution is preferred.
I venture to suggest a solution: try adding
Before=basic.target
to /usr/lib/systemd/system/dbus.service.
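One way to apply this without editing the packaged file is to place a modified copy in /etc/systemd/system, since unit files there take precedence over those in /usr/lib/systemd/system. The following is only a sketch: the function name add_before_basic is made up for illustration, and it assumes the unit file contains a [Unit] section header (if it does not, the copy is left unchanged).

```shell
# Copy a unit file and add "Before=basic.target" right below its [Unit] header.
# Source and destination are arguments, so it can be tried on scratch files first.
add_before_basic() {
    src=$1    # e.g. /usr/lib/systemd/system/dbus.service
    dst=$2    # e.g. /etc/systemd/system/dbus.service
    if grep -q '^Before=.*basic\.target' "$src"; then
        cp "$src" "$dst"    # ordering already present, plain copy
        return
    fi
    # Print every line; after the [Unit] header, also print the new ordering line.
    awk '{ print } /^\[Unit\]/ { print "Before=basic.target" }' "$src" > "$dst"
}
```

It would be called as add_before_basic /usr/lib/systemd/system/dbus.service /etc/systemd/system/dbus.service, followed by systemctl daemon-reload so that systemd picks up the new file.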
I am struck by an oddity in your logs that reminds me of an incident I read about some time ago in the Arch Linux forums: a system there would hang on reboot. The solution above was offered on the grounds that the hang was caused by some service trying to talk to D-Bus after it had been halted:
So by ordering it before the basic.target it not only starts before the basic target is reached, but also ensures that it stays around until after the basic.target is brought down during shutdown.
In your unhealthy log, we see in fact that the Basic System is not stopped, while it is properly stopped in the healthy log.
Should this not work, and considering that you cannot upgrade, have you considered a downgrade?
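The comparison of the two logs made above can also be done mechanically, which helps when the logs are long. The sketch below assumes both console logs have been saved to files and that stop messages have the "[ OK ] Stopped NAME." form seen in the question; the function names are made up for illustration. Lines where two messages are interleaved on the serial console will show up as noise.

```shell
# Print the units reported as "Stopped" in a console log, one per line, sorted.
stopped_units() {
    sed -n 's/^\[ *OK *] Stopped \(.*\)\.$/\1/p' "$1" | sort -u
}

# Print units that were stopped in the healthy log ($2)
# but never reported as stopped in the hung log ($1).
missing_stops() {
    hung=$(mktemp) && ok=$(mktemp) || return 1
    stopped_units "$1" > "$hung"
    stopped_units "$2" > "$ok"
    comm -13 "$hung" "$ok"    # keep only lines unique to the healthy log
    rm -f "$hung" "$ok"
}
```

Called as missing_stops hang.log ok.log (file names are placeholders), it lists the units whose stop never showed up in the hung case, which are the first candidates to look at.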
shutdown.target conflicts with all other units by default, in order to automatically stop them when the shutdown process starts. This works the other way too – if another unit starts, it causes shutdown.target to be stopped. So the problem is that something causes something to start during shutdown, which overrides the shutdown process.
This should have been fixed in systemd v198, which makes the shutdown job "irreplaceable".
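Since the machine stays reachable over ssh, one way to catch the offending unit may be to look at the job queue while the shutdown is in progress. The following is a sketch only: start_jobs is a made-up helper, and it assumes that systemctl list-jobs prints its columns in the order JOB, UNIT, TYPE, STATE, which should be checked against the output of the installed version.

```shell
# Read `systemctl list-jobs` output on stdin and print the units that have a
# "start" job. The column order (JOB UNIT TYPE STATE) is an assumption.
start_jobs() {
    # Only consider rows whose first field is a numeric job id.
    awk '$1 ~ /^[0-9]+$/ && $3 == "start" { print $2 }'
}
```

Running systemctl list-jobs | start_jobs repeatedly during a shutdown should show start jobs for the shutdown-related targets themselves; any other unit appearing there is a candidate for the one that cancels shutdown.target.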
Maybe the swap is still active when reaching "Target shutdown". My solution was to force swap deactivation before rebooting:
swapoff -a
swapoff /dev/md6
After that, the reboot went fine for me without any pause.
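To check whether this applies at all, the active swap areas can be read from /proc/swaps just before rebooting. A minimal sketch, with active_swaps being a made-up helper name:

```shell
# Print the swap areas that are currently active, one per line.
# Reads /proc/swaps by default; another file can be passed for testing.
active_swaps() {
    # The first line of /proc/swaps is a header, so skip it.
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}
```

If active_swaps prints nothing, no swap is in use and swapoff is unlikely to change anything; otherwise each printed device can be passed to swapoff.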