How can you distinguish between a crash and a reboot on RHEL7?

Is there a way to determine whether a RHEL7 server was rebooted via systemctl (or reboot / shutdown aliases), or whether the server crashed? Pre-systemd this was fairly easy to determine with last -x runlevel, but with RHEL7 it's not so clear.

There's more than one way to do this, but I'll cover the 4 best ones I can think of. (EDIT: I published a cleaned-up version of this as a public article on redhat.com. See: How to distinguish between a crash and a graceful reboot in RHEL 7.)

(1) auditd logs

auditd is amazing. You can see all the different events that it logs by checking ausearch -m. Apropos to the problem at hand, it logs system shutdown and system boot, so you can use the command ausearch -i -m system_boot,system_shutdown | tail -4. If this reports a SYSTEM_SHUTDOWN followed by a SYSTEM_BOOT, all is well; however, if it reports 2 SYSTEM_BOOT lines in a row, then clearly the system did not shutdown gracefully, as in the following example:

[root@a72 ~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:10:32.392:7) : pid=657 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success' 
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:11:41.134:7) : pid=656 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'

(2) last -x

Same as above, but with the simple last -n2 -x shutdown reboot command. Example where system crashed:

[root@a72 ~]# last -n2 -x shutdown reboot
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:11 - 01:20  (00:08)    
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:10 - 01:20  (00:09)

Or where system had a graceful reboot:

[root@a72 ~]# last -n2 -x shutdown reboot
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:21 - 01:21  (00:00)    
shutdown system down  3.10.0-327.el7.x Tue Sep 20 01:21 - 01:21  (00:00)

(3) create your own service unit

This is IMHO the best approach because you can tailor it to whatever you want. There are a million ways to do this. Here's one I just made up. This next service only runs at shutdown.

[root@a72 ~]# cat /etc/systemd/system/set_gracefulshutdown.service
[Unit]
Description=Set flag for graceful shutdown
DefaultDependencies=no
RefuseManualStart=true
Before=shutdown.target

[Service]
Type=oneshot
ExecStart=/bin/touch /root/graceful_shutdown

[Install]
WantedBy=shutdown.target
[root@a72 ~]# systemctl enable set_gracefulshutdown.service 
Created symlink from /etc/systemd/system/shutdown.target.wants/set_gracefulshutdown.service to /etc/systemd/system/set_gracefulshutdown.service.

Then when the system boots, this next service will only start if the file created by the above shutdown service exists.

[root@a72 ~]# cat /etc/systemd/system/check_graceful.service 
[Unit]
Description=Check if system booted after a graceful shutdown
ConditionPathExists=/root/graceful_shutdown
RefuseManualStart=true
RefuseManualStop=true

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/rm /root/graceful_shutdown

[Install]
WantedBy=multi-user.target
[root@a72 ~]# systemctl enable check_graceful
Created symlink from /etc/systemd/system/multi-user.target.wants/check_graceful.service to /etc/systemd/system/check_graceful.service.

So at any given time I can check if the previous boot was done after a graceful shutdown by doing systemctl is-active check_graceful, e.g.:

[root@a72 ~]# systemctl is-active check_graceful && echo YAY || echo OH NOES
active
YAY
[root@a72 ~]# systemctl status check_graceful
● check_graceful.service - Check if system booted after a graceful shutdown
   Loaded: loaded (/etc/systemd/system/check_graceful.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2016-09-20 01:10:32 EDT; 20s ago
  Process: 669 ExecStart=/bin/rm /root/graceful_shutdown (code=exited, status=0/SUCCESS)
 Main PID: 669 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/check_graceful.service

Sep 20 01:10:32 a72.example.com systemd[1]: Starting Check if system booted after a graceful shutdown...
Sep 20 01:10:32 a72.example.com systemd[1]: Started Check if system booted after a graceful shutdown.

Or here's after an ungraceful shutdown:

[root@a72 ~]# systemctl is-active check_graceful && echo YAY || echo OH NOES
inactive
OH NOES
[root@a72 ~]# systemctl status check_graceful
● check_graceful.service - Check if system booted after a graceful shutdown
   Loaded: loaded (/etc/systemd/system/check_graceful.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
Condition: start condition failed at Tue 2016-09-20 01:11:41 EDT; 16s ago
           ConditionPathExists=/root/graceful_shutdown was not met

Sep 20 01:11:41 a72.example.com systemd[1]: Started Check if system booted after a graceful shutdown.

(4) journalctl

It is worth mentioning that if you configure systemd-journald to keep a persistent journal, you can then use journalctl -b -1 -n to look at the last few (10 by default) lines of the previous boot (-b -2 is the boot before that, etc). Example where the system rebooted gracefully:

[root@a72 ~]# mkdir /var/log/journal
[root@a72 ~]# systemctl -s SIGUSR1 kill systemd-journald
[root@a72 ~]# reboot
...
[root@a72 ~]# journalctl -b -1 -n
-- Logs begin at Tue 2016-09-20 01:01:15 EDT, end at Tue 2016-09-20 01:21:33 EDT. --
Sep 20 01:21:19 a72.example.com systemd[1]: Stopped Create Static Device Nodes in /dev.
Sep 20 01:21:19 a72.example.com systemd[1]: Stopping Create Static Device Nodes in /dev...
Sep 20 01:21:19 a72.example.com systemd[1]: Reached target Shutdown.
Sep 20 01:21:19 a72.example.com systemd[1]: Starting Shutdown.
Sep 20 01:21:19 a72.example.com systemd[1]: Reached target Final Step.
Sep 20 01:21:19 a72.example.com systemd[1]: Starting Final Step.
Sep 20 01:21:19 a72.example.com systemd[1]: Starting Reboot...
Sep 20 01:21:19 a72.example.com systemd[1]: Shutting down.
Sep 20 01:21:19 a72.example.com systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Sep 20 01:21:19 a72.example.com systemd-journal[483]: Journal stopped

If you get good output like that, then clearly the system was shutdown gracefully. That said, it's not super-reliable in my experience when bad things happen (system crashes). Sometimes the indexing gets weird.

Funny, I just happened to reboot a CentOS 7 system last night, and so I have a nice log of just this to look at.

In the case of a crash, obviously nothing is logged between the time of the crash and the system restart.

In the case of a reboot, it is pretty obvious, as you get a log of (nearly) everything systemd is doing to shut down the system.

One such log entry you aren't likely to see in any circumstance other than shutting down or going to single-user mode is:

Jul 13 01:27:55 yaungol systemd: Stopped target Multi-User System.

You can reboot your own system to see what actually gets logged.

I don't particularly like the answer, but it's an answer we got from RH. I'm posting it here in case it helps someone else.

One possible way is to grep for rsyslogd in /var/log/messages. A graceful shutdown would have exiting on signal 15. A crash would not.

tac /var/log/messages | grep 'rsyslogd.*start\|rsyslogd.*exit'

Two consecutive start lines may indicate a crash. And a start followed by an exit may indicate a reboot.

Unfortunately it could also give bad results if rsyslogd goes down or is restarted outside a reboot/crash.

How can you distinguish between a crash and a reboot on RHEL7?

(1) auditd logs

(2) last -x

(3) create your own service unit

(4) journalctl

Related

Recent Posts