How to send an email if a systemd service is restarted?

I have a critical application which is run as a service by systemd.

It is set up to restart as soon as there is a failure.

How can I send an email when the application restarts?


Solution 1:

First you need two files: an executable for sending the mail and a .service for starting the executable. For this example, the executable is just a shell script using sendmail:

/usr/local/bin/systemd-email:

#!/bin/bash

/usr/bin/sendmail -t <<ERRMAIL
To: $1
From: systemd <root@$HOSTNAME>
Subject: $2
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

$(systemctl status --full "$2")
ERRMAIL
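A slightly stricter variant of the script above may be easier to test. This is a minimal sketch under some assumptions. The function names `compose_status_mail` and `send_status_mail` are hypothetical. The message body is read from stdin, so the headers can be previewed without sendmail or a running systemd. The `/usr/bin/sendmail` path is taken from the original script and may differ on your system.

```bash
#!/bin/bash
# A stricter sketch of /usr/local/bin/systemd-email (function names are
# hypothetical). The message is built from stdin so it can be previewed
# without sendmail or systemd.

# compose_status_mail ADDRESS UNIT: print headers, a blank line, then stdin.
compose_status_mail() {
    local to=$1 unit=$2
    printf 'To: %s\n' "$to"
    printf 'From: systemd <root@%s>\n' "${HOSTNAME:-$(hostname)}"
    printf 'Subject: %s\n' "$unit"
    printf 'Content-Transfer-Encoding: 8bit\n'
    printf 'Content-Type: text/plain; charset=UTF-8\n'
    printf '\n'
    cat
}

# send_status_mail ADDRESS UNIT: check the arguments, then mail the status.
send_status_mail() {
    if [ "$#" -ne 2 ]; then
        echo "usage: systemd-email ADDRESS UNIT" >&2
        return 64  # EX_USAGE from sysexits.h
    fi
    systemctl status --full "$2" | compose_status_mail "$1" "$2" | /usr/bin/sendmail -t
}
```

To use this as the ExecStart= target, end the file with the line `send_status_mail "$@"` and keep it executable. The unit file can stay unchanged. A call with the wrong number of arguments then fails with a usage message instead of sending an empty or malformed mail.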

Whatever executable you use, it should probably take at least two arguments, as this shell script does: the address to send to and the unit to get the status of. The .service we create will pass these arguments:

/etc/systemd/system/status-email-user@.service:

[Unit]
Description=status email for %i to user

[Service]
Type=oneshot
ExecStart=/usr/local/bin/systemd-email address %i
User=nobody
Group=systemd-journal

Here, user is the user being emailed and address is that user's email address. Although the recipient is hard-coded, the unit to report on is passed in as the instance name (%i), so this one template can send email for many other units. At this point you can start an instance such as status-email-user@<unit>.service, substituting the name of any existing unit, to verify that you can receive the emails.

Then simply edit the service you want emails for and add OnFailure=status-email-user@%n.service to the [Unit] section. %n passes the unit's name to the template.
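Instead of editing the unit file itself, the OnFailure= line can also go in a drop-in file. Below is a minimal sketch of that approach. The helper name `add_onfailure_dropin` and the file name `onfailure-email.conf` are hypothetical. The default directory `/etc/systemd/system` can be overridden for testing. The `%n` is written literally, because systemd itself expands it to the full unit name when it loads the unit.

```bash
#!/bin/bash
# Hypothetical helper: attach the status-email-user@ template to a unit via a
# drop-in file instead of editing the unit file.
# Assumes drop-ins live in <dir>/<unit>.d/ (dir defaults to /etc/systemd/system).

add_onfailure_dropin() {
    local unit=$1
    local dir=${2:-/etc/systemd/system}
    mkdir -p "$dir/$unit.d"
    # %n stays literal here; systemd replaces it with the unit name at load time.
    printf '%s\n' '[Unit]' 'OnFailure=status-email-user@%n.service' \
        > "$dir/$unit.d/onfailure-email.conf"
}
```

After running something like `add_onfailure_dropin myapp.service`, run `systemctl daemon-reload` so systemd picks up the drop-in. `systemctl edit myapp.service` achieves the same result interactively.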

Source: ArchWiki, systemd/Timers article, MAILTO section

Solution 2:

The solution proposed by @gf_ worked well for our situation running ClickHouse on CentOS 7. ClickHouse crashes somewhat regularly for us, so we needed it both to restart automatically and to notify us when a restart occurred. Adding a second service to systemd seems a little clunky, but it is necessary given systemd's design.

That said, when combined with automatic restarts, this solution stopped working for us after we deployed to CentOS 8. The systemd v239 that ships with CentOS 8 handles OnFailure= differently when it is combined with a non-default Restart= setting (Restart=on-failure in our case): the one-shot service is triggered only if the restart fails completely, not after each crash. systemd would happily restart the service, but we no longer got the email because OnFailure= was never invoked.

Note our primary expectation: we wanted systemd both to restart the process AND to send an email notification. The v239 update broke the approach from gf_ that we had been using. Fortunately, we were able to get this working again.

Our solution is to use ExecStopPost= to invoke the email notification script. This works, but it raised a new issue: an email was also sent when the ClickHouse service was stopped or restarted normally, for example during a server reboot. While not a big deal, ideally we wanted notifications only on crashes. We achieved this by adding the following check to our email script:

# Don't do anything if the service intentionally stopped successfully.
if [ "$SERVICE_RESULT" = "success" ]; then
    exit 0
fi
...

$SERVICE_RESULT is an environment variable that systemd passes to processes started by ExecStopPost=. A value of success means the service was stopped intentionally (for example, during a normal shutdown), so the script does nothing. For any other value, such as signal or exit-code, the script continues and sends an email. The possible values of this variable are listed in the systemd documentation (systemd.exec(5)).
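The decision logic can be kept in small functions so it is easy to test outside systemd. This is a sketch, not the script we actually run. The names `should_notify` and `stop_summary` are hypothetical. The only inputs are the variables systemd sets for ExecStopPost= commands: SERVICE_RESULT, EXIT_CODE and EXIT_STATUS. As with the original check, an unset SERVICE_RESULT (for example, when the script is run by hand) counts as "not success" and leads to a notification.

```bash
#!/bin/bash
# Sketch of the decision logic for an ExecStopPost= notification script.
# systemd sets SERVICE_RESULT, EXIT_CODE and EXIT_STATUS for ExecStopPost=
# commands; the function names below are hypothetical.

# should_notify: succeed (return 0) when an email should be sent.
should_notify() {
    # "success" means the service stopped cleanly, e.g. via systemctl stop.
    [ "${SERVICE_RESULT:-}" != "success" ]
}

# stop_summary UNIT: print a one-line summary usable as a Subject: header.
stop_summary() {
    local unit=$1
    printf '%s stopped: result=%s code=%s status=%s\n' \
        "$unit" "${SERVICE_RESULT:-unknown}" "${EXIT_CODE:-unknown}" "${EXIT_STATUS:-unknown}"
}
```

In the notification script, `should_notify || exit 0` would replace the if block above, and `stop_summary "$2"` could feed the Subject: line. The script might be hooked in with something like `ExecStopPost=/usr/local/bin/notify-on-stop address %n`, where notify-on-stop is a placeholder name.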

Thanks to gf_ for the initial solution. I hope people find my update helpful for CentOS 8. Some more links that helped me out:

  1. https://superuser.com/questions/1360346/how-to-send-an-email-alert-when-a-linux-service-has-stopped
  2. https://unix.stackexchange.com/questions/422933/confusing-systemd-behaviour-with-onfailure-and-restart
  3. https://unix.stackexchange.com/questions/197636/run-an-arbitrary-command-when-a-service-fails

Solution 3:

You could try the ExecStartPost= option of systemd service units.

The description is available here:

https://www.freedesktop.org/software/systemd/man/systemd.service.html

This option can be declared more than once in a service file; the commands are then executed one after another, in the order they appear.

Your system's existing unit files will likely contain some examples as well.
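ExecStartPost= runs after every start, including the very first one, so on its own it cannot tell a restart from a normal boot. Below is one possible way to narrow it down. It is a sketch under assumptions, not a tested recipe. It assumes your systemd exposes the NRestarts property (added in newer releases, around v235). It also assumes that this counter is 0 after a manual start and grows with each automatic restart. The function names are hypothetical, and /usr/local/bin/systemd-email is the script from Solution 1.

```bash
#!/bin/bash
# Sketch of an ExecStartPost= helper that only mails after automatic restarts.
# Assumption: the unit's NRestarts property counts automatic restarts and is
# 0 after a manual start. Function names are hypothetical.

# is_auto_restart COUNT: succeed when COUNT is a positive integer.
is_auto_restart() {
    [[ "$1" =~ ^[0-9]+$ ]] && [ "$1" -gt 0 ]
}

# restart_count UNIT: print the unit's NRestarts value.
restart_count() {
    systemctl show --property=NRestarts --value "$1"
}

# notify_if_restarted ADDRESS UNIT: call the Solution 1 mailer on restarts only.
notify_if_restarted() {
    local count
    count=$(restart_count "$2")
    if is_auto_restart "$count"; then
        /usr/local/bin/systemd-email "$1" "$2"
    fi
}
```

A wrapper script ending in `notify_if_restarted "$@"` could be referenced as `ExecStartPost=-/usr/local/bin/notify-restart address %n` (notify-restart is a placeholder name). The leading `-` tells systemd to ignore a failure of that command, so a mail problem does not mark the service start as failed.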