Configure buggy systemd service to terminate via SIGKILL

Background

I've been asked to create a systemd script for a new service, foo_daemon, that sometimes gets into a "bad state" and won't die via SIGTERM (likely due to a custom signal handler). This is a problem for developers, because they are instructed to start/stop/restart the service via:

  • systemctl start foo_daemon.service
  • systemctl stop foo_daemon.service
  • systemctl restart foo_daemon.service

Problem

Sometimes, due to foo_daemon getting into a bad state, we have to forcibly kill it via:

  • systemctl kill -s KILL foo_daemon.service

Question

How can I set up my systemd script for foo_daemon so that, whenever a user attempts to stop or restart the service, systemd will:

  • Attempt a graceful shutdown of foo_daemon via SIGTERM.
  • Give up to 2 seconds for shutdown/termination of foo_daemon to complete.
  • Attempt a forced shutdown of foo_daemon via SIGKILL if the process is still alive, without the risk that the PID has been recycled and systemd sends SIGKILL to the wrong process. The device we're testing spawns and forks numerous processes rapidly, so there is a rare but very real concern about PID recycling causing a problem.
  • If, in practice, I'm just being paranoid about PID recycling, I'm OK with the script simply sending SIGKILL to the process's PID without worrying about hitting a recycled PID.


systemd already supports this out of the box, and it is enabled by default.

The only thing you might want to customize is the timeout, which you can do with TimeoutStopSec=. For example:

[Service]
TimeoutStopSec=2

Now, systemd will send a SIGTERM, wait two seconds for the service to exit, and if it doesn't, it will send a SIGKILL.
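As a minimal sketch, a complete unit for foo_daemon might look like this, assuming the daemon runs in the foreground and lives at /usr/local/bin/foo_daemon (a hypothetical path). Only TimeoutStopSec=2 changes systemd's behaviour here; the other kill-related lines just restate the defaults so the intended stop sequence is visible in the file.

```ini
[Unit]
Description=foo_daemon

[Service]
# Hypothetical path: replace with the real foo_daemon binary.
ExecStart=/usr/local/bin/foo_daemon
# First signal sent on stop/restart (this is the default).
KillSignal=SIGTERM
# Grace period after KillSignal before escalating.
TimeoutStopSec=2
# Send SIGKILL once the grace period expires (this is the default).
SendSIGKILL=yes
# Signal every process in the service's cgroup (this is the default).
KillMode=control-group

[Install]
WantedBy=multi-user.target
```

After saving it as /etc/systemd/system/foo_daemon.service and running systemctl daemon-reload, both systemctl stop and systemctl restart should follow the SIGTERM, two-second wait, SIGKILL sequence described above.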

If your daemon forks into the background on its own (Type=forking), you may need to provide the path to its PID file with PIDFile= so that systemd can reliably identify the main process.
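For a daemon that backgrounds itself, the relevant lines might look like the following sketch. The --daemonize flag and the /run/foo_daemon.pid path are hypothetical placeholders; use whatever option and PID file location foo_daemon actually provides.

```ini
[Service]
# The daemon forks and the parent exits once start-up is complete.
Type=forking
# Hypothetical flag and path: adjust to foo_daemon's real options.
ExecStart=/usr/local/bin/foo_daemon --daemonize
PIDFile=/run/foo_daemon.pid
TimeoutStopSec=2
```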

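Finally, you mentioned that your daemon spawns many processes. With KillMode=control-group (which is the default), systemd sends the signals to every process in the service's cgroup, not just the main PID; you can set it explicitly to make that intent clear. Because systemd finds the processes to kill through the cgroup rather than through a PID it noted down earlier, this also largely takes care of your concern about PID recycling.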
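One way to check the escalation before relying on it for foo_daemon is a throwaway test unit (the name stubborn-test.service is only an example). Its shell ignores SIGTERM and starts a background child that inherits the ignored disposition, so neither process exits on SIGTERM alone. Running systemctl stop stubborn-test.service should then take roughly two seconds, after which both the shell and the sleep process should be gone.

```ini
[Unit]
Description=SIGTERM-ignoring test service

[Service]
# The shell ignores SIGTERM; the backgrounded sleep inherits that,
# so only the SIGKILL sent after TimeoutStopSec will stop them.
ExecStart=/bin/sh -c 'trap "" TERM; sleep 1000 & wait'
TimeoutStopSec=2
KillMode=control-group
```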


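Since nobody has mentioned Type=oneshot yet, here's a complete example of a oneshot service that fails because of a timeout. For a oneshot service, ExecStart= runs during the start phase, so the limit is set with TimeoutStartSec=: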

[Unit]
Description=timeout test

[Service]
Type=oneshot
TimeoutStartSec=2
ExecStart=/bin/sleep 10