Configure buggy systemd service to terminate via SIGKILL
Background
I've been asked to create a systemd unit file for a new service, foo_daemon, that sometimes gets into a "bad state" and won't die via SIGTERM (likely due to a custom signal handler). This is problematic for developers, as they are instructed to start/stop/restart the service via:
systemctl start foo_daemon.service
systemctl stop foo_daemon.service
systemctl restart foo_daemon.service
Problem
Sometimes, due to foo_daemon getting into a bad state, we have to forcibly kill it via:
systemctl kill -s KILL foo_daemon.service
Question
How can I set up my systemd unit file for foo_daemon so that, whenever a user attempts to stop/restart the service, systemd will:
- Attempt a graceful shutdown of foo_daemon via SIGTERM.
- Give up to 2 seconds for shutdown/termination of foo_daemon to complete.
- Attempt a forced shutdown of foo_daemon via SIGKILL if the process is still alive (so we don't risk the PID being recycled and systemd issuing SIGKILL against the wrong PID). The device we're testing spawns/forks numerous processes rapidly, so there is a rare but very real concern about PID recycling causing a problem.
- If, in practice, I'm just being paranoid about PID recycling, I'm OK with the unit just issuing SIGKILL against the process's PID without being concerned about killing a recycled PID.
systemd already supports this out of the box, and it is enabled by default.
The only thing you might want to customize is the timeout (normally 90 seconds by default), which you can do with TimeoutStopSec=. For example:
[Service]
TimeoutStopSec=2
Now, systemd will send a SIGTERM, wait two seconds for the service to exit, and if it doesn't, it will send a SIGKILL.
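To make that sequence concrete, here is a minimal sketch of the same [Service] section with the relevant defaults written out explicitly. KillSignal= and SendSIGKILL= are shown with their documented default values, so only TimeoutStopSec= actually changes behavior:

```ini
[Service]
# Signal sent first on "systemctl stop" (this is already the default).
KillSignal=SIGTERM
# How long to wait for the service to exit before escalating.
TimeoutStopSec=2
# Send SIGKILL to anything still running after the timeout (already the default).
SendSIGKILL=yes
```

After editing the unit, run systemctl daemon-reload so the change takes effect. Running systemctl show foo_daemon.service -p TimeoutStopUSec should then report the 2-second value.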
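If your service is not systemd-aware (for example, it daemonizes itself and runs as Type=forking), you may need to provide the path to its PID file with PIDFile= so that systemd can identify the main process.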
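As an illustration, assuming foo_daemon forks into the background and writes its own PID file, the relevant settings might look like the sketch below. The binary path and the PID file location are hypothetical placeholders, not taken from the question:

```ini
[Service]
# The daemon forks and the parent exits once startup is done.
Type=forking
# Hypothetical path to the daemon binary.
ExecStart=/usr/local/bin/foo_daemon
# Hypothetical PID file written by the daemon; systemd reads the main PID from it.
PIDFile=/run/foo_daemon.pid
TimeoutStopSec=2
```

If foo_daemon stays in the foreground instead, neither Type=forking nor PIDFile= should be needed.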
Finally, you mentioned that your daemon spawns many processes. In this case, you might wish to keep KillMode=control-group (which is the default), and systemd will send signals to all of the processes in the cgroup.
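For reference, here is a sketch of how KillMode= could be spelled out alongside the timeout. The values named in the comments are the ones documented for systemd; which one fits foo_daemon is an assumption you would need to check against how it manages its children:

```ini
[Service]
TimeoutStopSec=2
# control-group (the default): SIGTERM, then SIGKILL after the timeout,
# go to every process remaining in the service's cgroup.
KillMode=control-group
# Alternative: SIGTERM only to the main process, but SIGKILL to all
# remaining processes in the cgroup after the timeout.
# KillMode=mixed
```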
Since nobody mentioned needing Type=oneshot, here's a complete example which exits because of a timeout failure.
[Unit]
Description=timeout test
[Service]
Type=oneshot
TimeoutStartSec=2
ExecStart=/bin/sleep 10