What's a reliable technique for killing background processes on script termination?

I use shell scripts to react to system events and update status displays in my window manager. For example, one script determines the current wifi status by listening to multiple sources:

  1. associate/dissociate events from wpa_supplicant
  2. address changes from ip (so I know when dhcpcd has assigned an address)
  3. a timer process (so the signal strength updates from time to time)

To achieve the multiplexing, I end up spawning background processes:

{ wpa_cli -p /var/run/wpa_supplicant -i wlan0 -a echo &
ip monitor address &
while sleep 30; do echo; done } |
while read line; do update_wifi_status; done &

That is, whenever any of the event sources outputs a line, my wifi status updates. The entire pipeline runs in the background (the final '&') because I also watch another event source that causes my script to terminate:

wait_for_termination
kill $!

The kill is supposed to clean up the background processes, but in this form it doesn't quite do the job. At the very least, the 'wpa_cli' and 'ip' processes always survive, and they don't die on their next event either (in theory they should get a SIGPIPE; I guess the reading process must still be alive too).

The question is, how to reliably [and elegantly!] clean up all the background processes spawned?


Solution 1:

The super simple solution is to add this at the end of the script:

kill -- -$$

Explanation:

$$ gives us the PID of the running shell. So, kill $$ would send a SIGTERM to the shell process. However, if we negate the PID, kill sends a SIGTERM to every process in the process group. We need the -- beforehand so kill knows that -$$ is a process group ID and not a flag.

Note that this relies on the running shell being a process group leader! Otherwise, $$ (the PID) will not match the process group ID, and you end up sending a signal to who knows where (well, probably nowhere as there is unlikely to be a process group with a matching ID if we're not a group leader).
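
As a rough sketch of that caveat, the script can check that it really is the group leader before signalling the group. The helper names (current_pgid, is_group_leader, kill_group) are made up for illustration, and I'm assuming a ps that accepts -o pgid= and -p, with a Linux-only /proc fallback when ps is missing:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the original script.

# Print the process group ID of this shell.
current_pgid() {
    if command -v ps >/dev/null 2>&1; then
        # Assumes a ps that supports -o pgid= and -p (e.g. procps).
        ps -o pgid= -p "$$" | tr -d ' '
    else
        # Linux-only fallback: after "pid (comm) " the fields of
        # /proc/PID/stat are state, ppid, pgrp, ...
        stat=$(cat "/proc/$$/stat")
        set -- ${stat##*) }
        echo "$3"
    fi
}

# Succeed only if our PID equals our process group ID.
is_group_leader() {
    [ "$(current_pgid)" = "$$" ]
}

# Signal the whole group, but only when -$$ really names our own group.
kill_group() {
    if is_group_leader; then
        kill -- "-$$"
    else
        echo "not a process group leader; not sending kill -- -$$" >&2
        return 1
    fi
}
```

Calling kill_group at the end of the script then behaves like kill -- -$$ when the script is the leader, and prints a warning instead of signalling some unrelated group when it is not.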

When a script is launched on its own from an interactive shell with job control, it gets a new process group[1] and becomes that group's leader. Every process forked from it becomes a member of that process group, unless it explicitly changes its process group via a syscall (setpgid).

The easiest way to guarantee a particular script runs as a process group leader is to launch it using setsid. For example, I have a few of these status scripts which I launch from a parent script:

#!/bin/sh
wifi_status &
bat_status &

Written like this, both the wifi and battery scripts run with the same process group as the parent script, and kill -- -$$ doesn't work. The fix is:

#!/bin/sh
setsid wifi_status &
setsid bat_status &

I found pstree -p -g useful to visualise process & process group IDs.

Thanks to everyone who contributed and made me dig a little deeper, I learnt stuff! :)

[1] Are there other circumstances where the shell creates a process group, e.g. on starting a subshell? I don't know...

Solution 2:

OK, I've come up with a pretty decent solution that doesn't use cgroups. It won't work in the face of forking processes, as Leonardo Dagnino pointed out.

One of the problems with manually keeping track of process IDs via $! to kill them later is the inherent race condition: if the process finishes before you kill it, the script will send a signal to a non-existent, or possibly incorrect, process.

We can check for process termination within the shell via the wait builtin, but we can only wait for the termination of either all background processes, or a single pid. In both cases wait blocks, which makes it unsuitable for the task of checking whether a given PID is still running.

In searching for a solution to the above I stumbled upon the jobs command, which I previously thought was only available to interactive shells. Turns out it works fine in scripts, and automatically keeps track of the background processes we've launched: if a process has terminated, it won't show up in the jobs list anymore.

Therefore, the command:

trap 'kill $(jobs -p)' EXIT

is enough to ensure -- in simple cases -- the termination of background processes when the current shell exits.
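
Here is a small sketch to try the trap out, assuming bash is installed. The marker file and the delays are made up for the demo: the background job would write to the marker after one second, so an empty marker afterwards suggests the job was killed when the inner script exited.

```shell
#!/bin/sh
# Sketch only: the marker file and timings are illustrative assumptions.
marker=$(mktemp)

bash -c '
    trap "kill \$(jobs -p)" EXIT
    # Background job that would write to the marker after 1 second.
    { sleep 1; echo "still running" > "$1"; } &
    echo "script exiting"
' demo "$marker"

sleep 2   # wait past the point where the job would have written

if [ -s "$marker" ]; then
    result="background job survived"
else
    result="background job was killed"
fi
echo "$result"
rm -f "$marker"
```

Note that the sleep inside the background job is a grandchild and is not signalled by the trap; it just runs out on its own, which is the forking limitation mentioned at the top.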

In my case one trap is not enough, because I'm launching background processes from a subshell as well, and traps are reset in each new subshell. So, I need to set the same trap within the subshell:

{ trap 'kill $(jobs -p)' EXIT
wpa_cli -p /var/run/wpa_supplicant -i wlan0 -a echo &
ip monitor address &
while echo; do sleep 30; done } |
while read line; do update_wifi_status; done &
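
The reset of traps in subshells can be checked with a throwaway sketch like the following (the log file is just a scratch file for the demo). If the inner subshell inherited the EXIT trap, "outer trap" would be logged before "inner done":

```shell
#!/bin/sh
# Sketch only: the log file is an illustrative scratch file.
log=$(mktemp)

(
    trap 'echo "outer trap" >> "$log"' EXIT
    ( : )                        # inner subshell exits without firing the trap
    echo "inner done" >> "$log"
)                                # the trap fires here, when the outer subshell exits

contents=$(cat "$log")
echo "$contents"
rm -f "$log"
```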

Finally, jobs -p only gives the pid of the last process in the pipeline (just like $!). As you can see, I'm spawning background processes in the first process of the background pipeline, so I want to signal that pid as well.

The first process' pid can be obtained from jobs, but I'm not sure how portably this can be achieved. Using bash, I get output in this style:

$ sleep 20 | sleep 20 &
$ jobs -l
[1]+ 25180 Running                 sleep 20
     25181                       | sleep 20 &

So, by using a slightly modified kill command from the parent script, I can signal all processes in the pipeline:

wait_for_termination
kill $(jobs -l |awk '$2 == "|" {print $1; next} {print $2}')
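
To see what the awk filter picks out, here is a sketch that feeds it the sample jobs -l output from above instead of live output (extract_pids is a made-up helper name, and the format is the bash one shown earlier):

```shell
#!/bin/sh
# Sketch only: extract_pids is a hypothetical helper wrapping the awk filter.
extract_pids() {
    # Continuation lines have "|" as field 2, so the pid is field 1;
    # on the first line of a job the pid is field 2.
    awk '$2 == "|" {print $1; next} {print $2}'
}

# Sample taken from the bash session shown above.
sample='[1]+ 25180 Running                 sleep 20
     25181                       | sleep 20 &'

pids=$(printf '%s\n' "$sample" | extract_pids)
echo "$pids"
```

For the sample this prints 25180 and 25181, one per line, which is the list handed to kill.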