Bash pipeline signal propagation - how does it work?

While answering this question, I was unable to fully explain how signals propagate through a pipeline.

Consider the following examples.

Using timeout as the first element in the pipeline

This causes gpg to bail out, having caught the SIGTERM that timeout delivered to cat, and leaves a broken file.

$ timeout 1 cat /dev/urandom | gpg -er [email protected] > ./myfile.gpg

gpg: Terminated caught ... exiting
Terminated
$ gpg -d < ./myfile.gpg > /dev/null

You need a passphrase to unlock the secret key for
user: "Attie Grande <[email protected]>"
4096-bit RSA key, ID C9AEA6AE, created 2016-12-13 (main key ID 7826F053)

gpg: encrypted with 4096-bit RSA key, ID C9AEA6AE, created 2016-12-13
      "Attie Grande <[email protected]>"
gpg: block_filter 0x145e790: read error (size=14775,a->size=14775)
gpg: block_filter 0x145f110: read error (size=10710,a->size=10710)
gpg: WARNING: encrypted message has been manipulated!
gpg: block_filter: pending bytes!
gpg: block_filter: pending bytes!

Using timeout in the middle of the pipeline

This works as expected - gpg exits cleanly.

$ cat /dev/urandom | timeout 1 cat | gpg -er [email protected] > ./myfile.gpg
$ gpg -qd < ./myfile.gpg > /dev/null

You need a passphrase to unlock the secret key for
user: "Attie Grande <[email protected]>"
4096-bit RSA key, ID C9AEA6AE, created 2016-12-13 (main key ID 7826F053)

Using SIGUSR1 instead of SIGTERM

Again, this works as expected - gpg exits cleanly. I expect this is because cat quits on SIGUSR1, while gpg ignores it.

$ timeout -sUSR1 1 cat /dev/urandom | gpg -er [email protected] > ./myfile.gpg
$ gpg -qd < ./myfile.gpg > /dev/null

You need a passphrase to unlock the secret key for
user: "Attie Grande <[email protected]>"
4096-bit RSA key, ID C9AEA6AE, created 2016-12-13 (main key ID 7826F053)

Using process substitution

Again, this works - though I hadn't expected it to.

$ gpg -er [email protected] > ./myfile.gpg < <( timeout 1 cat /dev/urandom )
$ gpg -qd < ./myfile.gpg > /dev/null

You need a passphrase to unlock the secret key for
user: "Attie Grande <[email protected]>"
4096-bit RSA key, ID C9AEA6AE, created 2016-12-13 (main key ID 7826F053)

I can only presume that the signal of the first element in the pipeline is propagated through to the rest of the elements in the pipeline (even separating them with timeout cat | cat | gpg fails).

I've had a look for documentation, and had a play with set -e and set -o pipefail, but they didn't act as I was expecting.

  • What is actually going on?
  • What are the semantics?
  • Do we have any control over this?
  • Is there a better way than moving the signal-generating-process from the front of the pipeline?

Solution 1:

I can only presume that the signal of the first element in the pipeline is propagated through to the rest of the elements in the pipeline.

As far as I know, there's no such propagation. I'm mainly going to answer your first question:

What is actually going on?

Short answer

(This may be somewhat simplified.)

  1. When running a pipeline, interactive bash places every process of that pipeline in one process group, with PGID (process group ID) equal to the PID (process ID) of the first command.
  2. timeout changes its own PGID to its own PID. This changes nothing if timeout is the first command in the pipeline.
  3. timeout sends the signal not only to the underlying command but to its entire process group as well. If timeout is the first command in the pipeline, its process group still includes gpg, so gpg gets the signal too.

The phenomenon is researched and elaborated below.


Elaboration

1. bash behavior

When running a pipeline, interactive bash places every process in a process group with PGID equal to the PID of the first command. You can run your own tests (see Is it possible to get process group ID from /proc?). I haven't researched more complex possibilities (e.g. what if the first "command" is a subshell?); in your case they don't matter. What matters is that gpg in these commands

timeout 1 cat /dev/urandom | gpg -er [email protected] > ./myfile.gpg
cat /dev/urandom | timeout 1 cat | gpg -er [email protected] > ./myfile.gpg
timeout -sUSR1 1 cat /dev/urandom | gpg -er [email protected] > ./myfile.gpg
gpg -er [email protected] > ./myfile.gpg < <( timeout 1 cat /dev/urandom )

gets PGID equal to the PID of

  • timeout
  • (the first) cat
  • timeout
  • gpg (i.e. itself)

respectively.
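One way to run such a test yourself is sketched below. It is only an illustration under a few assumptions: bash on Linux (it reads /proc/PID/stat), set -m to give a script the same per-pipeline process groups an interactive shell uses, and sleep standing in for the real commands. The pgid_of helper is a name made up for this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash on Linux; sleep stands in for the real commands.
set -m    # job control on: each pipeline gets its own process group,
          # as it would in an interactive shell

# Hypothetical helper: print the PGID of the process whose PID is $1.
# After the "pid (comm) " prefix of /proc/PID/stat the fields are
# state, ppid, pgrp, so the PGID is the third remaining field.
pgid_of() {
    sed 's/.*) //' "/proc/$1/stat" | awk '{ print $3 }'
}

sleep 2 | sleep 2 | sleep 2 &
last_pid=$!                  # PID of the last command in the pipeline
leader_pid=$(jobs -p)        # PID of the job's process group leader
last_pgid=$(pgid_of "$last_pid")

echo "PID of the first command: $leader_pid"
echo "PID of the last command:  $last_pid"
echo "PGID of the last command: $last_pgid"

wait    # let the sleeps finish
```

The PGID reported for the last command should match the PID of the first one, which is the situation gpg is in for the first and third commands above.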

2. timeout changes its own PGID (or not)

Run strace timeout 1 cat and you will see, among other things:

setpgid(0, 0)

An excerpt from man 2 setpgid:

int setpgid(pid_t pid, pid_t pgid);

setpgid() sets the PGID of the process specified by pid to pgid. If pid is zero, then the process ID of the calling process is used. If pgid is zero, then the PGID of the process specified by pid is made the same as its process ID.

This means timeout sets its PGID equal to its PID. There are two possibilities:

  • if timeout is the first command, its PGID is the same before and after setpgid, so gpg still has the same PGID as timeout;
  • if timeout is not the first command, its PGID changes, so even if gpg initially had the same PGID as timeout, the two PGIDs now differ.
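The second case can be observed with a sketch like the following. It assumes bash, GNU timeout and Linux /proc, uses set -m to mimic an interactive shell inside a script, and puts timeout last in the pipeline only so that $! yields its PID. The pgid_of helper is a hypothetical name introduced for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash, GNU timeout and Linux /proc.
set -m    # job control on, as in an interactive shell

# Hypothetical helper: print the PGID of the process whose PID is $1
# (third field of /proc/PID/stat after the "pid (comm) " prefix).
pgid_of() {
    sed 's/.*) //' "/proc/$1/stat" | awk '{ print $3 }'
}

# timeout is NOT the first command here; it is last, so $! is its PID.
sleep 3 | timeout 5 sleep 3 &
timeout_pid=$!
leader_pid=$(jobs -p)    # PID of the first command, the pipeline's original PGID

sleep 1                  # give timeout a moment to call setpgid(0, 0)
timeout_pgid=$(pgid_of "$timeout_pid")

echo "pipeline's original PGID: $leader_pid"
echo "PID of timeout:           $timeout_pid"
echo "PGID of timeout now:      $timeout_pgid"

wait    # let everything finish
```

If timeout behaves as the strace output suggests, its PGID ends up equal to its own PID and no longer matches the PGID bash assigned to the pipeline.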

3. timeout sends more signals than you expected

The same strace timeout 1 cat reveals lines like:

kill(19401, SIGTERM)
…
kill(0, SIGTERM)

In this example 19401 is the PID of cat. If you used -s USR1, you would see SIGUSR1 instead of SIGTERM, and so on. The second kill is responsible for what you took to be signal propagation through the pipeline. See man 2 kill (excerpt):

int kill(pid_t pid, int sig);

If pid equals 0, then sig is sent to every process in the process group of the calling process.

The calling process is timeout. It sends the signal to its entire process group. I admit I don't know the purpose behind this, but that is what it does.

So if timeout is the first command in the pipeline then the chosen signal will be sent to every part of it (well, almost; consider another timeout in the same pipeline). This includes gpg. Then it's up to gpg how it reacts to the signal.
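The difference between the two positions can be reproduced without gpg. The sketch below assumes bash and GNU timeout, enables job control with set -m so a script groups processes like an interactive shell does, and uses sleep as a stand-in for both cat and gpg. It records the exit status of the last command in each pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash and GNU timeout; sleep stands in for cat and gpg.
set -m    # job control on, as in an interactive shell

# Case 1: timeout is the first command. Its kill(0, SIGTERM) reaches the
# whole pipeline, including the trailing sleep.
timeout 1 sleep 10 | sleep 10 &
wait "$!"
status_first=$?

# Case 2: timeout is in the middle. It has moved to its own process group,
# so its kill(0, SIGTERM) no longer reaches the trailing sleep.
sleep 3 | timeout 1 sleep 10 | sleep 3 &
wait "$!"
status_middle=$?
wait    # reap whatever is left of the second pipeline

echo "timeout first:  last command exited with status $status_first"
echo "timeout middle: last command exited with status $status_middle"
```

In the first case the trailing sleep should be reported as killed by SIGTERM (status 143, i.e. 128 + 15) after about one second; in the second it should run its full three seconds and exit with 0.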


Other questions

Do we have any control over this? Is there a better way than moving the signal-generating-process from the front of the pipeline?

My quick search yielded no common tool to set/change PGID. I think you could write your own program that calls setpgid(2) or similar; but now that we know what is going on, moving timeout away from the front of the pipeline seems quite a sane approach.

Also note that this is down to how timeout behaves. Other signal-generating processes may not need such a workaround.
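One more option that may be worth trying is the --foreground flag of GNU timeout. According to the coreutils documentation, in this mode children of the command are not timed out, which suggests timeout neither moves to its own process group nor signals the whole group. The sketch below is a hedged check of that, assuming bash and a GNU timeout recent enough to have --foreground, with sleep again standing in for cat and gpg.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash and a GNU timeout that supports --foreground.
set -m    # job control on, as in an interactive shell

# timeout is the first command, but with --foreground it should signal
# only the command it started, not the rest of the pipeline.
timeout --foreground 1 sleep 10 | sleep 3 &
wait "$!"
status_fg=$?
wait    # reap the rest of the pipeline

echo "with --foreground: last command exited with status $status_fg"
```

If the trailing sleep exits with status 0, the signal did not reach it. Applied to the original pipeline this would mean running timeout --foreground 1 cat /dev/urandom in front of gpg; that variant is an assumption here and has not been checked against gpg itself.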