Kill process depending on its stdout output?

Let's say I'm running a process that performs a very lengthy procedure and prints its progress to stdout. Is there a way to terminate the process automatically after

  • x lines of output
    or
  • a certain keyword has been found in the output?

I'm already piping the output of command x to egrep 'search pattern' and want to terminate x after egrep shows a particular line.

I imagine if there were some way to script:

run command `y` after `x` lines of output are seen on the previous app's piped `stdout`

I could pull it off fairly easily, for example:

mylongrunningtool | egrep '(term1|term2)' | runafterxlines --lines=8 --command='killall -9 mylongrunningtool'

Any takers?


Solution 1:

Try the head command:

HEAD(1)                          User Commands                         HEAD(1)

NAME
       head - output the first part of files

SYNOPSIS
       head [OPTION]... [FILE]...

DESCRIPTION
       Print  the  first  10 lines of each FILE to standard output.  With more
       than one FILE, precede each with a header giving the file  name.   With
       no FILE, or when FILE is -, read standard input.

head allows you to specify the number of lines. Refer to the man page for more info.

loop.py:

#!/usr/bin/python

i = 0
while True:
    print "This is line " + str(i)
    i += 1

On its own, loop.py would run forever, but if I pipe its output to head, I get:

$ ./loop.py | head 
This is line 0
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
Traceback (most recent call last):
  File "./loop.py", line 6, in <module>
    print "This is line " + str(i)
IOError: [Errno 32] Broken pipe

Note that the error (Traceback ...) portion is actually stderr, as demonstrated by running ./loop.py 2> stderr.log | head, so you need not worry about grepping the output of head.

Finally, to search:

$ ./loop.py 2> /dev/null | head | grep -n "line 6" 
7:This is line 6

Here, I've redirected stderr of loop.py out of the way, even though we're certain it won't interfere with the text processed by head and grep.
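
The two stopping conditions from the question (a line count or a keyword) can also be combined in one small filter. What follows is only a minimal Python 3 sketch, not something from the original answer: the names stop_after and endless and their parameters are made up for illustration, and the endless generator merely stands in for the long-running tool.

```python
import itertools
import re


def stop_after(lines, pattern=None, max_lines=None):
    """Yield lines until `pattern` matches one of them or `max_lines` were seen.

    Hypothetical helper: `lines` is any iterable of strings, for example
    sys.stdin. The matching line is yielded before the generator stops.
    """
    regex = re.compile(pattern) if pattern is not None else None
    if max_lines is not None and max_lines <= 0:
        return
    for count, line in enumerate(lines, start=1):
        yield line
        if regex is not None and regex.search(line):
            return
        if max_lines is not None and count >= max_lines:
            return


def endless():
    """Stand-in for the long-running tool: an infinite stream of lines."""
    for i in itertools.count():
        yield "This is line %d\n" % i


# Stops at the keyword even though the source never ends.
for line in stop_after(endless(), pattern=r"line 6$"):
    print(line, end="")
```

To use it as a real filter, iterate over sys.stdin instead of endless() and exit once the generator returns. Exiting closes the reading end of the pipe, so the upstream process is stopped the same way as with head: on its next write to the broken pipe.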

EDIT

TL;DR: How long the intensive process keeps running after head completes its output depends mainly on output buffering (and, to a lesser extent, on process scheduling).

After some testing, I found that my solution, though it does cut the execution of loop.py short, isn't as robust as it could be. To see why, I modified loop.py to report its progress on stderr:

new loop.py:

#!/usr/bin/env python

import sys

def doSomethingIntensive():
    # actually do something intensive here
    # that doesn't print to stdout
    pass

i = 0
while True:
    # printing to stderr so output is not piped 
    print >> sys.stderr, (
            "Starting some calculation that " 
            "doesn't print to stdout")
    doSomethingIntensive()
    print >> sys.stderr, "About to print line " + str(i)
    print "This is line " + str(i)
    print >> sys.stderr, "Finished printing line " + str(i)
    i += 1

and the output:

$ ./loop.py | head
Starting some calculation that doesn't print to stdout
About to print line 0
Finished printing line 0
Starting some calculation that doesn't print to stdout
About to print line 1
Finished printing line 1
Starting some calculation that doesn't print to stdout
About to print line 2
Finished printing line 2
...
About to print line 247
Finished printing line 247This is line 0
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9

Starting some calculation that doesn't print to stdout
About to print line 248
Finished printing line 248
... 
About to print line 487
Finished printing line 487
Starting some calculation that doesn't print to stdout
About to print line 488
Traceback (most recent call last):
  File "./loop.py", line 18, in <module>
    print "This is line " + str(i)
IOError: [Errno 32] Broken pipe

I've hidden some of the output and left only the relevant parts. In essence, the output shows that standard output is buffered: when stdout is a pipe rather than a terminal, loop.py's lines are collected in a buffer and handed to head only in large chunks.

According to this answer on SO, once the receiver (head) terminates, the pipe gets broken, and *only when the sender (loop.py) attempts to write to the now-broken pipe* will a SIGPIPE signal be sent to it.
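
The "only on the next write" behaviour can be reproduced without head at all. This is a small Python 3 sketch, assuming a POSIX system; the variable names are arbitrary. It closes the reading end of a pipe and then writes to it.

```python
import os

# Create a pipe and close the reading end right away, which is what
# happens from the writer's point of view when head exits.
read_fd, write_fd = os.pipe()
os.close(read_fd)

try:
    # CPython ignores SIGPIPE at startup, so the failed write surfaces as
    # an exception (errno EPIPE) rather than killing the interpreter.
    os.write(write_fd, b"This is line 0\n")
    outcome = "write succeeded"
except BrokenPipeError as exc:
    outcome = "broken pipe (errno %d)" % exc.errno
finally:
    os.close(write_fd)

print(outcome)
```

Nothing happens to the writer at the moment the reader goes away; the error only appears once os.write() is called. This matches the "IOError: [Errno 32] Broken pipe" traceback above, which is how Python 2 reported the same condition.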

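So head received nothing until loop.py's output buffer filled up and was flushed, which is why all ten lines showed up at once, and only after loop.py had already printed 247 lines. head then printed its output and exited, but loop.py could only notice this on its next flush: it kept going for roughly another 240 lines (up to 488), and writing that second buffer to the now-broken pipe failed with the broken pipe error. Both distances correspond to about 4 KB of output, a typical stdio buffer size for a pipe, so buffering rather than process scheduling accounts for most of the delay.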
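
As a sanity check on the numbers 247 and 488, here is a small Python 3 calculation. It assumes that stdout is flushed every 4096 bytes and that nothing but the "This is line N" lines goes to stdout. The buffer size is an assumption (a common value for pipes on Linux) rather than something the output states, and the function name is made up.

```python
def line_completing_flush(buffer_size, flush_number):
    """Return the index of the line whose write fills the buffer for the
    `flush_number`-th time, assuming the buffer is flushed every
    `buffer_size` bytes and nothing else is written to stdout."""
    target = buffer_size * flush_number
    written = 0
    i = 0
    while True:
        written += len("This is line %d\n" % i)
        if written >= target:
            return i
        i += 1


# 4096 bytes is an assumed buffer size, not something the output states.
print(line_completing_flush(4096, 1))  # 247
print(line_completing_flush(4096, 2))  # 488
```

Under that assumption the first flush happens while line 247 is being written and the second while line 488 is being written, which lines up with where head's output appears and where the traceback occurs.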

For better results, we can use unbuffered I/O (in this case, unbuffered output of loop.py). By invoking the python interpreter with the -u option, we get:

$ python -u loop.py | head
Starting some calculation that doesn't print to stdout
About to print line 0
Finished printing line 0This is line 0

Starting some calculation that doesn't print to stdout
About to print line 1
Finished printing line 1This is line 1

Starting some calculation that doesn't print to stdout
About to print line 2
Finished printing line 2This is line 2

Starting some calculation that doesn't print to stdout
About to print line 3
Finished printing line 3This is line 3

Starting some calculation that doesn't print to stdout
About to print line 4
Finished printing line 4This is line 4

Starting some calculation that doesn't print to stdout
About to print line 5
Finished printing line 5This is line 5

Starting some calculation that doesn't print to stdout
About to print line 6
Finished printing line 6This is line 6

Starting some calculation that doesn't print to stdout
About to print line 7
Finished printing line 7This is line 7

Starting some calculation that doesn't print to stdout
About to print line 8
Finished printing line 8This is line 8

Starting some calculation that doesn't print to stdout
About to print line 9
Finished printing line 9
This is line 9
Starting some calculation that doesn't print to stdout
About to print line 10
Traceback (most recent call last):
  File "loop.py", line 18, in <module>
    print "This is line " + str(i)
IOError: [Errno 32] Broken pipe

Of course, this is straightforward if your program is written in Python, since you don't need to modify the code. However, if it is in C, and you happen to have the source for it, you can use the function setvbuf() from stdio.h to make stdout unbuffered:

loop.c:

#include <stdio.h>
#include <stdlib.h>
#define TRUE 1

unsigned long factorial(int n)
{
    return (n == 0) ? 1 : n * factorial(n - 1);
}

void doSomethingIntensive(int n)
{
    fprintf(stderr, "%4d: %18lu\n", n, factorial(n));
}

int main()
{
    int i;

    /* the important line; setvbuf() returns nonzero on failure */
    if (setvbuf(stdout, NULL, _IONBF, 0) != 0)
        fprintf(stderr, "Error setting buffer size.\n");
    for(i=0; TRUE; i++)
    {
        doSomethingIntensive(i);
        printf("This is line %d\n", i);
    }

    return 0;
}
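
For readers on Python 3, where the print statement used in loop.py is no longer valid, a similar effect to python -u can be had by flushing after every line. This is only a sketch: the run function and its limit parameter are hypothetical and exist mainly to keep the example bounded.

```python
import sys


def do_something_intensive():
    # Placeholder for real work that does not print to stdout.
    pass


def run(out=sys.stdout, limit=None):
    """Print one line per iteration, flushing each time.

    `limit` is a hypothetical parameter that bounds the loop for testing;
    None means loop until the reader goes away.
    Returns the number of lines written successfully.
    """
    i = 0
    while limit is None or i < limit:
        do_something_intensive()
        try:
            print("This is line %d" % i, file=out, flush=True)
        except BrokenPipeError:
            # The reader (e.g. head) has exited; stop looping.
            return i
        i += 1
    return i


if __name__ == "__main__":
    run(limit=3)  # prints lines 0..2; use run() for the endless variant
```

Because every line is flushed immediately, the loop notices the broken pipe on the first write after the reader exits instead of a buffer's worth of lines later. Python may still report a second broken pipe error when it flushes stdout at interpreter exit; the Python documentation suggests redirecting stdout to os.devnull in that case.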

Solution 2:

I believe the grep example in the accepted answer doesn't work as the OP expected (i.e. the process doesn't get killed after "line 6" appears in the output). To kill a process after it gives a specific output, one might use

mylongrunningtool | stdbuf -o0 egrep '(term1|term2)' >&-

Here's how it works:

>&- closes the stdout, so any write attempt will result in an error.

egrep '(term1|term2)' discards all output except the lines containing the keyword, in this example, either "term1" or "term2".

stdbuf -o0 disables output buffering for egrep, so the matching line is written as soon as it is found.

Once one of the keywords is encountered in the output of mylongrunningtool, egrep will attempt to pass it to stdout and terminate with a write error. As a result, the next time mylongrunningtool writes to the pipe, it will receive SIGPIPE, which kills it in turn.

Disclaimer:

Since signals are asynchronous, mylongrunningtool may have a chance to execute some code past the statement which printed the keyword to stdout, and it's essentially impossible to guarantee how much code will get executed. In the worst case, if mylongrunningtool queries a device driver for an operation which lasts for an hour (or forever), it will run for one more hour (or forever) before getting killed.

Also, SIGPIPE can be handled, unlike SIGKILL. This means that mylongrunningtool can just ignore the signal and go on with its work. The default handling of SIGPIPE, however, is to terminate.
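
If waiting for the next write is a problem, one alternative is to have a wrapper start the tool and signal it explicitly as soon as the keyword shows up. The following is a minimal Python 3 sketch under stated assumptions: run_until_match is a hypothetical helper, and the child command is a stand-in for mylongrunningtool that prints lines forever.

```python
import re
import subprocess
import sys


def run_until_match(argv, pattern):
    """Start `argv`, read its stdout line by line, and terminate it as soon
    as a line matches `pattern`. Returns the matching line, or None if the
    process ended on its own first. Hypothetical helper, POSIX-oriented."""
    regex = re.compile(pattern)
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    matched = None
    try:
        for line in proc.stdout:
            if regex.search(line):
                matched = line
                break
    finally:
        if proc.poll() is None:
            proc.terminate()  # SIGTERM; proc.kill() would send SIGKILL
        proc.stdout.close()
        proc.wait()
    return matched


# Stand-in for mylongrunningtool: a child that prints lines forever.
child = [sys.executable, "-u", "-c",
         "import itertools\n"
         "for i in itertools.count(): print('This is line', i)"]
print(run_until_match(child, r"line 6$"), end="")
```

The wrapper sends the signal itself instead of relying on SIGPIPE, so the tool does not have to attempt another write first. The caveats above still apply in part: SIGTERM can be handled or ignored just like SIGPIPE (proc.kill() cannot), and if the tool buffers its output, the wrapper only sees the keyword once the tool flushes.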