Kill process depending on its stdout output?
Let's say I'm running a process that performs a very lengthy procedure, printing its progress to stdout. Is there a way to terminate the process automatically after
- x lines of output, or
- a certain keyword has been found in the output?
I'm already piping the output of command x to egrep 'search pattern' and want to terminate x after egrep shows a particular line.
I imagine if there were some way to script:
run command `y` after `x` lines of output are seen on the previous app's piped `stdout`
I could pull it off fairly easily, for example:
mylongrunningtool | egrep '(term1|term2)' | runafterxlines --lines=8 --command='killall -9 mylongrunningtool'
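For illustration only, a runafterxlines of that sort could be hacked together in a few lines of Python; the name, the flags and the killall invocation above are all made up, of course:

#!/usr/bin/env python
# hypothetical "runafterxlines": pass stdin through and run a
# command once a given number of lines has been seen
import argparse
import subprocess
import sys

parser = argparse.ArgumentParser()
parser.add_argument('--lines', type=int, default=1)
parser.add_argument('--command', required=True)
args = parser.parse_args()

seen = 0
for line in sys.stdin:
    sys.stdout.write(line)
    seen += 1
    if seen >= args.lines:
        subprocess.call(args.command, shell=True)
        break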
Any takers?
Solution 1:
Try the head command:
HEAD(1) User Commands HEAD(1)
NAME
head - output the first part of files
SYNOPSIS
head [OPTION]... [FILE]...
DESCRIPTION
Print the first 10 lines of each FILE to standard output. With more
than one FILE, precede each with a header giving the file name. With
no FILE, or when FILE is -, read standard input.
head allows you to specify the number of lines to print (with the -n option). Refer to the man page for more info.
loop.py:

#!/usr/bin/python


i = 0
while True:
    print "This is line " + str(i)
    i += 1
loop.py should run forever, but if I pipe its output to head, I get:
$ ./loop.py | head
This is line 0
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
Traceback (most recent call last):
File "./loop.py", line 6, in <module>
print "This is line " + str(i)
IOError: [Errno 32] Broken pipe
Note that the error (the Traceback ... portion) actually goes to stderr, as demonstrated by running ./loop.py 2> stderr.log | head, so you need not worry about it showing up when grepping the output of head.
Finally, to search:
$ ./loop.py 2> /dev/null | head | grep -n "line 6"
7:This is line 6
Here, I've redirected stderr of loop.py out of the way, even though we're certain it won't interfere with the text processed by head and grep.
EDIT
TL;DR: The CPU scheduler controls how much longer the intensive process keeps running after head completes its output.
After some testing, I found that my solution, though it does cut the execution of loop.py short, isn't as robust as it could be. Consider these modifications to my loop.py and what piping its output to head then yields:
new loop.py:
#!/usr/bin/env python

import sys

def doSomethingIntensive():
    # actually do something intensive here
    # that doesn't print to stdout
    pass

i = 0
while True:
    # printing to stderr so output is not piped
    print >> sys.stderr, (
        "Starting some calculation that "
        "doesn't print to stdout")
    doSomethingIntensive()
    print >> sys.stderr, "About to print line " + str(i)
    print "This is line " + str(i)
    print >> sys.stderr, "Finished printing line " + str(i)
    i += 1
and the output:
$ ./loop.py | head
Starting some calculation that doesn't print to stdout
About to print line 0
Finished printing line 0
Starting some calculation that doesn't print to stdout
About to print line 1
Finished printing line 1
Starting some calculation that doesn't print to stdout
About to print line 2
Finished printing line 2
...
About to print line 247
Finished printing line 247This is line 0
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
Starting some calculation that doesn't print to stdout
About to print line 248
Finished printing line 248
...
About to print line 487
Finished printing line 487
Starting some calculation that doesn't print to stdout
About to print line 488
Traceback (most recent call last):
File "./loop.py", line 18, in <module>
print "This is line " + str(i)
IOError: [Errno 32] Broken pipe
I've hidden some of the output and left only the relevant parts. In essence, the output shows that head's (and I guess every process's) standard input/output streams are buffered. According to this answer on SO, once the receiver (head) terminates, the pipe gets broken, and only when the sender (loop.py) attempts to write to the now-broken pipe will a SIGPIPE signal be sent to it.
So when head got the chance to print its output, all of it showed up at once, but only after loop.py had continued for another 247 lines. (This has to do with the scheduling of processes.) Moreover, after head had printed its output but before it terminated, the scheduler resumed loop.py, so another ~250 lines (up to 488) were written to the pipe before the pipe was broken.
For better results, we can use unbuffered I/O (in this case, unbuffered output of loop.py). By invoking the Python interpreter with the -u option, we get:
$ python -u loop.py | head
Starting some calculation that doesn't print to stdout
About to print line 0
Finished printing line 0This is line 0
Starting some calculation that doesn't print to stdout
About to print line 1
Finished printing line 1This is line 1
Starting some calculation that doesn't print to stdout
About to print line 2
Finished printing line 2This is line 2
Starting some calculation that doesn't print to stdout
About to print line 3
Finished printing line 3This is line 3
Starting some calculation that doesn't print to stdout
About to print line 4
Finished printing line 4This is line 4
Starting some calculation that doesn't print to stdout
About to print line 5
Finished printing line 5This is line 5
Starting some calculation that doesn't print to stdout
About to print line 6
Finished printing line 6This is line 6
Starting some calculation that doesn't print to stdout
About to print line 7
Finished printing line 7This is line 7
Starting some calculation that doesn't print to stdout
About to print line 8
Finished printing line 8This is line 8
Starting some calculation that doesn't print to stdout
About to print line 9
Finished printing line 9
This is line 9
Starting some calculation that doesn't print to stdout
About to print line 10
Traceback (most recent call last):
File "loop.py", line 18, in <module>
print "This is line " + str(i)
IOError: [Errno 32] Broken pipe
Of course, this is straightforward if your program is written in Python, as you don't need to modify the code. However, if your program is in C and you happen to have its source, you can use the function setvbuf() from stdio.h to set stdout as unbuffered:
loop.c:
#include <stdio.h>
#include <stdlib.h>

#define TRUE 1

unsigned long factorial(int n)
{
    return (n == 0) ? 1 : n * factorial(n - 1);
}

void doSomethingIntensive(int n)
{
    fprintf(stderr, "%4d: %18lu\n", n, factorial(n));
}

int main()
{
    int i;

    /* the important line: make stdout unbuffered
     * (setvbuf returns nonzero on failure) */
    if (setvbuf(stdout, NULL, _IONBF, 0) != 0)
        fprintf(stderr, "Error setting buffer size.\n");

    for (i = 0; TRUE; i++)
    {
        doSomethingIntensive(i);
        printf("This is line %d\n", i);
    }
    return 0;
}
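If modifying the source is an option in the Python case as well, the same effect as -u can be approximated by flushing stdout after every line, so the broken pipe is hit on the very next write rather than much later. A minimal sketch along the lines of loop.py above:

#!/usr/bin/python
# like loop.py, but flushes stdout after every line
import sys

i = 0
while True:
    print "This is line " + str(i)
    sys.stdout.flush()
    i += 1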
Solution 2:
I believe the grep example in the accepted answer doesn't work as the OP expected (i.e. the process doesn't get killed after "line 6" appears in the output). To kill a process after it gives a specific output, one might use
mylongrunningtool | stdbuf -o0 egrep '(term1|term2)' >&-
Here's how it works:
- >&- closes stdout, so any write attempt will result in an error.
- egrep '(term1|term2)' discards all output except the lines containing a keyword, in this example either "term1" or "term2".
- stdbuf -o0 disables output buffering for egrep.
Once one of the keywords is encountered in the output of mylongrunningtool, egrep will attempt to pass it to stdout and terminate with a write error. As a result, SIGPIPE will be sent to mylongrunningtool, which will kill it in turn.
Disclaimer:
Since signals are asynchronous, mylongrunningtool may have a chance to execute some code past the statement which printed the keyword to stdout, and it's essentially impossible to guarantee how much code will get executed. In the worst case, if mylongrunningtool queries a device driver for an operation which lasts an hour (or forever), it will run for one more hour (or forever) before getting killed.
Also, SIGPIPE can be handled, unlike SIGKILL. This means that mylongrunningtool can just ignore the signal and go on with its work. The default handling of SIGPIPE, however, is to terminate.
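If relying on SIGPIPE isn't acceptable, a small wrapper that watches the output and sends SIGKILL itself gives a harder guarantee. A rough sketch in Python 2, where mylongrunningtool and the terms are placeholders matching the example above:

#!/usr/bin/env python
# rough sketch: echo the tool's output and SIGKILL it as soon as
# a line matches one of the patterns
import re
import signal
import subprocess
import sys

pattern = re.compile(r'term1|term2')  # placeholder patterns
proc = subprocess.Popen(['mylongrunningtool'], stdout=subprocess.PIPE)

for line in iter(proc.stdout.readline, ''):
    sys.stdout.write(line)
    if pattern.search(line):
        proc.send_signal(signal.SIGKILL)  # cannot be caught or ignored
        break
proc.wait()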