Run command and get its stdout, stderr separately in near real time like in a terminal

I am trying to find a way in Python to run other programs in such a way that:

  1. The stdout and stderr of the program being run can be logged separately.
  2. The stdout and stderr of the program being run can be viewed in near-real time, such that if the child process hangs, the user can see. (i.e. we do not wait for execution to complete before printing the stdout/stderr to the user)
  3. Bonus criteria: The program being run does not know it is being run via python, and thus will not do unexpected things (like chunk its output instead of printing it in real-time, or exit because it demands a terminal to view its output). This small criteria pretty much means we will need to use a pty I think.

Here is what i've got so far... Method 1:

def method1(command):
    ## subprocess.communicate() will give us the stdout and stderr sepurately, 
    ## but we will have to wait until the end of command execution to print anything.
    ## This means if the child process hangs, we will never know....
    proc=subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, executable='/bin/bash')
    stdout, stderr = proc.communicate() # record both, but no way to print stdout/stderr in real-time
    print ' ######### REAL-TIME ######### '
    ########         Not Possible
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print stdout
    print 'STDOUT:'
    print stderr

Method 2

def method2(command):
    ## Using pexpect to run our command in a pty, we can see the child's stdout in real-time,
    ## however we cannot see the stderr from "curl google.com", presumably because it is not connected to a pty?
    ## Furthermore, I do not know how to log it beyond writing out to a file (p.logfile). I need the stdout and stderr
    ## as strings, not files on disk! On the upside, pexpect would give alot of extra functionality (if it worked!)
    proc = pexpect.spawn('/bin/bash', ['-c', command])
    print ' ######### REAL-TIME ######### '
    proc.interact()
    print ' ########## RESULTS ########## '
    ########         Not Possible

Method 3:

def method3(command):
    ## This method is very much like method1, and would work exactly as desired
    ## if only proc.xxx.read(1) wouldn't block waiting for something. Which it does. So this is useless.
    proc=subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, executable='/bin/bash')
    print ' ######### REAL-TIME ######### '
    out,err,outbuf,errbuf = '','','',''
    firstToSpeak = None
    while proc.poll() == None:
            stdout = proc.stdout.read(1) # blocks
            stderr = proc.stderr.read(1) # also blocks
            if firstToSpeak == None:
                if stdout != '': firstToSpeak = 'stdout'; outbuf,errbuf = stdout,stderr
                elif stderr != '': firstToSpeak = 'stderr'; outbuf,errbuf = stdout,stderr
            else:
                if (stdout != '') or (stderr != ''): outbuf += stdout; errbuf += stderr
                else:
                    out += outbuf; err += errbuf;
                    if firstToSpeak == 'stdout': sys.stdout.write(outbuf+errbuf);sys.stdout.flush()
                    else: sys.stdout.write(errbuf+outbuf);sys.stdout.flush()
                    firstToSpeak = None
    print ''
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print out
    print 'STDERR:'
    print err

To try these methods out, you will need to import sys,subprocess,pexpect

pexpect is pure-python and can be had with

sudo pip install pexpect

I think the solution will involve python's pty module - which is somewhat of a black art that I cannot find anyone who knows how to use. Perhaps SO knows :) As a heads-up, i recommend you use 'curl www.google.com' as a test command, because it prints its status out on stderr for some reason :D


UPDATE-1:
OK so the pty library is not fit for human consumption. The docs, essentially, are the source code. Any presented solution that is blocking and not async is not going to work here. The Threads/Queue method by Padraic Cunningham works great, although adding pty support is not possible - and it's 'dirty' (to quote Freenode's #python). It seems like the only solution fit for production-standard code is using the Twisted framework, which even supports pty as a boolean switch to run processes exactly as if they were invoked from the shell. But adding Twisted into a project requires a total rewrite of all the code. This is a total bummer :/

UPDATE-2:

Two answers were provided, one of which addresses the first two criteria and will work well where you just need both the stdout and stderr using Threads and Queue. The other answer uses select, a non-blocking method for reading file descriptors, and pty, a method to "trick" the spawned process into believing it is running in a real terminal just as if it was run from Bash directly - but may or may not have side-effects. I wish I could accept both answers, because the "correct" method really depends on the situation and why you are subprocessing in the first place, but alas, I could only accept one.


Solution 1:

The stdout and stderr of the program being run can be logged separately.

You can't use pexpect because both stdout and stderr go to the same pty and there is no way to separate them after that.

The stdout and stderr of the program being run can be viewed in near-real time, such that if the child process hangs, the user can see. (i.e. we do not wait for execution to complete before printing the stdout/stderr to the user)

If the output of a subprocess is not a tty then it is likely that it uses a block buffering and therefore if it doesn't produce much output then it won't be "real time" e.g., if the buffer is 4K then your parent Python process won't see anything until the child process prints 4K chars and the buffer overflows or it is flushed explicitly (inside the subprocess). This buffer is inside the child process and there are no standard ways to manage it from outside. Here's picture that shows stdio buffers and the pipe buffer for command 1 | command2 shell pipeline:

pipe/stdio buffers

The program being run does not know it is being run via python, and thus will not do unexpected things (like chunk its output instead of printing it in real-time, or exit because it demands a terminal to view its output).

It seems, you meant the opposite i.e., it is likely that your child process chunks its output instead of flushing each output line as soon as possible if the output is redirected to a pipe (when you use stdout=PIPE in Python). It means that the default threading or asyncio solutions won't work as is in your case.

There are several options to workaround it:

  • the command may accept a command-line argument such as grep --line-buffered or python -u, to disable block buffering.

  • stdbuf works for some programs i.e., you could run ['stdbuf', '-oL', '-eL'] + command using the threading or asyncio solution above and you should get stdout, stderr separately and lines should appear in near-real time:

    #!/usr/bin/env python3
    import os
    import sys
    from select import select
    from subprocess import Popen, PIPE
    
    with Popen(['stdbuf', '-oL', '-e0', 'curl', 'www.google.com'],
               stdout=PIPE, stderr=PIPE) as p:
        readable = {
            p.stdout.fileno(): sys.stdout.buffer, # log separately
            p.stderr.fileno(): sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                data = os.read(fd, 1024) # read available
                if not data: # EOF
                    del readable[fd]
                else: 
                    readable[fd].write(data)
                    readable[fd].flush()
    
  • finally, you could try pty + select solution with two ptys:

    #!/usr/bin/env python3
    import errno
    import os
    import pty
    import sys
    from select import select
    from subprocess import Popen
    
    masters, slaves = zip(pty.openpty(), pty.openpty())
    with Popen([sys.executable, '-c', r'''import sys, time
    print('stdout', 1) # no explicit flush
    time.sleep(.5)
    print('stderr', 2, file=sys.stderr)
    time.sleep(.5)
    print('stdout', 3)
    time.sleep(.5)
    print('stderr', 4, file=sys.stderr)
    '''],
               stdin=slaves[0], stdout=slaves[0], stderr=slaves[1]):
        for fd in slaves:
            os.close(fd) # no input
        readable = {
            masters[0]: sys.stdout.buffer, # log separately
            masters[1]: sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                try:
                    data = os.read(fd, 1024) # read available
                except OSError as e:
                    if e.errno != errno.EIO:
                        raise #XXX cleanup
                    del readable[fd] # EIO means EOF on some systems
                else:
                    if not data: # EOF
                        del readable[fd]
                    else:
                        readable[fd].write(data)
                        readable[fd].flush()
    for fd in masters:
        os.close(fd)
    

    I don't know what are the side-effects of using different ptys for stdout, stderr. You could try whether a single pty is enough in your case e.g., set stderr=PIPE and use p.stderr.fileno() instead of masters[1]. Comment in sh source suggests that there are issues if stderr not in {STDOUT, pipe}

Solution 2:

If you want to read from stderr and stdout and get the output separately, you can use a Thread with a Queue, not overly tested but something like the following:

import threading
import queue

def run(fd, q):
    for line in iter(fd.readline, ''):
        q.put(line)
    q.put(None)


def create(fd):
    q = queue.Queue()
    t = threading.Thread(target=run, args=(fd, q))
    t.daemon = True
    t.start()
    return q, t


process = Popen(["curl","www.google.com"], stdout=PIPE, stderr=PIPE,
                universal_newlines=True)

std_q, std_out = create(process.stdout)
err_q, err_read = create(process.stderr)

while std_out.is_alive() or err_read.is_alive():
        for line in iter(std_q.get, None):
            print(line)
        for line in iter(err_q.get, None):
            print(line)

Solution 3:

While J.F. Sebastian's answer certainly solves the heart of the problem, i'm running python 2.7 (which wasn't in the original criteria) so im just throwing this out there to any other weary travellers who just want to cut/paste some code. I havent tested this throughly yet, but on all the commands i have tried it seems to work perfectly :) you may want to change .decode('ascii') to .decode('utf-8') - im still testing that bit out.

#!/usr/bin/env python2.7
import errno
import os
import pty
import sys
from select import select
import subprocess
stdout = ''
stderr = ''
command = 'curl google.com ; sleep 5 ; echo "hey"'
masters, slaves = zip(pty.openpty(), pty.openpty())
p = subprocess.Popen(command, stdin=slaves[0], stdout=slaves[0], stderr=slaves[1], shell=True, executable='/bin/bash')
for fd in slaves: os.close(fd)

readable = { masters[0]: sys.stdout, masters[1]: sys.stderr }
try:
    print ' ######### REAL-TIME ######### '
    while readable:
        for fd in select(readable, [], [])[0]:
            try: data = os.read(fd, 1024)
            except OSError as e:
                if e.errno != errno.EIO: raise
                del readable[fd]
            finally:
                if not data: del readable[fd]
                else:
                    if fd == masters[0]: stdout += data.decode('ascii')
                    else: stderr += data.decode('ascii')
                    readable[fd].write(data)
                    readable[fd].flush()
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
finally:
    p.wait()
    for fd in masters: os.close(fd)
    print ''
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print stdout
    print 'STDERR:'
    print stderr