Force output buffer flush in running program

I have a long-running Python script that periodically writes data to standard output, which I've invoked with something like:

python script.py > output.txt

This script has been running for a while, and I want to stop it with Ctrl+C without losing any of its output. Unfortunately, when I implemented the script I forgot to flush the buffer after each line of output with something like sys.stdout.flush() (the previously suggested solution for forcing output flushing), so pressing Ctrl+C right now will cause me to lose all of the output.

I'm wondering if there's any way to interact with a running Python script (or, more generally, a running process) to force it to flush its output buffer. I'm not asking how to edit and re-run the script so that it flushes correctly -- this question is specifically about interacting with a running process (and, in my case, not losing the output from my current execution).


Solution 1:

If one truly wanted that data, I'd suggest attaching the gdb debugger to the Python interpreter, momentarily stopping the task, calling fsync(1) (1 being stdout), detaching from it (resuming the process), and then perusing the output file.

Look in /proc/$(pidof python)/fd to see the valid file descriptors. $(pidof x) returns the PID of the process named 'x'.
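
For example, with a hypothetical PID of 1234 (listing abridged, paths illustrative), fd 1 pointing at your redirected file confirms you are looking at the right descriptor:

$ ls -l /proc/1234/fd
lrwx------ 1 user user 64 ... 0 -> /dev/pts/0
l-wx------ 1 user user 64 ... 1 -> /home/user/output.txt
lrwx------ 1 user user 64 ... 2 -> /dev/pts/0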

# your python script is running merrily over there.... with some PID you've determined.
#
# load gdb
gdb
#
# attach to python interpreter (use the number returned by $(pidof python))
attach 1234
#
# force a sync within the program's world (1 = stdout, which is redirected in your example)
call fsync(1)
#
# the call SHOULD have returned 0x0 (sync successful). If you get 0xffffffff (-1), perhaps that wasn't stdout (0=stdin, 1=stdout, 2=stderr)
#
# remove our claws from poor python
detach
#
# we're done!
quit

I've used this method to change working dirs, tweak settings on the fly... many things. Alas, you can only call functions that are defined in the running program, but fsync works nicely.
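
As an illustration of the working-directory trick: chdir lives in libc, so it can be called the same way (the path here is purely illustrative):

(gdb) call chdir("/tmp")
$1 = 0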

(gdb command 'info functions' will list all of the functions available. Be careful though. You're operating LIVE on a process.)
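
'info functions' also accepts a regular expression, which keeps the listing manageable (output abridged):

(gdb) info functions fsync
All functions matching regular expression "fsync":
...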

There is also the command peekfd (found in the psmisc package on Debian Jessie and others), which will allow you to see what's hiding in the buffers of a process. Again, /proc/$(pidof python)/fd will show you valid file descriptors to give as arguments to peekfd, as in the sketch below.
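
A usage sketch with a hypothetical PID (see man peekfd for the options):

# watch what PID 1234 writes to fd 1 (stdout)
peekfd 1234 1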

If you don't remember Python's -u option (which forces unbuffered output), you can always prefix a command with stdbuf (in coreutils, already installed) to set stdin/stdout/stderr to unbuffered, line-buffered, or block-buffered as desired:

stdbuf -i 0 -o 0 -e 0 python myscript.py > unbuffered.output
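
If fully unbuffered output is too slow, line buffering is usually enough for line-oriented output; stdbuf's L mode provides it:

stdbuf -oL python myscript.py > line_buffered.output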

Of course, man pages are your friends, hey! Perhaps an alias might be useful here too:

alias python='python -u'

Now your python always uses -u for all your command-line endeavors!

Solution 2:

First, make sure you have the debugging symbols for Python (or at least glibc). On Fedora [1] you can install them with:

dnf debuginfo-install python

Then attach gdb to the running script and run the following commands:

[user@host ~]$ pidof python2
9219
[user@host ~]$ gdb python2 9219
GNU gdb (GDB) Fedora 7.7.1-13.fc20
...
0x00007fa934278780 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:81
81  T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) call fflush(stdout)
$1 = 0
(gdb) call setvbuf(stdout, 0, 2, 0)
$2 = 0
(gdb) quit
A debugging session is active.

    Inferior 1 [process 9219] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/bin/python2, process 9219

This will flush stdout and also disable buffering. The 2 in the setvbuf call is the value of _IONBF on my system. You'll need to find out what it is on yours (a grep _IONBF /usr/include/stdio.h should do the trick).
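
On a glibc-based system, for instance, you would typically see something like this (the value could differ with another C library):

$ grep _IONBF /usr/include/stdio.h
#define _IONBF 2		/* No buffering.  */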

Based on what I've seen in the implementation of PyFile_SetBufSize and PyFile_WriteString in CPython 2.7, it should work pretty well, but I can't make any guarantees.


[1] Fedora includes a special type of RPM called debuginfo RPMs. These automatically created RPMs contain the debugging information from the program files, moved into an external file.

Solution 3:

There is no solution to your immediate problem. If your script has already started, you cannot change the buffering mode after the fact. These are all in-memory buffers, and everything is set up when the script starts: file handles are opened, pipes are created, and so on.

As a long shot: if, and only if, some or all of the buffering in question is being done at the kernel I/O level on output, you could run the sync command; but that is generally unlikely in a case like this.

In the future, you can use Python's -u option* to run the script. In general, many commands have command-specific options to disable stdin/stdout buffering, and you may also have some generic success with the unbuffer command from the expect package.
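
For example (script name illustrative), either of these starts the program with unbuffered output:

python -u script.py > output.txt
unbuffer python script.py > output.txt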

A Ctrl+C would cause the standard I/O buffers to be flushed when the program is interrupted (Python's default handling of SIGINT exits the interpreter cleanly, flushing on the way out), unless the buffering is done by the program itself and it has not implemented the logic to flush its own buffers on Ctrl+C. A suspend, crash, or kill would not be so kind.

*Force stdin, stdout and stderr to be totally unbuffered.