How does a pipe work in Linux?

If you want to redirect the output of one program into the input of another, just use a simple pipeline:

program1 arg arg | program2 arg arg

If you want to save the output of program1 into a file and pipe it into program2, you can use tee(1):

program1 arg arg | tee output-file | program2 arg arg
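
As a concrete sketch of the tee pipeline above, here printf stands in for program1 and sort for program2; both are stand-ins chosen for illustration, and the temporary file from mktemp plays the role of output-file:

```shell
# Temporary file standing in for "output-file".
outfile=$(mktemp)

# printf plays program1, sort plays program2.
# tee copies everything it reads to the file AND to its own stdout.
sorted=$(printf 'pear\napple\nfig\n' | tee "$outfile" | sort)

# The file holds the data exactly as program1 produced it (unsorted).
saved=$(cat "$outfile")
rm -f "$outfile"

echo "end of pipeline:"
echo "$sorted"
echo "saved by tee:"
echo "$saved"
```

The file receives the unsorted lines while the end of the pipeline sees them sorted, which shows that tee sits in the middle without altering the stream.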

All programs in a pipeline run simultaneously. Most programs use blocking I/O: if a program tries to read its input and nothing is there, it blocks. That is, it stops, and the operating system de-schedules it until more input becomes available (to avoid eating up the CPU). Similarly, if a program earlier in the pipeline writes data faster than a later program can read it, the pipe's buffer eventually fills up and the writer blocks: the OS de-schedules it until the reader has drained some of the buffer, and then it can continue writing.
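
One rough way to convince yourself of this is to push more data through a pipe than its buffer can hold. The sketch below assumes the default Linux pipe capacity of 64 KiB (see pipe(7)); dd and wc are stand-ins for the writer and the reader:

```shell
# dd writes 1 MiB of zero bytes (1024 blocks of 1024 bytes) into the pipe.
# That is far more than the pipe can hold at once, so dd can only finish
# because wc is running at the same time and keeps draining the pipe.
count=$(dd if=/dev/zero bs=1024 count=1024 2>/dev/null | wc -c | tr -d ' ')

echo "$count"   # prints: 1048576
```

If the shell ran dd to completion before starting wc, dd would block forever once the pipe filled up. The tr call only strips the padding that some wc implementations add.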

EDIT

If you want to use the output of program1 as the command-line parameters of program2, you can use backquotes or the $() syntax:

# Runs "program1 arg", and uses the output as the command-line arguments for
# program2
program2 `program1 arg`

# Same as above
program2 $(program1 arg)

The $() syntax should be preferred, since it is clearer and can be nested.
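
To illustrate the nesting point, here is a small sketch using dirname and basename as stand-ins for program1 and program2. The path is hypothetical and does not need to exist:

```shell
path=/usr/local/bin/tool   # hypothetical path, used only as a string

# $() nests without any escaping.
parent=$(basename "$(dirname "$path")")

# The backquote form needs the inner backquotes escaped.
parent_bq=`basename \`dirname $path\``

echo "$parent"      # prints: bin
echo "$parent_bq"   # prints: bin
```

Both forms give the same result, but the backquote version gets harder to read with every extra level of nesting.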

Piping does not complete the first command before running the second. Unix (and Linux) pipelines run all commands concurrently. A command will be suspended if:

It is starved for input.
It has produced significantly more output than its successor is ready to consume.

For most programs output is buffered. When standard output is a pipe rather than a terminal, the C standard I/O library accumulates a substantial amount of output inside the process (a few kilobytes, typically 4096 or 8192 bytes) before handing it to the kernel in a single write, and only then can the next stage of the pipeline read it. This buffering avoids too much switching back and forth between processes and the kernel.

If you want output on a pipeline to be sent right away, you can flush it explicitly or use unbuffered I/O. In C that means calling fflush() after writing, or disabling buffering with setvbuf(), so that output is passed on to the next process immediately. Unbuffered input is also possible but is generally unnecessary, because a process that is starved for input typically does not wait for a full buffer but will process any input it can get.

For typical applications unbuffered output is not recommended; you generally get the best performance with the defaults. In your case, however, where you want to do dynamic graphing as soon as the first process has the data available, you definitely want unbuffered output. If you're using C, calling fflush(stdout) whenever you want output sent will be sufficient.

If your programs are communicating using stdin and stdout, then make sure that you are either calling fflush(stdout) after you write or finding some way to disable standard I/O buffering. The best references I can think of that really describe how to implement pipelines in C/C++ are Advanced Programming in the UNIX Environment and UNIX Network Programming: Volume 2. You could probably start with this article as well.
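
If you cannot change the program's source to add fflush(stdout), one possible way to adjust buffering from the shell is stdbuf from GNU coreutils. This is only a sketch: it assumes stdbuf may or may not be installed, and that the program uses C stdio and does not set its own buffering. run_line_buffered is a hypothetical helper name, sed stands in for program1 and cat for program2:

```shell
# Hypothetical helper: run a command with line-buffered stdout if stdbuf
# is available, otherwise run it unchanged.
run_line_buffered() {
    if command -v stdbuf >/dev/null 2>&1; then
        stdbuf -oL "$@"    # -oL: flush stdout at the end of every line
    else
        "$@"
    fi
}

# sed plays program1, cat plays program2.
result=$(printf 'a\nb\nc\n' | run_line_buffered sed 's/^/line: /' | cat)
echo "$result"
```

The data that arrives is the same either way; what changes is when it arrives. With line buffering each line reaches the next stage as soon as it is complete, which only becomes visible when the producer emits lines slowly.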

If your two programs insist on reading and writing to files and do not use stdin/stdout, you may find you can use a named pipe instead of a file.

Create a named pipe with the mknod(1) command:

$ mknod /tmp/named-pipe p

Then configure your programs to read and write to /tmp/named-pipe (use whatever path/name you feel is appropriate).

In this case, both programs will run in parallel, blocking as necessary when the pipe becomes full/empty as described in the other answers.
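
A minimal sketch of the named-pipe setup, using mkfifo(1) in place of mknod (both create a FIFO; mkfifo is swapped in here only because it is the POSIX utility for the job). printf and cat stand in for the two programs, and the temporary directory is just for illustration:

```shell
dir=$(mktemp -d)
fifo="$dir/named-pipe"
mkfifo "$fifo"

# Writer: opening the FIFO blocks until a reader opens it too,
# so it has to run in the background.
printf 'hello through the fifo\n' > "$fifo" &

# Reader: treats the FIFO like an ordinary file and sees EOF
# once the writer has closed its end.
result=$(cat "$fifo")

wait            # reap the background writer
rm -r "$dir"

echo "$result"  # prints: hello through the fifo
```

Neither program needs to know it is talking to a pipe; each just opens a path by name.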
