Why use a named pipe instead of a file?
Almost everything in Linux can be considered a file, but the main difference between a regular file and a named pipe is that a named pipe is a special instance of a file that has no contents on the filesystem.
Here is a quote from man fifo:
A FIFO special file (a named pipe) is similar to a pipe, except that it is accessed as part of the filesystem. It can be opened by multiple processes for reading or writing. When processes are exchanging data via the FIFO, the kernel passes all data internally without writing it to the filesystem. Thus, the FIFO special file has no contents on the filesystem; the filesystem entry merely serves as a reference point so that processes can access the pipe using a name in the filesystem.
The kernel maintains exactly one pipe object for each FIFO special file that is opened by at least one process. The FIFO must be opened on both ends (reading and writing) before data can be passed. Normally, opening the FIFO blocks until the other end is opened also.
So a named pipe does nothing until one process writes to it and another reads from it. It takes no space on the hard disk (apart from a little bit of meta information) and it does not use the CPU.
You can check it by doing this:
Create a named pipe
$ mkfifo /tmp/testpipe
Go to some directory, for example /home/user/Documents, and gzip everything inside it through the named pipe.
$ cd /home/user/Documents
$ tar cvf - . | gzip > /tmp/testpipe &
[1] 28584
Here the shell prints the PID of the background job, which is the process that will run gzip. In our example it was 28584.
Now check what this PID is doing
$ ps u -p 28584
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
c0rp 28584 0.0 0.0 29276 7800 pts/8 S 00:08 0:00 bash
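You will see that it is using no resources: 0% CPU usage, 0% memory usage. The COMMAND column still shows bash rather than gzip because the forked shell is blocked opening the pipe for writing and has not started gzip yet.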
Verify hunch regarding file space usage
$ du -h /tmp/testpipe
0 testpipe
And again 0, nothing. The testpipe could be used again if needed.
Don't forget to kill gzip using kill -15 28584, and remove our named pipe using rm /tmp/testpipe.
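If you would rather run the whole check in one go, here is a minimal sketch of the same steps as a script. It assumes a POSIX shell with mktemp available, and the temporary directory and variable names are made up for illustration. It uses a plain echo as the writer instead of tar and gzip, since the point is only that the writer sits blocked while nobody reads.

```shell
#!/bin/sh
# Sketch: a writer on a FIFO stays blocked, and the FIFO holds no data on disk.
set -eu
work=$(mktemp -d)            # scratch directory (hypothetical layout)
fifo="$work/testpipe"
mkfifo "$fifo"

# Background writer: it blocks while opening the FIFO, because no reader exists.
echo "hello" > "$fifo" &
writer=$!
sleep 1

# -p is true only if the path is a FIFO.
if [ -p "$fifo" ]; then is_fifo=yes; else is_fifo=no; fi

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$writer" 2>/dev/null; then alive=yes; else alive=no; fi

echo "is a FIFO: $is_fifo, writer still waiting: $alive"
du -k "$fifo"                # disk usage of the FIFO node itself

# Clean up: stop the blocked writer and remove the FIFO.
kill "$writer"
wait "$writer" 2>/dev/null || true
rm -rf "$work"
```

The writer never gets past opening the pipe, so it is still there after the sleep, just like the blocked process in the ps output above.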
Example Usages
You can redirect almost anything through a named pipe. As an example, see this one-line proxy.
Also, here is one more nice explanation of named pipe usage. You can configure two processes on one server to communicate through a named pipe instead of the TCP/IP stack. This bypasses the network stack, so it is usually faster and does not load network resources. For example, your web server can communicate with the database directly through a named pipe, instead of connecting to the localhost address or listening on some port.
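As a rough illustration of the "redirect almost anything" idea, here is a small sketch where a producer writes to what it thinks is an ordinary file name, while a second process compresses the data on the fly. The file names (app.log, app.log.gz) and the 1000-line producer loop are assumptions made up for this example.

```shell
#!/bin/sh
# Sketch: a producer writes to a FIFO as if it were a file; a consumer compresses the stream.
set -eu
work=$(mktemp -d)
fifo="$work/app.log"         # hypothetical name: looks like a log file, is really a FIFO
mkfifo "$fifo"

# Consumer: compress whatever arrives on the FIFO until the writer closes it.
gzip -c < "$fifo" > "$work/app.log.gz" &
consumer=$!

# Producer: any command that writes to a file name would do here.
i=1
while [ "$i" -le 1000 ]; do
    echo "log line $i"
    i=$((i + 1))
done > "$fifo"

# Closing the write end gives the consumer end-of-file, so it finishes.
wait "$consumer"

lines=$(gzip -dc "$work/app.log.gz" | wc -l | tr -d ' ')
echo "recovered $lines lines from the compressed output"
rm -rf "$work"
```

The uncompressed data never touches the disk; only the gzip output does.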
It is true that you won't use system memory, but the only reason your example uses no CPU is that nothing reads the pipe, so the process is just waiting.
Consider the following example:
mkfifo /tmp/testpipe
tar cvf - / | gzip > /tmp/testpipe
Now open a new console and run:
watch -n 1 'ps u -p $(pidof tar)'
And in a third console:
cat /tmp/testpipe > /dev/null
If you look at the watch command (second terminal), it will show an increase in CPU consumption!
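The same effect can be sketched without three terminals. The script below is only an illustration under some assumptions: it compresses 1 MiB of zeros instead of the whole filesystem, and it checks whether the writer is still alive rather than measuring its CPU usage. The writer makes no progress until a reader attaches, and then it does all its work.

```shell
#!/bin/sh
# Sketch: the writer side stalls until a reader opens the FIFO, then runs to completion.
set -eu
work=$(mktemp -d)
fifo="$work/testpipe"
mkfifo "$fifo"

# Writer pipeline: compress 1 MiB of zeros into the FIFO.
dd if=/dev/zero bs=1024 count=1024 2>/dev/null | gzip > "$fifo" &
writer=$!
sleep 1

# No reader yet, so the writer is still waiting to open the FIFO.
if kill -0 "$writer" 2>/dev/null; then stalled=yes; else stalled=no; fi
echo "writer stalled before any reader: $stalled"

# Attach a reader: now the data flows and the writer does its work.
bytes=$(wc -c < "$fifo" | tr -d ' ')
wait
echo "reader received $bytes compressed bytes"
rm -rf "$work"
```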
Here is a use case where named pipes can save you a lot of time by removing I/O.
Let's suppose you have a BigFile, for example 10G.
You also have this BigFile split into pieces of 1G, BigFileSplit_01 to BigFileSplit_10.
Now you have a doubt about the correctness of BigFileSplit_05.
Naively, without named pipes, you would create a new split from BigFile and compare:
dd if=BigFile of=BigFileSplitOrig_05 bs=1G skip=4 count=1
diff -s BigFileSplitOrig_05 BigFileSplit_05
rm BigFileSplitOrig_05
With named pipes you would do
mkfifo BigFileSplitOrig_05
dd if=BigFile of=BigFileSplitOrig_05 bs=1G skip=4 count=1 &
diff -s BigFileSplitOrig_05 BigFileSplit_05
rm BigFileSplitOrig_05
That may not seem like a big difference at first sight, but the difference in time is huge!
Option 1:
- dd: read 1G / write 1G (1)
- diff: read 2G
- rm: free allocated clusters / remove directory entry
Option 2:
- dd: read 1G / no write to disk! (the output goes to the named pipe)
- diff: read 1G from disk / read 1G from the named pipe
- rm: no allocated cluster to manage (we didn't actually write anything to the filesystem) / remove directory entry
So basically the named pipe saves you a read and a write of 1G here, plus some filesystem cleanup (since we wrote nothing to the filesystem except the empty fifo node).
Not doing I/O, especially writes, also helps avoid wear on your disks. It matters even more when you work with SSDs, since their cells survive only a limited number of writes.
(1) Obviously, another option would be to create that temporary file in RAM, for example if /tmp is mounted in RAM (tmpfs). Nevertheless, you would be limited by the size of the RAM disk, whereas the "named pipe trick" streams the data through a small kernel buffer and is therefore not limited by the size of the file.
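To try the comparison without a 10G file, here is a scaled-down sketch with 1 KiB blocks instead of 1G. The test data is random and generated on the spot, the file names simply mirror the ones above, and cmp -s stands in for diff -s as a quiet byte-by-byte comparison. Treat it as an illustration, not a benchmark.

```shell
#!/bin/sh
# Sketch: verify one split of a file by streaming the original block through a FIFO.
set -eu
work=$(mktemp -d)

# Hypothetical test data: a 10 KiB "BigFile" and a stored copy of its 5th block.
dd if=/dev/urandom of="$work/BigFile" bs=1024 count=10 2>/dev/null
dd if="$work/BigFile" of="$work/BigFileSplit_05" bs=1024 skip=4 count=1 2>/dev/null

# Instead of writing a temporary copy of block 5, stream it through a FIFO.
mkfifo "$work/BigFileSplitOrig_05"
dd if="$work/BigFile" of="$work/BigFileSplitOrig_05" bs=1024 skip=4 count=1 2>/dev/null &

# cmp opens the FIFO for reading, which unblocks the background dd.
if cmp -s "$work/BigFileSplitOrig_05" "$work/BigFileSplit_05"; then
    result=identical
else
    result=different
fi
wait
echo "split 5 is $result"

rm -rf "$work"
```

The extracted block only ever exists in the kernel's pipe buffer, so nothing besides the FIFO node is created on disk for it.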
You can let a program sit idle and listen on a named pipe for some outside event. As soon as the outside event occurs (for example, the arrival of some new data), it can be detected by some other program, which in turn opens the pipe for writing and writes the relevant event data to the pipe. The listening program receives the stream of data through the pipe via read statements; when the writer issues the close statement, the listener sees end-of-file and is ready to process what it has got. Don't forget to close the pipe after reading the content. The listening program could also return the results of its processing via the same or another named pipe. Such inter-program communication is very convenient at times.
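A minimal sketch of such a listener might look like this. The pipe names (events, replies), the listener function and the "processed:" reply format are all invented for the example, and a real listener would do something more useful than echo the event back.

```shell
#!/bin/sh
# Sketch: a listener waits on one FIFO for events and answers on a second FIFO.
set -eu
work=$(mktemp -d)
events="$work/events"        # hypothetical pipe names
replies="$work/replies"
mkfifo "$events" "$replies"

# Listener: blocks until a sender opens the events pipe, then handles each line.
# The loop ends when the sender closes its end (read sees end-of-file).
listener() {
    while IFS= read -r event; do
        echo "processed: $event"
    done < "$events" > "$replies"
}
listener &

# Sender: reports one event, then closes the pipe.
echo "new data arrived" > "$events"

# Collect the listener's answer from the reply pipe.
reply=$(cat "$replies")
wait
echo "$reply"

rm -rf "$work"
```

While no event arrives, the listener is blocked in the kernel and consumes no CPU. Using a separate reply pipe keeps the two directions from getting mixed up.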