OS X / Linux: pipe into two processes?
I know about
program1 | program2
and
program1 | tee outputfile | program2
but is there a way to feed program1's output into both program2 and program3?
Solution 1:
You can do this with tee and process substitution (a feature of bash, zsh and ksh, not of POSIX sh).
program1 | tee >(program2) >(program3)
The output of program1 will be piped to whatever is inside ( ), in this case program2 and program3.
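As a minimal sketch of the same idea with concrete commands (bash is assumed; fan_out and the two consumers are hypothetical stand-ins for program2 and program3): tee also copies its input to its own standard output, so that copy is discarded with >/dev/null here.

```shell
#!/usr/bin/env bash
# fan_out: hypothetical helper; reads stdin and feeds it to two consumers.
fan_out() {
    # Each >(...) is started before the >/dev/null redirection is applied,
    # so both consumers still write to the caller's standard output.
    tee >(wc -l | sed 's/^ */lines: /') \
        >(sort | paste -sd, - | sed 's/^/sorted: /') \
        >/dev/null
}

# Prints "lines: 3" and "sorted: a,b,c"; the order of the two lines can vary.
printf 'b\na\nc\n' | fan_out
```

The consumers run asynchronously, so in an interactive shell the prompt may come back before their output appears.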
Solution 2:
Intro about parallelisation
This seems trivial, but it is not only possible: doing so also starts concurrent (simultaneous) processes.
You may have to take care of some particular effects, like order of execution, execution time, etc.
There are some samples at the end of this post.
Compatible answer first
As this question is tagged shell and unix, I will first give a POSIX-compatible answer. (For bashisms, see further down.)
Yes, there is a way to use unnamed pipes.
In this sample, I will generate a range of 100'000 numbers, shuffle them and compress the result using 4 different compression tools to compare the compression ratio...
For this, I will first run the preparation:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
SED_CMD=`which sed`
Note: specifying the full path to each command prevents some shell interpreters (like busybox) from running a built-in compressor. Doing it this way also ensures that the same syntax runs independently of the OS installation (paths can differ between macOS, Ubuntu, Red Hat, HP-UX and so on).
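The syntax NN>&1 (where NN is a number between 3 and 63) duplicates the current standard output onto file descriptor NN. Applied to a subshell whose output feeds a pipe, it makes that unnamed pipe reachable at /dev/fd/NN. (File descriptors 0 to 2 are already open: 0 is STDIN, 1 is STDOUT and 2 is STDERR.)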
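To see the mechanism in isolation, here is a minimal sketch with a single extra descriptor. The helper name two_way and the two consumers are made up for illustration, and a system that provides /dev/fd (as Linux and macOS do) is assumed.

```shell
#!/bin/sh
# two_way: hypothetical helper; reads stdin, saves a sorted copy to the
# file named in $1 and prints the number of input lines.
two_way() {
    (
        tee /dev/fd/3 |      # one copy goes to sort, one to descriptor 3
        sort >"$1"           # consumer 1: sorted copy saved to a file
    ) 3>&1 |                 # descriptor 3 = the pipe into the next command
    wc -l                    # consumer 2: line count
}

sorted_copy=$(mktemp)                          # temporary file for consumer 1
printf 'b\na\nc\n' | two_way "$sorted_copy"    # prints 3 (BSD wc pads with spaces)
cat "$sorted_copy"                             # prints a, b and c on separate lines
rm -f "$sorted_copy"
```

Because sort inherits descriptor 3, wc only sees end-of-file once sort has exited, so the file is complete when the pipeline returns.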
Try this (tested under dash, busybox and bash):
(((( seq 1 100000 | shuf | tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 | $GZIP_CMD >/tmp/tst.gz ) 4>&1 | $BZIP2_CMD >/tmp/tst.bz2 ) 5>&1 | $LZMA_CMD >/tmp/tst.lzma ) 6>&1 | $XZ_CMD >/tmp/tst.xz ) 7>&1 | $MD5SUM_CMD
or more readable:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
(
    (
        (
            (
                seq 1 100000 |
                shuf |
                tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 |
                $GZIP_CMD >/tmp/tst.gz
            ) 4>&1 |
            $BZIP2_CMD >/tmp/tst.bz2
        ) 5>&1 |
        $LZMA_CMD >/tmp/tst.lzma
    ) 6>&1 |
    $XZ_CMD >/tmp/tst.xz
) 7>&1 |
$MD5SUM_CMD
2e67f6ad33745dc5134767f0954cbdd6 -
As shuf puts the lines in random order, you should get a different result if you try this,
ls -ltrS /tmp/tst.*
-rw-r--r-- 1 user user 230516 oct 1 22:14 /tmp/tst.bz2
-rw-r--r-- 1 user user 254811 oct 1 22:14 /tmp/tst.lzma
-rw-r--r-- 1 user user 254892 oct 1 22:14 /tmp/tst.xz
-rw-r--r-- 1 user user 275003 oct 1 22:14 /tmp/tst.gz
but the md5 checksums of the decompressed files must all match:
SED_CMD=`which sed`
for chk in gz:$GZIP_CMD bz2:$BZIP2_CMD lzma:$LZMA_CMD xz:$XZ_CMD;do
${chk#*:} -d < /tmp/tst.${chk%:*} |
$MD5SUM_CMD |
$SED_CMD s/-$/tst.${chk%:*}/
done
2e67f6ad33745dc5134767f0954cbdd6 tst.gz
2e67f6ad33745dc5134767f0954cbdd6 tst.bz2
2e67f6ad33745dc5134767f0954cbdd6 tst.lzma
2e67f6ad33745dc5134767f0954cbdd6 tst.xz
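The loop above packs an extension and a command into one word and splits it again with POSIX parameter expansion. A small sketch of just that step (split_pair and the gzip path are made-up illustration values):

```shell
#!/bin/sh
# split_pair: hypothetical helper that splits an "extension:command" pair.
split_pair() {
    chk=$1
    ext=${chk%:*}    # drop the shortest suffix matching ":*" -> extension
    cmd=${chk#*:}    # drop the shortest prefix matching "*:" -> command
    printf '%s -> %s\n' "$ext" "$cmd"
}

split_pair gz:/usr/bin/gzip     # prints: gz -> /usr/bin/gzip
```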
Using bash features
Using some bashisms, this can look nicer; for example, use /dev/fd/{4,5,6,7} instead of tee /dev/fd/4 /dev/fd/5 /...
(((( seq 1 100000 | shuf | tee /dev/fd/{4,5,6,7} | gzip >/tmp/tst.gz ) 4>&1 |
bzip2 >/tmp/tst.bz2 ) 5>&1 | lzma >/tmp/tst.lzma ) 6>&1 |
xz >/tmp/tst.xz ) 7>&1 | md5sum
29078875555e113b31bd1ae876937d4b -
works the same way.
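Brace expansion is done by bash itself before tee starts, so tee receives four ordinary path arguments. A quick way to check this (bash assumed; expand_fds is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# expand_fds: hypothetical helper; shows what the braces expand to.
expand_fds() {
    echo /dev/fd/{4,5,6,7}
}

expand_fds    # prints: /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7
```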
Final check
This won't create any file, but lets you compare the compressed size of a range of sorted integers across 4 different compression tools (for fun, I used 4 different ways of formatting the output):
(
    (
        (
            (
                (
                    seq 1 100000 |
                    tee /dev/fd/{4,5,6,7} |
                    gzip |
                    wc -c |
                    sed s/^/gzip:\ \ / >&3
                ) 4>&1 |
                bzip2 |
                wc -c |
                xargs printf "bzip2: %s\n" >&3
            ) 5>&1 |
            lzma |
            wc -c |
            perl -pe 's/^/lzma: /' >&3
        ) 6>&1 |
        xz |
        wc -c |
        awk '{printf "xz: %9s\n",$1}' >&3
    ) 7>&1 |
    wc -c
) 3>&1
gzip: 215157
bzip2: 124009
lzma: 17948
xz: 17992
588895
This demonstrates how to redirect stdin and stdout inside subshells and merge everything on the console for the final output.
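The same merging trick reduced to two branches, as a sketch that avoids the compressors: descriptor 3 carries the labelled results to the caller's standard output, while descriptor 4 carries the second copy of the data. The name count_both is hypothetical, and /dev/fd support is assumed.

```shell
#!/bin/sh
# count_both: hypothetical helper; reads stdin and reports its line count
# and byte count, computed by two processes running at the same time.
count_both() {
    (
        (
            tee /dev/fd/4 |                     # second copy goes to descriptor 4
            wc -l | sed 's/^ */lines: /' >&3    # branch 1 reports on descriptor 3
        ) 4>&1 |                                # descriptor 4 = pipe into wc -c
        wc -c | sed 's/^ */bytes: /' >&3        # branch 2 reports on descriptor 3
    ) 3>&1                                      # descriptor 3 = caller's stdout
}

# Prints "lines: 3" and "bytes: 14"; the order of the two lines can vary.
printf 'one\ntwo\nthree\n' | count_both
```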
Syntax >(...) and <(...)
Bash (like ksh and zsh) offers another syntax feature, process substitution.
seq 1 100000 | wc -l
100000
seq 1 100000 > >( wc -l )
100000
wc -l < <( seq 1 100000 )
100000
Just as | is an unnamed pipe connected to /dev/fd/0, the syntax <() creates a temporary unnamed pipe attached to another file descriptor, /dev/fd/XX.
md5sum <(zcat /tmp/tst.gz) <(bzcat /tmp/tst.bz2) <(
lzcat /tmp/tst.lzma) <(xzcat /tmp/tst.xz)
29078875555e113b31bd1ae876937d4b /dev/fd/63
29078875555e113b31bd1ae876937d4b /dev/fd/62
29078875555e113b31bd1ae876937d4b /dev/fd/61
29078875555e113b31bd1ae876937d4b /dev/fd/60
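A short sketch showing that <(...) is replaced by a path, and how that lets a command expecting file names read from other commands (bash assumed; both helper names are made up, and the descriptor number can differ on your system):

```shell
#!/usr/bin/env bash
# show_fd_path: hypothetical helper; prints the path bash substitutes for <(...).
show_fd_path() {
    echo <(true)
}

# common_lines: hypothetical helper; comm expects two sorted files, and
# <(...) lets the output of two commands stand in for them.
common_lines() {
    comm -12 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')
}

show_fd_path      # typically a path such as /dev/fd/63
common_lines      # prints b and c on separate lines
```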
More sophisticated demo
This requires the GNU file utility to be installed. It determines the command to run from the file extension or the file type.
for file in /tmp/tst.*;do
cmd=$(which ${file##*.}) || {
cmd=$(file -b --mime-type $file)
cmd=$(which ${cmd#*-})
}
read -a md5 < <($cmd -d <$file|md5sum)
echo $md5 \ $file
done
29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
29078875555e113b31bd1ae876937d4b /tmp/tst.gz
29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
29078875555e113b31bd1ae876937d4b /tmp/tst.xz
This lets you do the same thing as before with the following syntax:
seq 1 100000 |
shuf |
tee >(
echo gzip. $( gzip | wc -c )
) >(
echo gzip, $( wc -c < <(gzip))
) >(
gzip | wc -c | sed s/^/gzip:\ \ /
) >(
bzip2 | wc -c | xargs printf "bzip2: %s\n"
) >(
lzma | wc -c | perl -pe 's/^/lzma: /'
) >(
xz | wc -c | awk '{printf "xz: %9s\n",$1}'
) > >(
echo raw: $(wc -c)
) |
xargs printf "%-8s %9d\n"
raw: 588895
xz: 254556
lzma: 254472
bzip2: 231111
gzip: 274867
gzip, 274867
gzip. 274867
Note: I used different ways to compute the gzip compressed count.
Note: because these operations run simultaneously, the output order depends on the time required by each command.
Going further about parallelisation
If you have a multi-core or multi-processor computer, try comparing this:
i=1
time for file in /tmp/tst.*;do
cmd=$(which ${file##*.}) || {
cmd=$(file -b --mime-type $file)
cmd=$(which ${cmd#*-})
}
read -a md5 < <($cmd -d <$file|md5sum)
echo $((i++)) $md5 \ $file
done |
cat -n
which may render:
1 1 29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
2 2 29078875555e113b31bd1ae876937d4b /tmp/tst.gz
3 3 29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
4 4 29078875555e113b31bd1ae876937d4b /tmp/tst.xz
real 0m0.101s
with this:
time (
i=1 pids=()
for file in /tmp/tst.*;do
cmd=$(which ${file##*.}) || {
cmd=$(file -b --mime-type $file)
cmd=$(which ${cmd#*-})
}
(
read -a md5 < <($cmd -d <$file|md5sum)
echo $i $md5 \ $file
) & pids+=($!)
((i++))
done
wait ${pids[@]}
) |
cat -n
could give:
1 2 29078875555e113b31bd1ae876937d4b /tmp/tst.gz
2 1 29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
3 4 29078875555e113b31bd1ae876937d4b /tmp/tst.xz
4 3 29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
real 0m0.070s
where the ordering depends on the time used by each fork.
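If a stable order matters more than seeing results as soon as they are ready, one option is to let every background job write to its own file and print the files in sequence afterwards. This is only a sketch of that idea, not part of the method above: parallel_ordered is a hypothetical helper, mktemp -d is assumed to be available, and wc -c stands in for the real work.

```shell
#!/bin/sh
# parallel_ordered: hypothetical helper; processes each argument in a
# background job, then prints the results in argument order.
parallel_ordered() {
    dir=$(mktemp -d) || return 1
    i=0
    for word in "$@"; do
        i=$((i + 1))
        # Each job runs concurrently and writes to its own numbered file.
        ( printf '%s' "$word" | wc -c | tr -d ' ' >"$dir/$i" ) &
    done
    wait                        # block until every background job is done
    j=1
    while [ "$j" -le "$i" ]; do
        cat "$dir/$j"           # replay the results in submission order
        j=$((j + 1))
    done
    rm -rf "$dir"
}

parallel_ordered alpha beta gamma    # prints 5, 4 and 5 on separate lines
```

The jobs still run in parallel; only the printing is serialized, at the cost of holding all output until the slowest job has finished.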