Parallel file copy from single source to multiple targets?
I have several large files on optical media that I would like to copy to multiple targets. In this case I have two hard drives attached to the same computer. Is there a utility that can function like:
copy source target1 target2 ... targetN
For single files you can use tee to copy to multiple places:
cat <inputfile> | tee <outfile1> <outfile2> > <outfile3>
or if you prefer the demoggified version:
tee <outfile1> <outfile2> > <outfile3> < <inputfile>
Note that, as Dennis points out in the comments, tee outputs to stdout as well as to the listed files, hence the redirect to file 3 in the above examples. You could also redirect stdout to /dev/null as below. This keeps the file list more consistent on the command line (which may make it easier to script up a solution for variable numbers of files) but is a little less efficient, though the difference is small: about the same as the difference between the cat version and the version without cat:
cat <inputfile> | tee <outfile1> <outfile2> <outfile3> > /dev/null
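To illustrate the point about scripting for a variable number of files, here is a minimal sketch of a wrapper around the /dev/null form. The function name copy_to_many is hypothetical (it is not an existing utility), and the sketch assumes a bash-like shell and a tee that accepts "--" to end option parsing, as GNU and BSD tee do.

```shell
#!/usr/bin/env bash
# copy_to_many SOURCE TARGET...
# Reads SOURCE once and writes it to every TARGET.
# (copy_to_many is a hypothetical helper name, not an existing command.)
copy_to_many() {
    if [ "$#" -lt 2 ]; then
        echo "usage: copy_to_many SOURCE TARGET..." >&2
        return 2
    fi
    local src=$1
    shift
    # tee writes its stdin to each remaining argument; the extra copy
    # it sends to stdout is discarded.
    tee -- "$@" < "$src" > /dev/null
}
```

It would be called in the same shape as the command asked for in the question, for example copy_to_many source target1 target2 target3.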
You could probably combine one of the above with find quite easily to operate on multiple files in one directory, and less easily to operate on files spread over a directory structure. Otherwise you might just have to set the multiple copy operations off in parallel as separate tasks and hope that the OS disk cache is bright and/or big enough that each of the parallel tasks uses cached read data from the first instead of causing drive-head thrashing.
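One way the find combination for a single directory might look is sketched below. The function name copy_dir_to_many is hypothetical, and the sketch assumes bash (for arrays and read -d '') plus a find that supports -maxdepth and -print0, as GNU and BSD find do. It only handles regular files directly inside the source directory, not a whole tree.

```shell
#!/usr/bin/env bash
# copy_dir_to_many SRCDIR DESTDIR...
# Copies each regular file directly inside SRCDIR into every DESTDIR,
# reading each source file only once.
# (copy_dir_to_many is a hypothetical helper name.)
copy_dir_to_many() {
    local srcdir=$1
    shift
    local file name dest targets
    # -maxdepth 1 keeps find at the top level of SRCDIR; -print0 and
    # read -d '' keep file names with spaces or newlines intact.
    find "$srcdir" -maxdepth 1 -type f -print0 |
    while IFS= read -r -d '' file; do
        name=${file##*/}            # strip the directory part
        targets=()
        for dest in "$@"; do
            targets+=("$dest/$name")
        done
        # tee reads from the file, not from the find pipe.
        tee -- "${targets[@]}" < "$file" > /dev/null
    done
}
```

The destination directories are assumed to exist already; tee will fail on a target whose directory is missing.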
AVAILABILITY: tee is commonly available on standard Linux setups and other unix or unix-alike systems, usually as part of the GNU "coreutils" package. If you are using Windows (your question doesn't specify) then you should find it in the various Windows ports such as Cygwin.
PROGRESS INFORMATION: Copying a large file off optical media may take some time (as may copying over a slow network, or copying an even larger file from fast local media), so progress information can be useful. On the command line I tend to use pipe viewer (pv; available in most Linux distros and many Windows port collections, and easy to compile yourself where not available directly) for this. Just replace cat with pv like so:
pv <inputfile> | tee <outfile1> <outfile2> > <outfile3>
For Windows:
n2ncopy will do this.
For Linux:
The cp command alone can copy from multiple sources but unfortunately not to multiple destinations. You will need to run it multiple times in a loop of some sort. You can place all the directory names in a file (one per line) and use a loop like so:
OLDIFS=$IFS
IFS=$'\n'
for line in $(cat file.txt)
do
cp file "$line"
done
IFS=$OLDIFS
or use xargs:
echo dir1 dir2 dir3 | xargs -n 1 cp file1
Both of these can be adapted to copy multiple files or, with cp -r, entire directories. This is also discussed in this StackOverflow article.
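The IFS-based loop above still applies glob expansion to each line of the list. As an alternative sketch, a while read loop avoids changing IFS globally and copes with spaces and wildcard characters in directory names. The function name copy_to_listed is hypothetical, and the list file is assumed to hold one existing directory per line.

```shell
#!/usr/bin/env bash
# copy_to_listed FILE LISTFILE
# Copies FILE into every directory named in LISTFILE (one per line).
# (copy_to_listed is a hypothetical helper name.)
copy_to_listed() {
    local src=$1 list=$2 dir
    # "|| [ -n "$dir" ]" also processes a last line with no trailing newline.
    while IFS= read -r dir || [ -n "$dir" ]; do
        [ -n "$dir" ] || continue        # skip blank lines
        cp -- "$src" "$dir"/ || return 1 # stop at the first failed copy
    done < "$list"
}
```

With the names used above, this would be run as copy_to_listed file file.txt.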
Based on the answer given for a similar question, another way is to use GNU Parallel to run multiple cp instances at once:
parallel -j 0 -N 1 cp file1 ::: Destination1 Destination2 Destination3
The above command will copy file1 to all three destination folders in parallel.
Ryan Thompson's solution:
for x in dest1 dest2 dest3; do cp srcfile "$x" &>/dev/null & done; wait
makes a lot of sense: if the write speed of the destination dirs is approximately the same, then srcfile will only be read once from disk. The rest of the time it will be read from cache.
I would make it a bit more general, so you also get subdirs:
for x in dest1 dest2 dest3; do cp -a srcdir "$x" & done; wait
If the write speeds of the dest dirs are very different (e.g. one is on a RAM disk and the other on NFS), then you may find that the parts of srcdir read while copying to dest1 are no longer in the disk cache when they are needed for writing to dest2.
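A bare wait after the loop does not report whether any individual cp failed. The following is a minimal sketch of the same background-copy idea that records each job's PID and checks its exit status. The function name parallel_copy is hypothetical, and the sketch assumes bash and a cp that supports -a.

```shell
#!/usr/bin/env bash
# parallel_copy SRC DEST...
# Starts one background "cp -a" per destination, then waits for each
# job and returns non-zero if any copy failed.
# (parallel_copy is a hypothetical helper name.)
parallel_copy() {
    local src=$1
    shift
    local dest pid failed=0
    local pids=()
    for dest in "$@"; do
        cp -a -- "$src" "$dest" &
        pids+=("$!")                 # remember the PID of this copy
    done
    for pid in "${pids[@]}"; do
        # wait PID returns that job's exit status.
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

With the names used above, this would be run as parallel_copy srcdir dest1 dest2 dest3. The caveat about differing write speeds and the disk cache applies here just the same.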
In bash (Linux, Mac or Cygwin):
cat source | tee target1 target2 >targetN
(tee copies its input to STDOUT, so use redirection on the last target).
In Windows, Cygwin is often overkill. Instead, you can just add the exes from the UnxUtils project, which include cat, tee, and many others.