Parallel File Copy
I have a list of files I need to copy on a Linux system - each file ranges from 10 to 100GB in size.
I only want to copy to the local filesystem. Is there a way to do this in parallel - with multiple processes each responsible for copying a file - in a simple manner?
I can easily write a multithreaded program to do this, but I'm interested in finding out if there's a low level Linux method for doing this.
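To illustrate what I mean by multiple processes each copying one file, here is a rough sketch of the kind of thing I could write myself (assuming a hypothetical files.txt containing one source path per line):

# Naive approach: launch one cp process per file, then wait for all of them.
# files.txt is a hypothetical list of source paths, one per line.
while IFS= read -r f; do
    cp -- "$f" /destdir/ &
done < files.txt
wait

But that gives no control over how many copies run at once, which is why I'm asking whether there's a more built-in method.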
Solution 1:
If your system is not thrashed by it (e.g. if the files are already in cache) then GNU Parallel http://www.gnu.org/software/parallel/ may work for you:
find . -type f -print0 | parallel -0 -j10 cp {} destdir
This will run 10 concurrent cp processes.
Pro: It is simple to read.
Con: GNU Parallel is not standard on most systems - so you probably have to install it.
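If installing it is not an option, a similar effect can usually be achieved with xargs, which ships with every common Linux distribution (a sketch assuming GNU xargs, since the -0 and -P options are not universally available):

find . -type f -print0 | xargs -0 -P 10 -I{} cp -- {} destdir/

This also caps the number of simultaneous cp processes at 10, but it lacks GNU Parallel's replacement strings such as the {//} used below.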
If you want to keep the directory structure:
find . -type f -print0 |
parallel -0 -j10 mkdir -p destdir/{//}';' cp {} destdir/{//}
Watch the intro video for more info: http://www.youtube.com/watch?v=OpaiGYxkSuQ
See also https://oletange.wordpress.com/2015/07/04/parallel-disk-io-is-it-faster/ for a discussion of parallel disk I/O.
Solution 2:
There is no low-level mechanism for this, for a very simple reason: doing this will destroy your system performance. With platter drives, each concurrent write contends for head placement, leading to massive I/O wait. With SSDs, it ends up saturating one or more of your system buses, causing other problems.