Copying files using multithreading increases performance - why?
Solution 1:
There must be some handshake overhead per file (especially when copying to a network share). With many small files, a multithreaded copy hides much of that overhead because the handshakes for several files happen simultaneously instead of one after another. I suspect you will see less advantage with big files. This benchmark seems to support that hypothesis: https://www.demartek.com/Reports_Free/RMWTUG_2011-03_Robocopy_multithread_Testing_Dennis_Martin_a.pdf
Examples of handshake overhead include checking whether the destination file already exists and checking permissions.
Solution 2:
Even on local disk, there is some per-file overhead, which I believe is mostly due to the expense of opening a file: to open an existing file, Windows has to parse the path, find the corresponding entries in each level of the directory tree, look up the file in the MFT, and check the ACL. To create a new file, Windows has to parse the path, find the corresponding entries in each level of the directory tree, check the directory ACL, and add the file to the MFT and the top-level directory entry.
If you only have one thread, you have to open the source file, open the destination file, copy the data, and close the files, and only then can you move on to the next one. That means leaving the I/O subsystem idle part of the time. If you have multiple threads you can be opening files at the same time that you're copying data; ideally, you're keeping the I/O system busy the entire time.
The overhead isn't all that noticeable on a single file, but if you have a lot of small files it adds up and the time saved can be significant.
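To make the idea concrete, here is a minimal sketch in Python of copying many small files with a pool of worker threads. The function name `copy_files`, the `workers` parameter and the flat (non-recursive) directory layout are my own assumptions for illustration; nothing here is specific to Windows or to any particular copy tool.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_files(src_dir, dst_dir, workers=8):
    """Copy every regular file directly inside src_dir into dst_dir.

    Hypothetical helper: one task per file, run on a thread pool so that
    the per-file open/create cost of one file overlaps with the data
    transfer of others.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    files = [p for p in src_dir.iterdir() if p.is_file()]

    def copy_one(path):
        # Opens source and destination, copies data and metadata, closes both.
        return shutil.copy2(path, dst_dir / path.name)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every task to finish and re-raises any copy error.
        copied = list(pool.map(copy_one, files))
    return len(copied)
```

Calling `copy_files("in", "out", workers=1)` gives the single-threaded behaviour described above, so timing the same directory with different `workers` values is an easy way to see how much of the total time was per-file overhead on your own hardware.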
Solution 3:
When using a single thread, you have a pattern of:
1: read - write - read - write ...
so you spend roughly 50% of your time reading from the source disk and 50% writing to the destination disk, and each disk sits idle while the other one is busy.
Assuming you have two disks, you can use two threads with this pattern:
1: read - write - read - write ...
2:        read - write - read - write ...
so the requests interleave: while one thread is writing, the other is reading, and ideally both disks stay close to 100% utilization.
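A related way to get the same overlap, sketched below in Python, is to split the work differently: one thread only reads and another only writes, linked by a bounded queue. This is a variant of the pattern above, not the exact two-identical-threads scheme, and the names `pipelined_copy`, `chunk_size` and `depth` are assumptions of mine.

```python
import queue
import threading


def pipelined_copy(src_path, dst_path, chunk_size=1024 * 1024, depth=4):
    """Copy one file with reads and writes overlapped; returns bytes written."""
    chunks = queue.Queue(maxsize=depth)  # bounded so the reader cannot run far ahead
    errors = []

    def reader():
        try:
            with open(src_path, "rb") as src:
                while True:
                    data = src.read(chunk_size)
                    if not data:
                        break
                    chunks.put(data)  # blocks while the queue is full
        except OSError as exc:
            errors.append(exc)
        finally:
            chunks.put(None)  # sentinel: no more data will arrive

    reader_thread = threading.Thread(target=reader)
    reader_thread.start()

    written = 0
    finished = False
    try:
        with open(dst_path, "wb") as dst:
            while True:
                data = chunks.get()
                if data is None:
                    finished = True
                    break
                dst.write(data)  # the reader fetches the next chunk meanwhile
                written += len(data)
    finally:
        # If writing failed early, drain the queue so the reader can exit.
        while not finished:
            finished = chunks.get() is None
        reader_thread.join()

    if errors:
        raise errors[0]
    return written
```

Whether this actually helps depends on the source and destination being separate devices, as assumed above; on a single spinning disk the extra seeks between reads and writes can cancel out the gain.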