Why is tar|tar so much faster than cp?

Solution 1:

cp does open-read-close-open-write-close in a loop over all files, so reads from the source and writes to the destination are strictly interleaved within a single process. tar|tar does the reading and the writing in separate processes connected by a pipe: the first tar can keep reading ahead while the second is still writing, and the pipe buffer decouples the two. tar itself is not multi-threaded; the concurrency comes from running two processes. That overlap lets the kernel and the disk controller fetch, buffer and store more blocks at once. All in all, tar|tar lets each side work efficiently, while cp breaks the problem down into small, strictly sequential chunks.
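For reference, the two approaches being compared look roughly like this. It is only a minimal sketch: the directories are throwaway temp dirs made up for illustration, and the flags assume a cp that supports -R and a tar that reads and writes archives on stdin/stdout with "-f -" (GNU and BSD tar both do).

```shell
#!/bin/sh
# Sketch: copy the same small tree once with cp and once with a tar pipeline.
# All paths are hypothetical temp directories created just for this example.
set -e
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst_cp" "$work/dst_tar"
echo "hello" > "$work/src/a.txt"
echo "world" > "$work/src/sub/b.txt"

# One process: cp alternates between reading a file and writing it.
cp -R "$work/src/." "$work/dst_cp/"

# Two processes: the first tar reads and streams an archive to stdout,
# the second unpacks it from stdin; the pipe buffers between them.
(cd "$work/src" && tar -cf - .) | (cd "$work/dst_tar" && tar -xf -)

# Both destinations should now match the source (diff exits non-zero otherwise).
diff -r "$work/src" "$work/dst_cp"
diff -r "$work/src" "$work/dst_tar"
```

The subshells with cd keep the archived paths relative, so the tree is recreated directly under the destination rather than under its original absolute path.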

Solution 2:

Your edit is on the right track: cp isn't necessarily slower than tar | tar. It depends, for example, on the number and size of the files. For big files a plain cp is best, since it's a simple job of pushing data around. For lots of small files the logistics are different and tar might do a better job. See for example this answer.
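If you want to check this on your own data, something along these lines gives a rough comparison. Treat it as a sketch under assumptions: the file count, file contents and temp paths are arbitrary, and date +%s only has one-second resolution, so raise the count (or point it at a real tree) to get meaningful numbers.

```shell
#!/bin/sh
# Sketch: rough timing of cp vs. a tar pipeline on many small files.
# The count (500) and the paths are arbitrary assumptions; adjust as needed.
set -e
bench=$(mktemp -d)
mkdir -p "$bench/src" "$bench/dst_cp" "$bench/dst_tar"

# Create the small test files.
i=0
while [ "$i" -lt 500 ]; do
    echo "file $i" > "$bench/src/f_$i.txt"
    i=$((i + 1))
done

# Time the plain cp (whole seconds only).
start=$(date +%s)
cp -R "$bench/src/." "$bench/dst_cp/"
cp_secs=$(( $(date +%s) - start ))

# Time the tar pipeline on the same tree.
start=$(date +%s)
(cd "$bench/src" && tar -cf - .) | (cd "$bench/dst_tar" && tar -xf -)
tar_secs=$(( $(date +%s) - start ))

echo "cp: ${cp_secs}s  tar|tar: ${tar_secs}s"
```

Keep in mind that freshly written files are likely still in the page cache, so a run like this mostly measures the write side; results on a cold cache or across different disks can look quite different.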