What is the fastest way to copy a sparse file? Which method results in the smallest file?

BACKGROUND: I'm copying a sparse qcow2 VM image with an apparent size of 200GB but only 16GB of allocated blocks. I've tried various methods of copying this sparse file within the same server and have some preliminary results. The environment is RHEL 6.6 or CentOS 6.6 x64.

ls -lhs srcFile 
16G -rw-r--r-- 1 qemu qemu 201G Feb  4 11:50 srcFile
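In the `ls -lhs` output, the first column (16G) is the space actually allocated on disk and the size column (201G) is the apparent size. The "bloat" figures below are the extra allocated space in a copy compared with the original. A minimal sketch for measuring both in bytes, assuming GNU coreutils `stat` (its `%s`, `%b` and `%B` format sequences); `apparent_bytes`, `alloc_bytes` and `report_bloat` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for comparing a sparse file with its copy (GNU stat assumed).

# Apparent (logical) size in bytes, i.e. the size `ls -l` shows.
apparent_bytes() {
    stat -c '%s' -- "$1"
}

# Allocated size in bytes: %b is the number of allocated blocks and
# %B is the size in bytes of each block counted by %b.
alloc_bytes() {
    local blocks unit
    read -r blocks unit < <(stat -c '%b %B' -- "$1")
    echo $(( blocks * unit ))
}

# Print the copy's apparent size, both allocated sizes, and the bloat
# (extra allocated bytes in the copy compared with the source).
report_bloat() {
    local src=$1 dst=$2 src_alloc dst_alloc
    src_alloc=$(alloc_bytes "$src")
    dst_alloc=$(alloc_bytes "$dst")
    echo "apparent: $(apparent_bytes "$dst") bytes"
    echo "allocated: src=$src_alloc dst=$dst_alloc bytes"
    echo "bloat: $(( dst_alloc - src_alloc )) bytes"
}

# Usage: ./sizes.sh srcFile dstFile
if [[ $# -eq 2 ]]; then
    report_bloat "$1" "$2"
fi
```

For example, running `./sizes.sh srcFile dstFile` after any of the copies below prints the bloat for that method in bytes rather than rounded gigabytes.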

Via cp - best speed

cp --sparse=always srcFile dstFile
Performance Notes:
    Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
    Copy time: 1:02 (mm:ss) 

Via dd - best overall performer

dd if=srcFile of=dstFile iflag=direct oflag=direct bs=4M conv=sparse
Performance Notes:
    Copied 200GB max/16GB actual VM as 200GB max/21GB actual, bloat: 5GB
    Copy time: 2:02 (mm:ss)

Via cpio

mkdir tmp$$
echo srcFile | cpio -p --sparse tmp$$; mv tmp$$/srcFile dstFile
rmdir tmp$$
Performance Notes:
    Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
    Copy time: 9:26 (mm:ss)
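The three cpio commands above can be wrapped in one function. This is a sketch, not the poster's script: it assumes GNU cpio, whose `--sparse` option applies in copy-pass (`-p`) mode, and `cpio_sparse_copy` is a hypothetical name. It uses `mktemp -d` instead of `tmp$$`, so the staging directory is always a newly created one, and places it in the destination's directory so the final `mv` is a rename rather than a second copy.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the cpio pass-through copy above (GNU cpio assumed).
# Usage: cpio_sparse_copy SRC DST   (the directory containing DST must exist)
cpio_sparse_copy() {
    local src=$1 dst=$2 name dstdir stage
    name=$(basename -- "$src")
    # Absolute path of DST's directory; the staging directory goes there so the
    # final mv is a rename within one filesystem rather than a second copy.
    dstdir=$(cd -- "$(dirname -- "$dst")" && pwd) || return 1
    # mktemp -d creates a new, uniquely named directory (replaces tmp$$).
    stage=$(mktemp -d "$dstdir/cpio.XXXXXX") || return 1
    # cpio -p reads file names from stdin relative to the current directory,
    # so run it from the source directory and pass only the base name.
    (cd -- "$(dirname -- "$src")" && printf '%s\n' "$name" | cpio -p --sparse "$stage") \
        && mv -- "$stage/$name" "$dst"
    local rc=$?
    # Remove the staging directory whether or not the copy succeeded.
    rm -rf -- "$stage"
    return "$rc"
}

# Example: ./cpio_copy.sh srcFile dstFile
if [[ $# -eq 2 ]]; then
    cpio_sparse_copy "$1" "$2"
fi
```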

Via rsync

rsync --ignore-existing -aS srcFile dstFile
Performance Notes:
    Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
    Copy time: 24:49 (mm:ss)

Via virt-sparsify - best size

virt-sparsify srcFile dstFile
Performance Notes:
    Copied 200GB max/16GB actual VM as 200GB max/16GB actual, bloat: 0
    Copy time: 17:37 (mm:ss)

Varying Blocksize

I was concerned about the 'bloat' in the dd copies (the increase in allocated size over the original), so I varied the block size. I used 'time' to get the total elapsed time and CPU%. The source file in this test is a 200GB sparse file with 7.3GB allocated. Each row shows block size: elapsed time (m:ss.ss), CPU%, allocated size of the copy:

4K:   5:54.64, 56%, 7.3GB
8K:   3:43.25, 58%, 7.3GB
16K:  2:23.20, 59%, 7.3GB
32K:  1:49.25, 62%, 7.3GB
64K:  1:33.62, 64%, 7.3GB
128K: 1:40.83, 55%, 7.4GB
256K: 1:22.73, 64%, 7.5GB
512K: 1:44.84, 74%, 7.6GB
1M:   1:16.59, 70%, 7.9GB
2M:   1:21.58, 66%, 8.4GB
4M:   1:17.52, 69%, 9.5GB
8M:   1:10.92, 76%, 12GB
16M:  1:17.09, 78%, 16GB
32M:  2:54.10, 90%, 22GB
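The sweep above can be reproduced with a script along these lines. This is a sketch, not the poster's original procedure: it assumes GNU coreutils (dd with `conv=sparse`, `stat -c`, `date +%N`), writes each copy next to the source as SRC.bsN and deletes it after measuring, and times each run with `date` plus a final `sync` instead of the shell's `time` keyword, so it reports elapsed seconds but not CPU%. Direct I/O is on by default, as in the commands above; `DD_DIRECT` is a hypothetical variable that can be set to an empty string on filesystems that reject O_DIRECT.

```bash
#!/usr/bin/env bash
# Sketch: copy a file with dd conv=sparse at several block sizes and print
# elapsed seconds and allocated bytes of each copy, like the table above.
# Assumes GNU coreutils (dd with conv=sparse, stat -c, date +%N).

# Extra dd flags; the runs above used direct I/O on both sides.
# Set DD_DIRECT="" on filesystems that reject O_DIRECT.
DD_DIRECT="${DD_DIRECT-iflag=direct oflag=direct}"

alloc_bytes() {             # allocated bytes = allocated blocks * block unit
    local blocks unit
    read -r blocks unit < <(stat -c '%b %B' -- "$1")
    echo $(( blocks * unit ))
}

# Usage: sweep SRC [BS...]; each copy is written to SRC.bsN and then deleted.
sweep() {
    local src=$1; shift
    local -a sizes=("$@")
    local bs dst start end
    (( ${#sizes[@]} )) || sizes=(4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M 8M 16M 32M)
    printf '%-6s %10s %16s\n' bs seconds allocated_bytes
    for bs in "${sizes[@]}"; do
        dst="$src.bs$bs"
        start=$(date +%s.%N)
        # $DD_DIRECT is intentionally unquoted so it splits into separate operands.
        dd if="$src" of="$dst" $DD_DIRECT bs="$bs" conv=sparse 2>/dev/null || return 1
        sync                # flush any data still buffered before stopping the clock
        end=$(date +%s.%N)
        printf '%-6s %10s %16s\n' "$bs" \
            "$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f", e - s }')" \
            "$(alloc_bytes "$dst")"
        rm -f -- "$dst"
    done
}

if [[ $# -ge 1 ]]; then
    sweep "$@"
fi
```

For example, `./sweep.sh srcFile 32K 64K 128K` measures only the block sizes around the point where bloat starts to appear.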

QUESTION: Can you verify that I've identified the best methods for copying a sparse file with the best overall performance? Suggestions on how to do this better are welcome, as are any concerns about the methods I'm using.


Solution 1:

From the benchmarks above, dd with a 64K block size gives the best overall result on our target hardware when both copy time and bloat are considered: it was the fastest of the block sizes that produced no bloat (7.3GB in, 7.3GB out, in 1:33.62):

dd if=srcFile of=dstFile iflag=direct oflag=direct bs=64K conv=sparse
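The bloat grows with block size because GNU dd's `conv=sparse` seeks over an output block only when the whole block is zeros; a block holding even a few bytes of data is written in full, zeros included, so larger blocks preserve fewer of the source's holes. The following self-contained demonstration illustrates this on a small scratch file rather than the VM image. It assumes GNU coreutils and a filesystem that supports holes and does not compress or deduplicate zero blocks; the helper names are hypothetical. It places 4 KiB of random data at the start of every MiB of a 64 MiB sparse file, then copies it with bs=4K and bs=8M:

```bash
#!/usr/bin/env bash
# Demonstration: dd conv=sparse only skips output blocks that are entirely zero.
# Assumes GNU coreutils and a filesystem that supports holes.

# Build a 64 MiB sparse file with 4 KiB of random data at the start of every MiB.
make_test_image() {
    local img=$1 i
    truncate -s 64M "$img"
    for i in $(seq 0 63); do
        # seek is counted in bs-sized (4 KiB) units: 256 * 4 KiB = 1 MiB
        dd if=/dev/urandom of="$img" bs=4K count=1 seek=$(( i * 256 )) conv=notrunc 2>/dev/null
    done
}

# Copy IMG with conv=sparse at block size BS and print the copy's allocated KiB.
sparse_copy_kib() {
    local img=$1 bs=$2 out="$1.copy.$2"
    dd if="$img" of="$out" bs="$bs" conv=sparse 2>/dev/null || return 1
    du -k -- "$out" | cut -f1
    rm -f -- "$out"
}

demo() {
    local dir img
    dir=$(mktemp -d) || return 1
    img="$dir/image"
    make_test_image "$img"
    echo "source: $(du -k -- "$img" | cut -f1) KiB allocated"
    # With bs=4K the all-zero blocks between data chunks become holes; with bs=8M
    # every block contains some data, so each one is written in full.
    echo "bs=4K copy: $(sparse_copy_kib "$img" 4K) KiB allocated"
    echo "bs=8M copy: $(sparse_copy_kib "$img" 8M) KiB allocated"
    rm -rf -- "$dir"
}

demo
```

With bs=8M every block contains data, so the second copy should be fully allocated, while the 4K copy should stay close to the source's allocation.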