Rsync a huge dataset of small files (5 TB, 1M+ files)
Try xargs+rsync:
find . -type f -print0 | xargs -J % -0 rsync -aP % user@host:some/dir/
You can control how many files are passed to each rsync invocation with xargs' -n option.
E.g., to copy 200 files per rsync call:
find . -type f -print0 | xargs -n 200 -J % -0 rsync -aP % user@host:some/dir/
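Note that -J is specific to BSD xargs (macOS and the BSDs). GNU xargs has no -J, but you can get the same effect by going through sh -c so the remote destination still comes last; a rough equivalent (rsync-batch is just a placeholder name for $0):
find . -type f -print0 | xargs -0 -n 200 sh -c 'rsync -aP "$@" user@host:some/dir/' rsync-batch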
If it's too slow, you can run multiple rsync processes in parallel with xargs' -P option:
find . -type f -print0 | xargs -P 8 -n 200 -J % -0 rsync -aP % user@host:some/dir/
This runs up to 8 rsync processes in parallel.
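One caveat worth checking: rsync is handed bare file paths here, so it copies only the files themselves and everything lands flat in some/dir/, with identically named files overwriting each other. If you need the directory layout reproduced, rsync's -R (--relative) keeps the paths exactly as find emits them, e.g.:
find . -type f -print0 | xargs -P 8 -n 200 -J % -0 rsync -aRP % user@host:some/dir/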
If this is a trusted/secure network and you can open a port on the target host, a good way to reproduce a tree on another machine is the combination of tar and netcat. I'm not at a terminal so I can't write a full demonstration, but this page does a pretty good job:
http://toast.djw.org.uk/tarpipe.html
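In rough outline the pipe looks like this (a minimal sketch: target-host and port 7000 are placeholders, and nc's flag syntax varies a bit between netcat implementations). On the target host, start a listener that unpacks whatever arrives:
nc -l 7000 | tar xpf -
Then, from the top of the tree on the source host:
tar cf - . | nc target-host 7000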
Definitely use compression. In the best case you can transfer the data at the rate permitted by the slowest of the three potential bottlenecks: reads on the source, the network, and writes on the target.
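With the rsync approach that just means adding -z to the rsync flags; with the tar pipe you can slot a compressor into the stream on both ends, e.g. (gzip used purely as an example, any streaming compressor works the same way):
tar cf - . | gzip | nc target-host 7000
nc -l 7000 | gunzip | tar xpf -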