Transferring a large number of small files

If you really need a quick way to transfer files, and both systems are Linux-based, you can try UDR.

This is essentially rsync over UDP (using the open-source UDT framework) and is particularly handy for moving large numbers of files or for transfers over high-bandwidth or high-latency links. In addition, encryption is disabled by default, so the RAM/CPU overhead is lower than with traditional rsync over SSH, and the data stream does not go through SSH.

I can easily get wire-speed transfers on a 1Gbps link with 10 million small TIFF files in a directory tree.

The syntax differs slightly from plain rsync: all rsync flags must appear before the source and destination:

udr rsync -avP --stats --delete /data/ server2:/data/
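If you script this, a small wrapper can keep the flag order right. This is a minimal sketch: `udr_sync` is a hypothetical helper name, the flags are copied from the command above, and `DRY_RUN=1` is a variable of this sketch (not a UDR option) that prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper around "udr rsync": it always places the rsync
# flags before the source and destination, as UDR requires.
# DRY_RUN=1 (a variable of this sketch, not a UDR option) only prints
# the command that would be run.
udr_sync() {
    if [ "$#" -ne 2 ]; then
        echo "usage: udr_sync SRC DEST" >&2
        return 2
    fi
    src=$1
    dest=$2
    # Rebuild the argument list: flags first, then source and destination.
    set -- udr rsync -avP --stats --delete "$src" "$dest"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example: DRY_RUN=1 udr_sync /data/ server2:/data/
```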

It's easy to build; you'll need g++ and openssl-devel:

git clone https://github.com/LabAdvComp/UDR.git
cd UDR/
make
cp src/udr /usr/local/bin/

Do this on both the source and the destination.


See: Possibility of WAN Optimization for SSH traffic


Used in daemon mode, without encryption, rsync can transfer large numbers of small files efficiently. Give it another try in daemon mode.
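A minimal sketch of a daemon-mode setup, assuming the receiving host is `server2` and the data lives in `/data` as in the UDR example. The module name `data`, the `uid`/`gid` values and the `hosts allow` address (a documentation-range IP standing in for the sending host) are illustrative assumptions, and `write_rsyncd_conf` is a hypothetical helper. Because plain daemon mode has neither encryption nor, in this config, authentication, restricting `hosts allow` matters.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal rsyncd.conf exporting /data as
# module "data". Module name, uid/gid and the allowed host are
# illustrative values, not requirements.
write_rsyncd_conf() {
    conf=$1
    cat > "$conf" <<'EOF'
# rsyncd.conf on the receiving host (server2)
# Plain daemon mode: no SSH, so no encryption overhead (and no encryption).
[data]
    path = /data
    read only = no
    uid = root
    gid = root
    # Hypothetical sending host; restrict access since there is no auth.
    hosts allow = 192.0.2.10
EOF
}

# On server2 (receiving side), as root:
#   write_rsyncd_conf /etc/rsyncd.conf
#   rsync --daemon --config=/etc/rsyncd.conf
# On the sending side (the daemon listens on TCP port 873 by default):
#   rsync -a --stats /data/ rsync://server2/data/
```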


Have you considered exposing the SAN LUNs directly to the new VMs? This generally works fine and can be faster than copying the files into a VMDK, though it can 'lock' the VMs onto their initial host. You could use this to get things going, then migrate the files into a VMDK at your own pace with rsync, and later cut the link to the original LUNs.


If the destination VMs aren't yet built, you might try using the free VMware Converter to copy the data over.

In fact, even if they are built, you could clone the disks to a dummy VM and then attach them to the existing VM once the clone is done.

In any event, the converter uses two methods to clone files from source to destination, the full details of which can be found here.

If the destination disks are configured to be smaller than the source, it will clone individual files into the new VM.

However, if the destination disks are set up to be equal in size or larger, it clones blocks. That makes the number of files on disk pretty much irrelevant, so it should run relatively quickly.

I doubt you'll fill a 1Gbps pipe, but you should get more than 50Mbps.

Just remember that you're still looking to move 5TB, so it's going to take some time.
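To put "some time" into numbers, here is a rough estimate for 5TB at the two rates mentioned above. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bit/s), a fully sustained rate and no protocol overhead, so real transfers will take longer; `transfer_hours` is a hypothetical helper name.

```shell
#!/bin/sh
# Back-of-the-envelope transfer time in hours.
# Assumes decimal units and a fully sustained line rate with no overhead.
transfer_hours() {
    awk -v tb="$1" -v mbps="$2" 'BEGIN {
        bits = tb * 1e12 * 8          # payload size in bits
        secs = bits / (mbps * 1e6)    # seconds at the given line rate
        printf "%.1f\n", secs / 3600  # hours, one decimal place
    }'
}

transfer_hours 5 50     # prints 222.2
transfer_hours 5 1000   # prints 11.1
```

At 50Mbps that is roughly 9 days; even a saturated 1Gbps link needs about 11 hours.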