Copying huge files between two remote machines efficiently

I have a shell script that repeatedly copies huge files (2 GB to 5 GB) between remote systems. Key-based authentication with agent forwarding is used, and everything works. For example, say the shell script is running on machine-A and copying files from machine-B to machine-C:

"scp -Cp -i private-key ssh_user@source-IP:source-path ssh_user@destination-IP:destination-path"

Now the problem is that the sshd process continuously consumes a lot of CPU.
For example, top -c on the destination machine (i.e. machine-C) shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                         
14580 ssh_user  20   0 99336 3064  772 R 85.8  0.0   0:05.39 sshd: ssh_user@notty                                                            
14581 ssh_user  20   0 55164 1984 1460 S  6.0  0.0   0:00.51 scp -p -d -t /home/binary/instances/instance-1/user-2993/

This results in a high load average.

I believe scp is taking so much CPU because it's encrypting/decrypting the data. But I don't need encrypted data transfer, as machine-B and machine-C are both on the same LAN.

What other options do I have? I considered rsync, but the rsync man page says:

GENERAL
       Rsync  copies files either to or from a remote host, or locally on the current host (it does not support copying files between two
       remote hosts).

Edit 1: I am already using the ssh cipher arcfour128. It's a small improvement, but it doesn't solve my problem.
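For reference, the cipher can be chosen per command or per host (a sketch; arcfour128 availability depends on the OpenSSH build):

# command-line form; note that -C (compression) also costs CPU,
# so it is omitted here
scp -c arcfour128 -p -i private-key \
       ssh_user@source-IP:source-path ssh_user@destination-IP:destination-path

# or persistently, in ~/.ssh/config:
Host source-IP destination-IP
       Ciphers arcfour128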

Edit 2: There are other binaries (my main application) running on these machines, and the high load average is causing them to perform poorly.


Solution 1:

This problem can be solved with rsync. At least this solution should be competitive in terms of performance.

First, rsync can be invoked from one of the remote systems to work around its inability to copy between two remote systems directly.

Second, encryption/decryption can be avoided by running rsync in daemon access mode instead of remote shell access mode.

In daemon access mode rsync does not tunnel the traffic through an ssh connection. Instead it uses its own protocol on top of TCP.

Normally you run the rsync daemon from inetd or stand-alone; either way, this requires root access on one of the remote systems. Assuming root access is not available, it is still possible to start up the daemon.
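For contrast, the conventional stand-alone setup looks roughly like this (requires root; the /etc/rsyncd.conf location and the data module are illustrative):

# as root on the destination: write a config and start the daemon,
# which binds the default rsync port 873
cat > /etc/rsyncd.conf <<'EOF'
pid file = /var/run/rsyncd.pid
[data]
path = /srv/data
read only = no
EOF
rsync --daemon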

Start the rsync daemon as a non-privileged user on the destination machine. The config needs a module section (here [files], rooted at /) so that the rsync:// URL below can address destination paths, and use chroot = no because a non-root daemon cannot chroot:

ssh -i private_key ssh_user@destination-IP \
       "echo -e 'pid file = /tmp/rsyncd.pid\nport = 1873\nuse chroot = no\n[files]\npath = /\nread only = no' > /tmp/rsyncd.conf"

ssh -i private_key ssh_user@destination-IP \
       rsync --config=/tmp/rsyncd.conf --daemon

Then actually copy the files:

ssh -i private_key ssh_user@source-IP \
       "rsync [OPTIONS] source-path \
              rsync://ssh_user@destination-IP:1873/files/destination-path"
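When the transfer is done, the throwaway daemon can be stopped and cleaned up via its pid file (a sketch, assuming the paths configured above):

ssh -i private_key ssh_user@destination-IP \
       "kill \$(cat /tmp/rsyncd.pid) && rm -f /tmp/rsyncd.pid /tmp/rsyncd.conf"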

Solution 2:

The lowest-overhead solution would be using netcat:

destination$ nc -l -p 12345 > /path/destinationfile
source$ nc desti.nation.ip.address 12345 < /path/sourcefile

(some netcat versions do not need the -p flag for the listening port)
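The same trick is commonly combined with tar to move whole directory trees in one stream (a sketch; GNU tar assumed on both ends):

destination$ nc -l -p 12345 | tar -x -C /path/destination-dir
source$ tar -c -C /path/source-dir . | nc desti.nation.ip.address 12345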

All this does is send the data, unencrypted and unauthenticated, over the network from one machine to the other. Of course, it is not the most "comfortable" way to do it.
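Since nothing authenticates or checks the stream, it is worth comparing checksums after the transfer (a sketch; md5sum assumed available on both machines):

source$ md5sum /path/sourcefile
destination$ md5sum /path/destinationfile

The two digests should match if the transfer arrived intact.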

Other alternatives would be changing the ssh cipher (ssh -c) or using FTP.

PS: rsync works fine with remote machines, but it is mostly used in combination with ssh, so no speedup here.

Solution 3:

If encryption isn't a concern, set up an NFS daemon on C and mount the exported directory on B. Then run rsync on B, but specify local directory paths.

Ignoring whatever your use case for involving A is, you can just prepend ssh user@B to the rsync command, as sketched below.
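A rough sketch of the whole flow (the export path /data, mount point /mnt/c, and host names are illustrative; NFS export options vary by distribution):

# on C, as root: export the destination directory
echo '/data B(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra

# on B, as root: mount C's export
mount -t nfs C:/data /mnt/c

# from A: run a purely "local" rsync on B; the data flows over NFS
ssh user@B rsync -av /path/to/source/ /mnt/c/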

This transfers the data without encryption overhead, and rsync only transfers files that differ.

Also, FTP was built with third-party server-to-server transfers (FXP) as a protocol feature.

Solution 4:

You can use a cheaper cipher: rsync --rsh="ssh -c arcfour" increases the speed. In my tests the disks, not the network connection, became the bottleneck. And do use rsync; it is good!
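For example, adapted to the question's setup (a sketch; assumes the forwarded agent can authenticate from B to C, and that the OpenSSH build still allows arcfour):

# run rsync on machine-B, pushing to machine-C with a cheap cipher;
# -a preserves attributes, -P shows progress and resumes partial files
ssh -A -i private-key ssh_user@source-IP \
       "rsync -aP --rsh='ssh -c arcfour' source-path \
              ssh_user@destination-IP:destination-path"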