Copying huge files between two remote machines - Efficiently
I have a shell script that repeatedly copies huge files (2 GB to 5 GB) between remote systems. Key-based authentication is used with agent forwarding, and everything works. For example, say the shell script is running on machine-A and copying files from machine-B to machine-C:
scp -Cp -i private-key ssh_user@source-IP:source-path ssh_user@destination-IP:destination-path
The problem is that the sshd process continuously consumes a lot of CPU. For example, top -c on the destination machine (i.e. machine-C) shows:
PID   USER     PR NI VIRT  RES  SHR  S %CPU %MEM   TIME+ COMMAND
14580 ssh_user 20  0 99336 3064  772 R 85.8  0.0 0:05.39 sshd: ssh_user@notty
14581 ssh_user 20  0 55164 1984 1460 S  6.0  0.0 0:00.51 scp -p -d -t /home/binary/instances/instance-1/user-2993/
This results in high load average.
I believe scp is taking so much CPU because it's encrypting/decrypting data. But I don't need encrypted data transfer, as both machine-B and machine-C are on a LAN.
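To get a rough idea of how much raw cipher throughput the CPU can sustain (and thus whether encryption really is the bottleneck), the ciphers can be benchmarked directly, assuming the openssl binary is available (arcfour is RC4):
openssl speed rc4 aes-128-cbc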
What other options do I have? I considered rsync, but its man page says:
GENERAL Rsync copies files either to or from a remote host, or locally on the current host (it does not support copying files between two remote hosts).
Edit 1: I am already using the ssh cipher arcfour128. It gives a little improvement, but that doesn't solve my problem.
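For reference, that cipher is selected like this (assuming both client and server support arcfour128):
scp -c arcfour128 -Cp -i private-key ssh_user@source-IP:source-path ssh_user@destination-IP:destination-path
Note that -C enables compression, which itself costs CPU; dropping it may help on a fast LAN.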
Edit 2: There are other binaries (my main application) running on these machines, and the high load average causes them to perform poorly.
Solution 1:
This problem can be solved with rsync. At least this solution should be competitive in terms of performance.
First, rsync can be called from one of the remote systems to work around its inability to copy between two remote hosts directly.
Second, encryption/decryption can be avoided by running rsync in daemon mode instead of remote-shell mode. In daemon mode rsync does not tunnel the traffic through an SSH connection; instead it uses its own protocol on top of TCP.
Normally you run the rsync daemon from inetd or stand-alone, but either way that requires root access on one of the remote systems. Assuming root access is not available, it is still possible to start the daemon as a regular user.
Start the rsync daemon as a non-privileged user on the destination machine. The daemon needs at least one module to accept transfers, so the config defines one (named data here, pointing at the destination directory); chroot is disabled because chroot() would require root:
ssh -i private_key ssh_user@destination-IP \
"echo -e 'pid file = /tmp/rsyncd.pid\nport = 1873\nuse chroot = false\n[data]\npath = destination-path\nread only = false' > /tmp/rsyncd.conf"
ssh -i private_key ssh_user@destination-IP \
"rsync --config=/tmp/rsyncd.conf --daemon"
Then actually copy the files (the data module maps to destination-path on machine-C):
ssh -i private_key ssh_user@source_ip \
"rsync [OPTIONS] source-path \
rsync://ssh_user@destination-IP:1873/data/"
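When the transfer is finished, the ad-hoc daemon can be shut down again via the pid file it wrote (a small cleanup sketch; the single quotes keep $(...) from expanding on the calling machine):
ssh -i private_key ssh_user@destination-IP \
'kill $(cat /tmp/rsyncd.pid) && rm -f /tmp/rsyncd.pid /tmp/rsyncd.conf'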
Solution 2:
The least-overhead solution would be using netcat:
destination$ nc -l -p 12345 > /path/destinationfile
source$ cat /path/sourcefile | nc desti.nation.ip.address 12345
(some netcat versions do not need the -p flag for the port)
All this does is send the data unencrypted and unauthenticated over the network from one machine to the other. Of course, it is not the most comfortable way to do it.
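Since nc provides neither authentication nor integrity checking, it may be worth comparing checksums once the transfer completes (assuming md5sum is available on both machines):
source$ md5sum /path/sourcefile
destination$ md5sum /path/destinationfile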
Other alternatives would be changing the ssh cipher (ssh -c) or using FTP.
PS: rsync works fine with remote machines, but it is mostly used in combination with ssh, so no speedup here.
Solution 3:
If encryption isn't a concern, set up an NFS daemon on C and mount the exported directory on B. Then use rsync, run on B, but specify local directory paths. Ignoring whatever your use case for involving A is, just run the rsync command via ssh user@B.
This transfers data without encryption overhead and only transfers the files that differ.
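A minimal sketch of that setup, assuming root access on B and C, a Linux NFS server on C, and illustrative paths and subnet:
# on machine-C: export the destination directory to the LAN
echo '/home/binary 192.168.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra
# on machine-B: mount the export
mkdir -p /mnt/machine-C
mount -t nfs destination-IP:/home/binary /mnt/machine-C
# from machine-A: run rsync on B against local-looking paths
ssh ssh_user@source-IP "rsync -a source-path /mnt/machine-C/"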
Also, FTP was built with third-party server-to-server transfers as a protocol feature.
Solution 4:
You can use a cheaper cipher: rsync --rsh="ssh -c arcfour" increases the speed. In my tests, the bottleneck was then the disks rather than the network connection. And do use rsync, it is good!
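Since rsync itself cannot copy between two remote hosts, the command has to run on one of them. A sketch launched from machine-A, assuming arcfour is accepted by sshd on machine-C and that the existing agent forwarding handles authentication from B to C:
ssh -i private_key ssh_user@source-IP \
'rsync -a --rsh="ssh -c arcfour" source-path ssh_user@destination-IP:destination-path'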