Running multiple scp threads simultaneously
Background:
I often find myself mirroring a set of server files, and among these files are thousands of little 1 kB-3 kB files. All the servers are connected to 1 Gbps ports and are generally spread across a variety of data centers.
Problem:
SCP transfers these little files one by one, and it takes ages; I feel like I'm wasting the beautiful network resources I have.
Solution?:
I had an idea: create a script that divides the files into equal batches and starts 5-6 scp processes, which in theory would then finish 5-6 times faster, no? But I don't have any Linux scripting experience!
Question(s):
- Is there a better solution to the mentioned problem?
- Does something like this already exist?
- If not, would someone give me a start, or help me out?
- If not to 2 or 3, where would be a good place to start learning Linux scripting, such as bash or another shell?
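A minimal sketch of the split-and-parallel idea, assuming GNU find and xargs are available, might look like the following. The directories, host, job count and batch size are placeholders, and it copies everything into a single remote directory (the local subdirectory layout is not preserved), so treat it as a starting point rather than a finished tool.
#!/bin/bash
# Sketch: copy the files under SRC_DIR to DEST using several scp sessions at once.
set -euo pipefail

SRC_DIR="/manyfiles"                 # local directory holding the small files (placeholder)
DEST="user@dest.server:/manyfiles"   # remote target directory (placeholder)
JOBS=6                               # number of parallel scp processes
BATCH=200                            # files handed to each scp invocation

# find emits the file list NUL-separated; xargs keeps JOBS scp processes busy,
# each one copying BATCH files per call. Inside sh -c, "$0" is DEST and "$@"
# is the current batch of files.
find "$SRC_DIR" -type f -print0 |
  xargs -0 -r -n "$BATCH" -P "$JOBS" sh -c 'scp -q "$@" "$0"' "$DEST"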
I would do it like this:
tar -cf - /manyfiles | ssh dest.server 'tar -xf - -C /manyfiles'
Depending on the files you are transferring, it can make sense to enable compression in the tar commands:
tar -czf - /manyfiles | ssh dest.server 'tar -xzf - -C /manyfiles'
It may also make sense to choose a CPU-friendlier cipher for the ssh command (like arcfour):
tar -cf - /manyfiles | ssh -c arcfour dest.server 'tar -xf - -C /manyfiles'
Or combine both of them, but it really depends on what your bottleneck is.
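Combined, that would look like this (assuming your ssh build still accepts arcfour; recent OpenSSH releases have dropped it):
tar -czf - /manyfiles | ssh -c arcfour dest.server 'tar -xzf - -C /manyfiles'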
Obviously, rsync will be a lot faster if you are doing incremental syncs.
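For instance, a mirroring run like this (paths and host are placeholders) only re-sends files that changed since the last run, and --delete removes files that no longer exist on the source:
rsync -a --delete /manyfiles/ dest.server:/manyfiles/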
Use rsync instead of scp. You can use rsync over ssh as easily as scp, and it supports "pipelining of file transfers to minimize latency costs".
One tip: If the data is compressible, enable compression. If it's not, disable it.
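For example (host and paths again being placeholders), rsync's -z flag controls its built-in compression:
# compressible data (text, logs): compress the stream
rsync -az /manyfiles/ dest.server:/manyfiles/
# already-compressed data (images, archives): leave -z off
rsync -a /manyfiles/ dest.server:/manyfiles/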