Running multiple scp threads simultaneously

Background:

I often find myself mirroring a set of server files, and among them are thousands of little 1 KB-3 KB files. All the servers are connected to 1 Gbps ports, generally spread across a variety of data centers.

Problem:

SCP transfers these little files ONE by ONE, which takes ages, and I feel like I'm wasting the beautiful network resources I have.

Solution?:

I had an idea: create a script which divides the files up into equal batches and starts 5-6 scp processes, which in theory would finish 5-6 times faster, no? But I don't have any Linux scripting experience!

Question(s):

  • Is there a better solution to the mentioned problem?
  • Is there something like this that exists already?
  • If not, is there someone who would give me a start, or help me out?
  • If not to 2 or 3, where would be a good place to start learning Linux scripting, like bash or something else?
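
To make the idea concrete, something like this is roughly what I have in mind (completely untested; it assumes the files sit in a flat /manyfiles directory, filenames contain no spaces, and key-based ssh login to dest.server works):

#!/bin/bash
# Hypothetical sketch of the "split the files and run several scp processes" idea.
SRC=/manyfiles
DEST=dest.server:/manyfiles
N=6

# List the files and split the list into N roughly equal chunks (GNU split).
find "$SRC" -maxdepth 1 -type f > /tmp/filelist
split -n l/$N /tmp/filelist /tmp/chunk.

# Start one scp per chunk in the background, then wait for all of them.
# Note: very long chunks could hit the command-line length limit.
for chunk in /tmp/chunk.*; do
    scp $(cat "$chunk") "$DEST" &
done
wait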

I would do it like this:
tar -cf - /manyfiles | ssh dest.server 'tar -xf - -C /manyfiles'

Depending on the files you are transferring, it can make sense to enable compression in the tar commands:
tar -czf - /manyfiles | ssh dest.server 'tar -xzf - -C /manyfiles'

It may also make sense to choose a CPU-friendlier cipher for the ssh command (like arcfour):
tar -cf - /manyfiles | ssh -c arcfour dest.server 'tar -xf - -C /manyfiles'

Or combine both of them, but it really depends on what your bottleneck is.
Obviously rsync will be a lot faster if you are doing incremental syncs.
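
For completeness, combining both would look roughly like this (assuming your OpenSSH build still offers arcfour; newer versions have dropped it):
tar -czf - /manyfiles | ssh -c arcfour dest.server 'tar -xzf - -C /manyfiles'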


Use rsync instead of scp. You can use rsync over ssh as easily as scp, and it supports "pipelining of file transfers to minimize latency costs".

One tip: If the data is compressible, enable compression. If it's not, disable it.
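
For example, a typical invocation over ssh might look like this (the -z flag is the compression toggle mentioned above; /manyfiles/ and dest.server are just placeholders):
rsync -az /manyfiles/ dest.server:/manyfiles/    # compressible data: enable compression
rsync -a /manyfiles/ dest.server:/manyfiles/     # already-compressed data: skip -z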