Why is scp much slower than http?

I have an instance on Amazon EC2 which holds a large-ish file (~180MB). I need to copy that file to my local machine, so naturally I tried scp. After multiple attempts that only reached maximum speeds of 20-30KB/s and kept dropping the connection (only once did I reach ~200KB/s for a short while, and then the connection dropped), I tried HTTP. Over HTTP I got 1MB/s, rising to 2MB/s, and the transfer finished in under two minutes. Over scp, the ETA was about three hours.

I know scp is slower than HTTP because of encryption, but I don't think that alone could account for a ~30x drop in performance. So I'm guessing there's some throttling going on, probably at my ISP. Is there any way to find out for sure? Or is there some other cause?


Solution 1:

The typical signature of network throttling is a near-constant speed (staying within a band of 10-20KB/s or so), so if you are being throttled, this is the pattern to look for. Another pattern is "bunching" or "bursting", where you get one or two seconds of high-speed transfer followed by a period of low-speed transfer. If you see that pattern instead, the issue is more likely to be buffering/caching somewhere along the path.
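If you note down the transfer rate once per second while the copy runs, the two patterns above can be told apart with a rough heuristic like the one below. This is a minimal sketch: the function name and both thresholds (`flat_band_kbps`, `burst_ratio`) are hypothetical values chosen for illustration, not a standard test.

```python
from statistics import mean


def classify_transfer(samples_kbps, flat_band_kbps=20.0, burst_ratio=3.0):
    """Roughly label a series of per-second transfer rates (in KB/s).

    flat_band_kbps and burst_ratio are hypothetical, illustrative thresholds.
    """
    if len(samples_kbps) < 2:
        raise ValueError("need at least two samples")
    spread = max(samples_kbps) - min(samples_kbps)
    avg = mean(samples_kbps)
    if spread <= flat_band_kbps:
        # Rate stays inside a narrow band: consistent with a rate limit.
        return "constant"
    if max(samples_kbps) >= burst_ratio * avg:
        # Short spikes far above the average: looks like buffering/caching.
        return "bursty"
    return "variable"


print(classify_transfer([24, 27, 25, 30, 26]))            # constant
print(classify_transfer([500, 20, 20, 20, 480, 20, 20]))  # bursty
```

A flat result only suggests a rate limit; it does not tell you who is applying it.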

Typically, your ISP's upstream routing equipment will be configured to give HTTP (or, more specifically, port 80) traffic a higher QoS priority than all other traffic. The (not altogether incorrect) reasoning is that most of their customers will be browsing the web, and they don't want someone else's SCP/FTP/Skype/peer-to-peer traffic clogging their pipes.

Amazon themselves don't apply any QoS to their instances (that I know of). That said, you may be running into CPU-bound issues, especially if you're running a t1.micro (or other small) EC2 instance with a low-powered (or low-priority) CPU allocation. Check your CPU steal percentage (run top and look at the st value at the end of the Cpu(s) summary line near the top of the screen) to see whether your CPU is being 'stolen' by other EC2 instances. Steal time is time your virtual CPU was ready to run but the hypervisor was giving the physical CPU to something else. This is typically the case with small, low-usage instances: steal lets Amazon hand CPU cycles to other instances to meet demand.
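If you want the steal figure without watching top, it can be derived from the aggregate `cpu` line of `/proc/stat`, whose eighth value is cumulative steal time. The sketch below assumes a Linux guest whose kernel reports the steal column; the helper names are hypothetical. It samples the line twice and reports steal as a share of all CPU time in between.

```python
def parse_cpu_line(line):
    """Return (steal, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user nice system idle iowait irq softirq steal (guest time is already in user/nice)
    values = [int(v) for v in fields[1:9]]
    return values[7], sum(values)


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    steal0, total0 = parse_cpu_line(before)
    steal1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return 100.0 * (steal1 - steal0) / elapsed if elapsed else 0.0


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # first line is the aggregate 'cpu' line


if __name__ == "__main__":
    import time
    try:
        first = read_cpu_line()
        time.sleep(1)
        print(f"steal: {steal_percent(first, read_cpu_line()):.1f}%")
    except FileNotFoundError:
        print("no /proc/stat here (not a Linux system?)")
```

Running it during an scp transfer shows whether steal rises while the copy is slow.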

Solution 2:

SSHD has some overhead related to security and to the TCP stack, which is why it is slower. You can use the HPN-SSH patch (scp-hpn), which is faster. You can read more at http://www.psc.edu/index.php/hpn-ssh
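As I understand it, HPN-SSH's stated motivation is that stock OpenSSH used statically sized internal flow-control buffers. Any flow that can only keep a fixed window of data in flight per round trip is capped at roughly window / RTT. The sketch below computes that bound; the 64 KiB window and 100 ms RTT are hypothetical example numbers, not measured OpenSSH values.

```python
def max_throughput_kib_s(window_kib, rtt_ms):
    """Upper bound (KiB/s) for a flow limited to window_kib in flight per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    # window / RTT, with RTT converted from milliseconds to seconds
    return window_kib * 1000.0 / rtt_ms


# Hypothetical: 64 KiB window over a 100 ms path
print(max_throughput_kib_s(64, 100))  # 640.0
```

With these illustrative numbers the cap is still well above the 20-30KB/s in the question, so a window limit matters most on high-latency links and would not explain that gap on its own.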