scp stalled while copying large files

For my experiment I have to set the MTU to 8000. Since doing so, scp stalls at 0.00% whenever I copy a large file. I have tried scp -l and scp -C, and turning tcp_sack on and off, but none of these helped. I can't change the MTU, because the results have to be comparable across experiments. Is there any other way to fix this?


An attempt at a comprehensive solution, as there could be several problems and limitations depending on your situation.

rsync

My preferred option. In my experience rsync doesn't have this problem, and in my opinion it is a bit more versatile. Among other things, it keeps track of which files are already at the destination, and with the --partial flag it keeps partially transferred files, so if the connection ever does break it can pick up from where it left off.

Instead of

scp local/path/some_file [email protected]:"/some/path/"

you can just do

rsync -avz --progress local/path/some_file [email protected]:"/some/path/"
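If the connection still drops now and then, the rsync call can be wrapped in a small retry loop so that --partial gets a chance to resume. The following is only a sketch: retry_transfer and RETRY_DELAY are hypothetical names made up for this example, and user@remote.host is a placeholder for your own login and host.

```shell
# Hypothetical helper: run a command up to N times until it succeeds.
# Usage: retry_transfer N command [args...]
retry_transfer() {
    max_tries=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # Run the command exactly as given; stop at the first success.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_tries failed" >&2
        attempt=$((attempt + 1))
        # Pause before the next attempt (default 5 s), but not after the last one.
        if [ "$attempt" -le "$max_tries" ]; then
            sleep "${RETRY_DELAY:-5}"
        fi
    done
    return 1
}

# Example call (placeholder host and paths):
#   retry_transfer 5 rsync -av --partial --progress local/path/some_file user@remote.host:/some/path/
```

Because rsync compares source and destination on each run, every retry only sends what is still missing rather than starting over.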

I've tested this on several occasions when scp would give me the same problem it gave you - and now I just use rsync by default.

Limit speed

Not a solution for the OP, as the MTU is fixed in this situation (and this is probably not the issue here). But if the culprit is a slow or unreliable connection between the two machines, setting a speed limit reduces the delays that make the TCP connection stall, at the expense of a slower transfer of course. This helps because scp grabs all the bandwidth it can get unless you specify a maximum data rate with -l, which takes a value in Kbit/s, like so:

scp -l 8192 local/path/some_file [email protected]:"/some/path/"
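Since -l expects Kbit/s rather than KB/s, it is easy to be off by a factor of eight. A tiny conversion helper makes the intent explicit; scp_limit_kbit is a hypothetical name used only for this sketch, and user@remote.host is again a placeholder.

```shell
# Hypothetical helper: convert a rate in KB/s to the Kbit/s value scp -l expects.
scp_limit_kbit() {
    echo $(( $1 * 8 ))
}

# Example call: cap the copy at 1024 KB/s, which is the 8192 used above.
#   scp -l "$(scp_limit_kbit 1024)" local/path/some_file user@remote.host:/some/path/
```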

This doesn't always work though.

Compression option

scp's -C option enables compression, which can speed up the transfer and so reduce the probability that it stalls.

Disabling TCP SACK

As mentioned by the OP, and here.

sudo sysctl -w net.ipv4.tcp_sack=0

(or similar)

LAN card MTU

Again an MTU fix, so not an option for the OP. Note that it changes the MTU of the whole interface, not just of the transfer:

ifconfig eth0 mtu 1492

or on newer (Linux) systems:

ip link set dev eth0 mtu 1492
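If the MTU has to stay at 8000, it may be worth checking whether packets of that size actually make it to the other host without being fragmented or dropped. One way, sketched below, is to ping with fragmentation prohibited and a payload sized to fill the MTU exactly. This assumes IPv4 without IP options and the iputils ping found on most Linux systems; icmp_payload_for_mtu is a hypothetical helper name and remote.host a placeholder.

```shell
# Hypothetical helper: largest ICMP echo payload that fits in one IPv4 packet
# of the given MTU. Subtracts the 20-byte IPv4 header (no options) and the
# 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Example call on Linux (iputils ping): -M do prohibits fragmentation,
# -s sets the payload size, -c limits the number of probes.
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 8000)" remote.host
```

If these pings fail while smaller ones succeed, some device on the path probably does not pass 8000-byte frames, which would be consistent with scp stalling once it starts sending full-sized packets.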

Other

If all else fails, this lists another handful of potential solutions not included here.

The more exotic hpn bug may be at fault too.