SSH file transfers hang after ~700KB

OVERVIEW

I'm scripting SFTP file transfers on a Windows 7 box in the field. When I transfer zip files larger than a megabyte or so, the transfer (and SSH) stops unexpectedly after about 700KB. The SSH process seems to hang but it doesn't start wildly consuming CPU or memory and Windows doesn't say it's stopped responding. The server eventually gives up and times out, but the client never returns (despite setting ServerAliveCountMax and ServerAliveInterval). The server logs show the transfer start followed by the timeout and disconnect after a long time.

DETAILS

Re-running the script back to back to back, the same number of bytes are transferred before the hang each time. Walking away and coming back, then running the script again, the byte count will vary by a few tens of kilobytes.

The command-line sftp.exe client works properly with default options; those larger zip files transfer successfully. I can't test scp.exe because the SSH user account doesn't have shell access. I believe the problem persisted across an OS wipe and re-image from XP to 7 so I suspect a network problem, but I can't fathom what kind of problem would cause this behavior.

I'm scripting with Perl and the Net::SFTP::Foreign module. Here are the relevant bits of code extracted from my test case script (SSH output log attached below):

 my %connectOpts = ( 
  user     => $username
  ,timeout => $timeout
  ,warn    => sub {}
  ,more    => [ "-o PreferredAuthentications=publickey",
               "-o ServerAliveCountMax=6",
               "-o ServerAliveInterval=" . int($timeout/6),
               "-vvvv" ]
  );

my $sftp = Net::SFTP::Foreign::Compat->new( $host, %connectOpts );

foreach my $local_file ( @local_files ) {
  $sftp->do_stat( 'upload.tmp' ) && print( "Overwriting upload.tmp\n" );

  $sftp->put( $local_file, 'upload.tmp' );

  $sftp->do_rename( 'upload.tmp', basename( $local_file ));
}

I use this code and very similar configuration on dozens of other field PCs for the same purpose. This is the first one I've seen with this problem. I'm looking for a root cause or diagnostic advice, not a workaround outside this ecosystem. There are countless ways to script this, but I need the granularity, feedback, and modularity of the Net::SFTP::Foreign module for logging and error reporting. Perl is the only scripting language available.

Thanks. This has been incredibly frustrating.

Configuration:

  • Windows 7
  • ActivePerl 5.8.9
  • Net::SFTP::Foreign v1.69
  • OpenSSH 5.3p1 (Windows client from CopSSH distro)
  • OpenVPN 2.2-rc (Windows connection back to corporate network)
  • Tectia Server 6.2.1.168 (Linux SSH server)

Client output (-vvvv) until I manually terminated ssh.exe after it went sideways:

OpenSSH_5.3p1, OpenSSL 0.9.8k 25 Mar 2009
debug1: Reading configuration data /etc/ssh_config
debug2: ssh_connect: needpriv 0
debug1: Connecting to sftp.server.com [xx.xx.xx.xx] port 22.
debug1: Connection established.
debug1: identity file /cygdrive/c/program files/copssh/.ssh/identity type -1
debug3: Not a RSA1 key file /cygdrive/c/program files/copssh/.ssh/id_rsa.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /cygdrive/c/program files/copssh/.ssh/id_rsa type 1
debug1: identity file /cygdrive/c/program files/copssh/.ssh/id_dsa type -1
debug1: Remote protocol version 2.0, remote software version 6.2.1.168 SSH Tectia Server
debug1: no match: 6.2.1.168 SSH Tectia Server
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug2: fd 3 setting O_NONBLOCK
debug1: SSH2_MSG_KEXINIT sent
debug3: Wrote 792 bytes for a total of 813
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-g
roup-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour12
8,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rij
[email protected]
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour12
8,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rij
[email protected]
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160
,[email protected],hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160
,[email protected],hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,[email protected],zlib
debug2: kex_parse_kexinit: none,[email protected],zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: diffie-hellman-group1-sha1,diffie-hellman-group14-sha
1,[email protected],diffie-hellman-group-exchange-sha1,diffi
e-hellman-group-exchange-sha256
debug2: kex_parse_kexinit: ssh-rsa,[email protected]
debug2: kex_parse_kexinit: aes256-cbc,aes192-cbc,aes128-cbc,3des-cbc
debug2: kex_parse_kexinit: aes256-cbc,aes192-cbc,aes128-cbc,3des-cbc
debug2: kex_parse_kexinit: hmac-sha1
debug2: kex_parse_kexinit: hmac-sha1
debug2: kex_parse_kexinit: none,zlib
debug2: kex_parse_kexinit: none,zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: found hmac-sha1
debug1: kex: server->client aes128-cbc hmac-sha1 none
debug2: mac_setup: found hmac-sha1
debug1: kex: client->server aes128-cbc hmac-sha1 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<2048<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug3: Wrote 24 bytes for a total of 837
debug2: dh_gen_key: priv key bits set: 160/320
debug2: bits set: 1016/2048
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug3: Wrote 272 bytes for a total of 1109
debug3: check_host_in_hostfile: filename /cygdrive/c/program files/copssh/.ssh/known_hosts
debug3: check_host_in_hostfile: match line 2
debug3: check_host_in_hostfile: filename /cygdrive/c/program files/copssh/.ssh/known_hosts
debug3: check_host_in_hostfile: match line 1
debug1: Host 'sftp.server.com' is known and matches the RSA host key.
debug1: Found key in /cygdrive/c/program files/copssh/.ssh/known_hosts:2
debug2: bits set: 995/2048
debug1: ssh_rsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: Wrote 16 bytes for a total of 1125
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug3: Wrote 52 bytes for a total of 1177
debug3: Received SSH2_MSG_IGNORE
debug3: Received SSH2_MSG_IGNORE
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /cygdrive/c/program files/copssh/.ssh/identity (0x0)
debug2: key: /cygdrive/c/program files/copssh/.ssh/id_rsa (0xd4b720)
debug2: key: /cygdrive/c/program files/copssh/.ssh/id_dsa (0x0)
debug3: Wrote 84 bytes for a total of 1261
debug3: Received SSH2_MSG_IGNORE
debug1: Authentications that can continue: publickey,keyboard-interactive
debug3: start over, passed a different list publickey,keyboard-interactive
debug3: preferred publickey
debug3: authmethod_lookup publickey
debug3: remaining preferred:
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Trying private key: /cygdrive/c/program files/copssh/.ssh/identity
debug3: no such identity: /cygdrive/c/program files/copssh/.ssh/identity
debug1: Offering public key: /cygdrive/c/program files/copssh/.ssh/id_rsa
debug3: send_pubkey_test
debug2: we sent a publickey packet, wait for reply
debug3: Wrote 372 bytes for a total of 1633
debug3: Received SSH2_MSG_IGNORE
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug2: input_userauth_pk_ok: fp (...)
debug3: sign_and_send_pubkey
debug1: read PEM private key done: type RSA
debug3: Wrote 644 bytes for a total of 2277
debug3: Received SSH2_MSG_IGNORE
debug1: Authentication succeeded (publickey).
debug2: fd 4 setting O_NONBLOCK
debug2: fd 5 setting O_NONBLOCK
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug1: Entering interactive session.
debug3: Wrote 68 bytes for a total of 2345
debug3: Received SSH2_MSG_IGNORE
debug2: callback start
debug2: client_session2_setup: id 0
debug1: Sending subsystem: sftp
debug2: channel 0: request subsystem confirm 1
debug2: fd 3 setting TCP_NODELAY
debug2: callback done
debug2: channel 0: open confirm rwindow 65536 rmax 32768
debug3: Wrote 68 bytes for a total of 2413
debug3: Wrote 52 bytes for a total of 2465
debug3: Received SSH2_MSG_IGNORE
debug2: channel_input_status_confirm: type 99 id 0
debug2: subsystem request accepted on channel 0
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 68 bytes for a total of 2533
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 84 bytes for a total of 2617
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 68 bytes for a total of 2685
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 16436 bytes for a total of 19121
debug3: Wrote 4148 bytes for a total of 23269
debug3: Wrote 12372 bytes for a total of 35641
debug3: Wrote 16436 bytes for a total of 52077
debug3: Wrote 4148 bytes for a total of 56225
debug3: Wrote 12196 bytes for a total of 68421
debug3: Received SSH2_MSG_IGNORE
debug2: channel 0: rcvd adjust 32899
debug3: Wrote 212 bytes for a total of 68633
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 16436 bytes for a total of 85069
debug3: Wrote 4148 bytes for a total of 89217
debug3: Received SSH2_MSG_IGNORE
debug2: channel 0: rcvd adjust 98337
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 12372 bytes for a total of 101589
debug3: Wrote 16436 bytes for a total of 118025
debug3: Wrote 4148 bytes for a total of 122173
debug3: Wrote 12372 bytes for a total of 134545
debug3: Wrote 16436 bytes for a total of 150981
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 155129
debug3: Wrote 12372 bytes for a total of 167501
debug3: Received SSH2_MSG_IGNORE
debug2: channel 0: rcvd adjust 196674
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 16436 bytes for a total of 183937
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 188085
debug3: Wrote 12372 bytes for a total of 200457
debug3: Wrote 16436 bytes for a total of 216893
debug3: Wrote 4148 bytes for a total of 221041
debug3: Wrote 12372 bytes for a total of 233413
debug3: Wrote 16436 bytes for a total of 249849
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 253997
debug3: Wrote 12372 bytes for a total of 266369
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 16436 bytes for a total of 282805
debug3: Wrote 4148 bytes for a total of 286953
debug3: Wrote 12372 bytes for a total of 299325
debug3: Received SSH2_MSG_IGNORE
debug2: channel 0: rcvd adjust 131204
debug3: Wrote 16436 bytes for a total of 315761
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 319909
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 12372 bytes for a total of 332281
debug3: Wrote 16436 bytes for a total of 348717
debug3: Wrote 4148 bytes for a total of 352865
debug3: Wrote 12372 bytes for a total of 365237
debug3: Wrote 16436 bytes for a total of 381673
debug3: Wrote 4148 bytes for a total of 385821
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 12372 bytes for a total of 398193
debug3: Wrote 16436 bytes for a total of 414629
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 418777
debug3: Wrote 12372 bytes for a total of 431149
debug3: Wrote 16436 bytes for a total of 447585
debug3: Received SSH2_MSG_IGNORE
debug2: channel 0: rcvd adjust 393348
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 451733
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 12372 bytes for a total of 464105
debug3: Wrote 16436 bytes for a total of 480541
debug3: Wrote 4148 bytes for a total of 484689
debug3: Wrote 12372 bytes for a total of 497061
debug3: Wrote 16436 bytes for a total of 513497
debug3: Wrote 4148 bytes for a total of 517645
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 12372 bytes for a total of 530017
debug3: Wrote 16436 bytes for a total of 546453
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 550601
debug3: Wrote 12372 bytes for a total of 562973
debug3: Wrote 16436 bytes for a total of 579409
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 583557
debug3: Wrote 12372 bytes for a total of 595929
debug3: Wrote 16436 bytes for a total of 612365
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 616513
debug3: Wrote 12372 bytes for a total of 628885
debug3: Wrote 16436 bytes for a total of 645321
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 649469
debug3: Wrote 12372 bytes for a total of 661841
debug3: Wrote 16436 bytes for a total of 678277
debug3: Wrote 4148 bytes for a total of 682425
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 12372 bytes for a total of 694797
debug3: Wrote 16436 bytes for a total of 711233
debug3: Received SSH2_MSG_IGNORE
debug2: channel 0: rcvd adjust 738120
debug3: Received SSH2_MSG_IGNORE
debug3: Wrote 4148 bytes for a total of 715381
debug3: Received SSH2_MSG_IGNORE

UPDATE 1
Per @Daniel Lawson's advice, I tried using netsh to clamp the VPN interface's MTU to 1280 to avoid filling the client's TCP buffers in case of an MTU mismatch along the path to the server. The transfer still stalled at about the same point. I also tried disconnecting the VPN and clamping the physical interface's MTU to 1280 with the same failed result.


This feels like an MTU issue to me. I've seen similar problems with scping files when there is a path MTU mismatch. TCP buffers on the host can fill up, which is why you send more than a few packets, but once they fill up your client then stalls - long after the MTU issue itself actually had an impact.

To verify this you could look at a tool like scamper. Or you could try manually clamping your MTU to some lower point - say 1280 bytes - and seeing if that fixes it. The chances are good that there's only a slight mismatch, eg because of a PPPoE link in the path that reduces the MTU to 1492.


perhaps you can play with the block_size parameter? I've experienced some weirdness related to file uploads over SSH/SSL and somehow adjusting the block / chunk size helped. I believe the default is 32 * 1024, so perhaps lowering it to 16, 8, 4 and trying might make some difference. Might be worth a shot.


The default queue size on Windows was set to 4, but this default was not being picked up due to a bug on the module.

It has been corrected (and BTW, the Windows default queue size changed to 16). You can now download Net::SFTP::Foreign 1.70_08.

Next time you find a problem like this, please, report it!


NOTE (TLDR): As salva said, the quick fix is to update to Net::SFTP::Foreign 1.70_08 or newer. There was a bug preventing the library from using the right default queue_size in Windows. Read on for my original solution.

I was able to eliminate stalled transfers by reducing the Perl Net::SFTP::Foreign module's queue_size parameter from the default 32 to 23. This is a network performance-related variable and I don't perceive a speed difference when it's reduced. For production, I'm going to dial it back to 16 for safety.

queue_size => $size

read and write requests are pipelined in order to maximize transfer throughput. This option allows to set the maximum number of requests that can be concurrently waiting for a server response.

(from Net::SFTP::Foreign documentation on CPAN)

The queue_size option applies to Net::SFTP::Foreign's new, get, put, rget, and rput methods.

If you're using Net::SFTP::Foreign, you can tack on the queue_size parameter at the end of your constructor or the individual method calls.

my $sftp = Net::SFTP::Foreign->new( $host, queue_size => 16 );
# -- OR --
$sftp->put( $local_file, $remote_file, queue_size => 16 );

I'm using the Net::SFTP::Foreign::Compat wrapper for a Net::SFTP-like API. The wrapper lets you set default parameters to use with the underlying Net::SFTP::Foreign method calls.

use Net::SFTP::Foreign::Compat;

# Note that the Compat CPAN documentation is INCORRECT
# The DEFAULTS hash is under Compat, not Foreign

$Net::SFTP::Foreign::Compat::DEFAULTS{ new } = [ queue_size => 16 ];
# -- OR --
$Net::SFTP::Foreign::Compat::DEFAULTS{ put } = [ queue_size => 16 ];

my $sftp = Net::SFTP::Foreign::Compat->new( $host, %options );

Related thought: The maximum working queue_size of 23 and the default block_size of 32KB correlated with the number of bytes sent before the transfer stalled. 23 * 32KB is a little over 700KB. I can't explain why I needed to limit the queue_size on this deployment, but it solved my problem.