SSH remote port forwarding failed
Follow-Up: It looks like the rapid series of disconnects after a few months of running each server was probably coincidental and just served to reveal the actual problem. The failure to reconnect is almost certainly due to the AliveInterval values (kasperd's answer). Using the ExitOnForwardFailure option should allow the timeout to occur properly before reconnecting, which should solve the problem in most cases. MadHatter's suggestion (the kill script) is probably the best way to make sure the tunnel can reconnect even if everything else fails.
I have a server (A) behind a firewall that initiates a reverse tunnel on several ports to a small DigitalOcean VPS (B) so I can connect to A via B's IP address. The tunnel has been working consistently for about 3 months, but has suddenly failed four times in the last 24 hours. The same thing happened a while back on another VPS provider - months of perfect operation, then suddenly multiple rapid failures.
I have a script on machine A that automatically executes the tunnel command (ssh -R *:X:localhost:X address_of_B for each port X), but when it executes, it says Warning: remote port forwarding failed for listen port X.
Going into the sshd log /var/log/secure on the server shows these errors:
bind: Address already in use
error: bind: Address already in use
error: channel_setup_fwd_listener: cannot listen to port: X
Solving this requires rebooting the VPS. Until then, all attempts to reconnect give the "remote port forwarding failed" message and will not work. It's now to the point where the tunnel only lasts about 4 hours before stopping.
Nothing has changed on the VPS, and it is a single-use, single user machine that only serves as the reverse tunnel endpoint. It's running OpenSSH_5.3p1 on CentOS 6.5. It seems that sshd is not closing the ports on its end when the connection is lost. I'm at a loss to explain why, or why it would suddenly happen now after months of nearly perfect operation.
To clarify, I first need to figure out why sshd refuses to listen on the ports after the tunnel fails, which seems to be caused by sshd leaving the ports open and never closing them. That seems to be the main problem. I'm just not sure what would cause it to behave this way after months of behaving as I expect (i.e. closing the ports right away and allowing the script to reconnect).
I agree with MadHatter that it is likely to be port forwardings from defunct ssh connections. Even if your current problem turns out to be something else, you can expect to run into such defunct ssh connections sooner or later.
There are three ways such defunct connections can happen:
- One of the two endpoints got rebooted while the other end of the connection was completely idle.
- One of the two endpoints closed the connection, but at the time the connection was closed there was a temporary outage on the link. The outage lasted for a few minutes after the connection was closed, and thus the other end never learned about the closed connection.
- The connection is still completely functional at both endpoints of the ssh connection, but somebody has put a stateful device somewhere between them which timed out the connection due to idleness. This stateful device would be either a NAT or a firewall; the firewall you already mentioned is a prime suspect.
Figuring out which of the above three is happening is not very important, because there is one method that will address all three: keepalive messages.
You should look into the ClientAliveInterval keyword for sshd_config and the ServerAliveInterval keyword for ssh_config or ~/.ssh/config.
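A minimal sketch of what that might look like (the specific interval and count values here are assumptions, not recommendations). On the server B, in /etc/ssh/sshd_config:

# Probe an idle client every 60 seconds; drop the connection after
# 3 unanswered probes, which frees the forwarded ports.
ClientAliveInterval 60
ClientAliveCountMax 3

And on machine A, in ~/.ssh/config (address_of_B as in the question):

Host address_of_B
    # Probe the server every 60 seconds; give up after 3 missed replies
    # so the ssh process exits and your script can reconnect.
    ServerAliveInterval 60
    ServerAliveCountMax 3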
Running the ssh command in a loop can work fine. It is a good idea to insert a sleep in the loop as well, so that you don't end up flooding the server if the connection fails for some reason.

If the client reconnects before the connection has terminated on the server, you can end up in a situation where the new ssh connection is live but has no port forwardings. To avoid that, you need to use the ExitOnForwardFailure keyword on the client side.
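A sketch of such a loop on machine A, using the X and address_of_B placeholders from the question (the 10-second sleep is an arbitrary choice):

while true
do
    # Without -f, ssh blocks here until the connection dies;
    # ExitOnForwardFailure makes it exit instead of lingering
    # without the forwarding when the remote port is still busy.
    ssh -N -o ExitOnForwardFailure=yes -o ServerAliveInterval=60 -R "*:X:localhost:X" address_of_B
    # Don't flood the server with reconnect attempts.
    sleep 10
done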
For me, when an ssh tunnel disconnects it takes a while for the connection to reset, so the ssh process continues to block, leaving me with no active tunnels, and I don't know why. A workaround is to put ssh into the background with -f and spawn new connections without waiting for the old ones to reset. The -o ExitOnForwardFailure=yes option can be used to limit the number of new processes, and -o ServerAliveInterval=60 improves the reliability of your current connection.
You can repeat the ssh command frequently, say, from cron, or in a loop in your script. For example, in the following we run the ssh command every 3 minutes:
while true
do
    # -f backgrounds ssh once the forwarding is established; ExitOnForwardFailure
    # makes redundant attempts exit instead of piling up without a tunnel.
    ssh -f user@hostname -R port:host:hostport -N -o ExitOnForwardFailure=yes -o ServerAliveInterval=60
    sleep 180
done
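If you prefer cron to a loop, roughly the same command can go in a crontab on the client (a sketch; the 3-minute schedule simply mirrors the sleep above). Because of ExitOnForwardFailure, the extra invocations exit immediately while a healthy tunnel is still holding the remote port:

*/3 * * * * ssh -f user@hostname -R port:host:hostport -N -o ExitOnForwardFailure=yes -o ServerAliveInterval=60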
You can find the process that's binding the port on that server with
sudo netstat -apn|grep -w X
It seems very likely to be the half-defunct sshd, but why make assumptions when you can have data? It's also a good way for a script to find a PID to send signal 9 to before trying to bring the tunnel up again.
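For example, a sketch of such a cleanup script on the server (the netstat field parsing is an assumption to verify against your system's output; X is the forwarded port):

#!/bin/sh
# Find whatever is still listening on port X (expected to be the
# half-defunct sshd), send it signal 9, then let the tunnel retry.
PID=$(sudo netstat -apn | grep -w X | grep LISTEN | awk '{print $NF}' | cut -d/ -f1 | sort -u | head -n 1)
if [ -n "$PID" ]; then
    sudo kill -9 "$PID"
fi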