TCP Keepalive and firewall killing idle sessions

At a customer site, the network team added a firewall between the client and the server. This is causing idle connections to be disconnected after about 40 minutes of idle time. The network people say that the firewall doesn't have any idle connection timeout, but the fact is that the idle connections get broken.

To get around this, we first configured the server (a Linux machine) with TCP keepalives turned on, using tcp_keepalive_time=300, tcp_keepalive_intvl=300, and tcp_keepalive_probes=30000. This works, and the connections stay viable for days or more. However, we would also like the server to detect dead clients and kill their connections, so we changed the settings to time=300, intvl=180, probes=10. Our thinking was that if the client was alive, the server would probe every 300s (5 minutes), the client would respond with an ACK, and that traffic would keep the firewall from treating the connection as idle and killing it. If the client was dead, the server would abort the connection after 10 probes. To our surprise, the idle but alive connections get killed after about 40 minutes, as before.

Wireshark running on the client side shows no keepalives at all between the server and client, even when keepalives are enabled on the server.
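One thing worth ruling out when a capture shows no probes at all: on Linux, the tcp_keepalive_* sysctls only apply to sockets on which the application has set SO_KEEPALIVE. The following is a minimal Python sketch of what enabling it on a single socket looks like. The function name enable_keepalive and its defaults are illustrative assumptions, not taken from the Teradata software, and the per-socket TCP_KEEP* options are Linux-specific, so the sketch skips any that the platform does not define.

```python
import socket

def enable_keepalive(sock, idle=300, interval=180, probes=10):
    """Turn on TCP keepalive for one socket and, where supported, set its timers."""
    # Without SO_KEEPALIVE the kernel sends no probes on this socket,
    # whatever the system-wide tcp_keepalive_* values are.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional per-socket overrides of the system-wide defaults.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)  # None if this platform lacks the option
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

If the application never makes the first setsockopt call (or an equivalent), changing the sysctls alone would not be expected to produce any keepalive traffic on its connections.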

What could be happening here?

If the keepalive settings on the server are time=300,intvl=180,probes=10, I would expect that if the client is alive but idle, the server would send keepalive probes every 300 seconds and leave the connection alone, and if the client is dead, it would send one after 300 seconds, then 9 more probes every 180 seconds before killing the connection. Am I right?
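The expected timeline for a dead client can be worked out with a short sketch. This assumes the usual Linux behavior: the first probe goes out after `time` seconds of idleness, further probes follow every `intvl` seconds, and the connection is dropped one interval after the last unanswered probe. The function name keepalive_schedule is made up for illustration.

```python
def keepalive_schedule(idle, intvl, probes):
    """Return (probe send times, drop time) in seconds of idle time,
    assuming the peer never answers any probe."""
    # Probe i (0-based) is sent after the idle period plus i intervals.
    probe_times = [idle + i * intvl for i in range(probes)]
    # The connection is dropped one interval after the last probe.
    drop_time = idle + probes * intvl
    return probe_times, drop_time

probe_times, drop_time = keepalive_schedule(300, 180, 10)
print(probe_times[0], probe_times[-1], drop_time)  # 300 1920 2100
```

With time=300, intvl=180, probes=10, a dead client would be dropped after roughly 2100 seconds (35 minutes) of silence. For a live client, each answered probe resets the idle timer, so the next probe follows about 300 seconds later.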

One possibility is that the firewall is somehow intercepting the keepalive probes from the server and failing to pass them on to the client, and the fact that it got a probe makes it think that the connection is active. Is this common behavior for a firewall? We don't know what kind of firewall is involved.

The server is a Teradata node and the connection is from a Teradata client utility to the database server, port 1025 on the server side, but we have seen the same problem with an SSH connection so we think it affects all TCP connections.


Solution 1:

A stateful firewall inspects packets and also tracks whether each connection is still alive. I believe the firewall's settings should be fine-tuned in the same way as those on the hosts. By default, many firewalls keep idle connections open for only 60 minutes, although this varies by vendor.

Some vendors offer features such as TCP Intercept, TCP State Bypass, and Dead Connection Detection that can handle special situations like yours.

Another option is to configure the firewall itself with the same parameters you have on the servers, so that everything is consistent.

On a Cisco firewall, the following command configures these timeouts:

hostname(config)# timeout feature time

timeout conn hh:mm:ss - The idle time after which a connection closes, between 0:5:0 and 1193:0:0. The default is 1 hour (1:0:0).

Several other timeout parameters are available, depending on your needs.

I would advise speaking with the team that manages the firewall and either adjusting the timeouts to suit your needs or checking which of these features it supports.
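When comparing numbers with the firewall team, a quick sanity check is whether the server's first keepalive probe would go out before the firewall's idle timeout expires. The sketch below converts an hh:mm:ss value, in the format used by the timeout command above, to seconds and compares it with tcp_keepalive_time. The helper names are hypothetical and the values are only examples.

```python
def hms_to_seconds(value):
    """Convert an 'hh:mm:ss' string such as '1:0:0' to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def keepalive_beats_timeout(keepalive_time, firewall_timeout):
    """True if the first probe is sent before the firewall's idle timeout."""
    return keepalive_time < hms_to_seconds(firewall_timeout)

print(hms_to_seconds("1:0:0"))                # 3600
print(keepalive_beats_timeout(300, "1:0:0"))  # True
print(keepalive_beats_timeout(300, "0:5:0"))  # False
```

This only checks the arithmetic. It does not show whether the probes actually reach the firewall or whether the firewall counts them as activity.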