What do I need to consider when setting TCP idle timeouts?
A long-idle connection could mean that the connection is actually broken (the application on either side has crashed, a network cable was unplugged, etc.) while the resources are still allocated, meaning that:
- Performance would be slightly impacted.
- Your application could have a limit of X simultaneous connections, so you could be denying access to new clients even though, in reality, none are connected.
- You may not be able to reconnect a client if you were using fixed ports for both source and destination (a little uncommon, but possible).
- You may reach connection/routing limits, preventing new connections to any other port, causing unexpected behaviour, or even crashing the server itself.
- Many applications will not stop until all connections are closed properly, so shutting down or restarting a service would take longer.
- You won't be able to distinguish a broken connection from an active one without inspecting TCP traffic for a while or relying on application logs.
- Most client applications do not know how to react to broken connections: some will wait for an internal timeout, but others will wait forever, potentially losing data if the client needs a restart (see the sketch after this list).
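For illustration, here is a minimal client-side sketch in Python that sets a receive timeout so it never blocks forever on a half-open connection; the host, port, and timeout values are placeholders, not recommendations:

```python
import socket

# Hypothetical endpoint, just for illustration.
HOST, PORT = "example.com", 5000

sock = socket.create_connection((HOST, PORT), timeout=10)  # connect timeout
sock.settimeout(120)  # give up on recv/send after 2 minutes of silence

try:
    data = sock.recv(4096)
    if not data:
        print("peer closed the connection cleanly")
except socket.timeout:
    # No data for 2 minutes: treat the connection as broken and reconnect,
    # instead of blocking forever on a half-open connection.
    sock.close()
except ConnectionResetError:
    # The other side (or a middlebox in between) sent a RST.
    sock.close()
```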
That last issue can also occur if you set a lower TCP idle timeout than needed, since some systems will simply drop the connection from their TCP tables while others will send a RST packet to the other peer.
Use idle timeouts according to the kind of traffic you manage (for example, Apache servers have a default timeout of 5 minutes, so no connection should be idle for more than 5 minutes [and a few seconds]), but never set a TCP idle timeout lower than (or exactly equal to) your application's timeout. Implement keepalives on long-lived connections at least every few minutes to ensure the connection is alive (TCP keepalives defined on socket creation have a default timeout of two hours, which I consider way too high). User-interactive software (like SSH sessions, remote desktop, FTP) can be idle for a few minutes while the user reads, so I wouldn't go for less than 15 minutes.
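If you do want to tighten the kernel's TCP keepalive instead of (or in addition to) an application-level ping, a sketch like the following shows the relevant socket options. This assumes Linux and Python's `socket` module (option names and availability differ on other platforms), and the endpoint and timing values are only examples:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-specific options (names differ on other platforms): start probing
# after 120 s of idle time, probe every 30 s, and declare the peer dead
# after 4 failed probes, instead of waiting the default ~2 hours.
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 120)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 4)

sock.connect(("example.com", 5000))  # placeholder endpoint
```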
Note: I would not recommend any TCP idle timeout below a few minutes, except on highly intensive connections that won't be idle for more than a few seconds. If possible, set different idle timeouts depending on your traffic (e.g. 6 minutes for web servers, 15 for SSH sessions, etc.).
If higher timeouts are requested (someone wants an "eternal" TCP connection), try to use keepalives at the application layer instead.
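A minimal sketch of such an application-layer keepalive, assuming Python and a protocol that tolerates a no-op PING message (the message format, endpoint, and interval are made up for illustration):

```python
import socket
import threading

HEARTBEAT_INTERVAL = 60  # seconds; keep it well below the TCP idle timeout

def heartbeat(sock: socket.socket, stop: threading.Event) -> None:
    # b"PING\n" is a made-up message; use whatever no-op your protocol allows.
    while not stop.wait(HEARTBEAT_INTERVAL):
        try:
            sock.sendall(b"PING\n")
        except OSError:
            break  # connection is gone; let the main code reconnect

sock = socket.create_connection(("example.com", 5000))  # placeholder endpoint
stop = threading.Event()
threading.Thread(target=heartbeat, args=(sock, stop), daemon=True).start()
# ... normal application traffic here; call stop.set() and sock.close() when done.
```

The periodic traffic keeps intermediate devices (firewalls, NAT, load balancers) from considering the connection idle, while still letting you detect a dead peer quickly when the send fails.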