What are the ramifications of setting tcp_tw_recycle/reuse to 1?
I set both tcp_tw_recycle/reuse to 1 in my configuration file.
What are the ramifications of doing this?
If a tcp socket is re-used, does that pose a security risk? i.e. 2 different connections both potentially being able to send data in?
Is it suitable for short-lived connections with little chance of reconnection?
Solution 1:
By default, when both tcp_tw_reuse and tcp_tw_recycle are disabled, the kernel will make sure that sockets in TIME_WAIT state remain in that state long enough to be sure that packets belonging to future connections will not be mistaken for late packets of the old connection.
When you enable tcp_tw_reuse, sockets in TIME_WAIT state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If you enable tcp_timestamps (which is what makes PAWS, Protection Against Wrapped Sequence Numbers, possible), it will make sure that those collisions cannot happen. However, you need TCP timestamps to be enabled on both ends (at least, that's my understanding). See the definition of tcp_twsk_unique for the gory details.
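To make the reuse rule more concrete, here is a toy Python model of the kind of check described above. It is only a sketch under assumptions: the names TimeWaitSocket, may_reuse and min_gap are hypothetical, and the real logic in tcp_twsk_unique has more conditions and differs between kernel versions. The idea shown is that reuse is only considered when the old connection used timestamps, and only once enough time has passed for the new connection's timestamps to be strictly larger.

```python
from dataclasses import dataclass


@dataclass
class TimeWaitSocket:
    # Time (in seconds) at which the last timestamp from the peer was
    # recorded; 0 means the old connection did not use TCP timestamps.
    ts_recent_stamp: int


def may_reuse(tw: TimeWaitSocket, now: int, tw_reuse_enabled: bool,
              min_gap: int = 1) -> bool:
    """Toy decision: may this TIME_WAIT socket be reused for a new connection?

    min_gap is an assumed safety margin in seconds, not a kernel tunable.
    """
    if tw.ts_recent_stamp == 0:
        # Without timestamps there is nothing to tell old and new packets apart.
        return False
    if not tw_reuse_enabled:
        return False
    # Require that the clock has moved on since the old connection last spoke,
    # so timestamps of the new connection are guaranteed to be newer.
    return now - tw.ts_recent_stamp > min_gap
```

In this model a socket whose peer never sent timestamps is never reused, which matches the remark above that timestamps are needed on both ends.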
When you enable tcp_tw_recycle, the kernel becomes much more aggressive, and will make assumptions about the timestamps used by remote hosts. It will track the last timestamp used by each remote host that has a connection in TIME_WAIT state, and allow a socket to be re-used if the timestamp has correctly increased. However, if the timestamp used by the host changes (i.e. warps back in time), the SYN packet will be silently dropped, and the connection will not be established (you will see an error similar to "connect timeout"). If you want to dive into kernel code, the definition of tcp_timewait_state_process might be a good starting point.
Now, timestamps should never go back in time, unless:
- the host is rebooted (but then, by the time it comes back up, the TIME_WAIT sockets will probably have expired, so it will be a non-issue);
- the IP address is quickly reused by something else (TIME_WAIT connections will stay a bit, but other connections will probably be struck by TCP RST and that will free up some space);
- network address translation (or a smarty-pants firewall) is involved in the middle of the connection.
In the latter case, you can have multiple hosts behind the same IP address, and therefore different sequences of timestamps (or said timestamps are randomized at each connection by the firewall). In that case, some hosts will be randomly unable to connect, because they are mapped to a port for which the TIME_WAIT bucket of the server has a newer timestamp. That's why the docs tell you that "NAT devices or load balancers may start drop frames because of the setting".
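The NAT failure mode can be illustrated with a small simulation. This is a toy model, not kernel code: RecycleTracker, accept_syn and simulate_nat are hypothetical names, the timestamp values are made up, and the time window during which the kernel remembers a host is left out. It only assumes what is described above, namely that the server remembers the last timestamp per remote IP address and drops a SYN whose timestamp is older.

```python
class RecycleTracker:
    """Toy model: remember the last TCP timestamp seen per remote IP address."""

    def __init__(self):
        self.last_ts = {}

    def accept_syn(self, src_ip: str, ts_val: int) -> bool:
        last = self.last_ts.get(src_ip)
        if last is not None and ts_val < last:
            # Timestamp went backwards: treated as a stale packet and dropped.
            return False
        self.last_ts[src_ip] = ts_val
        return True


def simulate_nat():
    """Two hosts with unrelated timestamp clocks share one public address."""
    tracker = RecycleTracker()
    nat_ip = "203.0.113.10"  # documentation address standing in for the NAT
    return [
        tracker.accept_syn(nat_ip, 500_000),  # host A, long uptime
        tracker.accept_syn(nat_ip, 1_200),    # host B, recently booted
        tracker.accept_syn(nat_ip, 500_100),  # host A again
    ]
```

Calling simulate_nat() returns [True, False, True]: host B is refused only because host A, which shares its public address, happened to connect first with a larger timestamp.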
Some people recommend leaving tcp_tw_recycle alone, but enabling tcp_tw_reuse and lowering tcp_timewait_len. I concur :-)
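Before changing anything, it can help to check what the machine is currently using. The following is a minimal sketch, assuming a Linux host where these settings are exposed as files under /proc/sys/net/ipv4; read_tw_settings is a hypothetical helper name. A setting that cannot be read is reported as None, which is what you would expect for tcp_tw_recycle on newer kernels (it was removed in Linux 4.12).

```python
from pathlib import Path


def read_tw_settings(base="/proc/sys/net/ipv4"):
    """Return the current values of the TIME_WAIT related sysctls.

    A value of None means the file is missing or unreadable on this system.
    """
    names = ("tcp_tw_reuse", "tcp_tw_recycle", "tcp_timestamps")
    settings = {}
    for name in names:
        path = Path(base) / name
        try:
            settings[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings
```

For example, print(read_tw_settings()) on the server shows at a glance whether tcp_tw_recycle is still switched on.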
Solution 2:
I just had this bite me, so perhaps someone might benefit from my pain and suffering. First, an involved link with lots of info: http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
In particular:
The mere result of this lack of documentation is that we find numerous tuning guides advising to set both these settings to 1 to reduce the number of entries in the TIME-WAIT state. However, as stated by tcp(7) manual page, the net.ipv4.tcp_tw_recycle option is quite problematic for public-facing servers as it won’t handle connections from two different computers behind the same NAT device, which is a problem hard to detect and waiting to bite you:
I have used both settings quite successfully to get the lowest possible latency for haproxy connectivity from clients to a MySQL NDB cluster. This was in a private cloud, and no connection in either direction had any sort of NAT in the mix. The use case made sense: lower the latency for RADIUS clients hitting NDB via haproxy as much as humanly possible. It did so.
I did it again on a public facing haproxy system, load balancing web traffic, without really studying the impact (dumb, right?!) and discovered after much troubleshooting and chasing ghosts that:
- It will create mayhem for clients connecting through a NAT.
- It is nearly impossible to identify because it's completely random, intermittent, and the symptoms will hit customer A, at completely different (or not) times than customer B, etc.
On the customer side, they will see periods of time where they no longer get responses to the SYN packets, sometimes here and there, and sometimes for long periods. Again, random.
The short story here, in my recent, painful experience, is: leave these alone/disabled on public-facing servers, regardless of role!
Solution 3:
From 'man 7 tcp' you will see this:
tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4)
Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended since this causes problems when working with NAT (Network Address Translation).
tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6)
Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint. It should not be changed without advice/request of technical experts.
Not much help there. This question also has some good insight:
https://stackoverflow.com/questions/6426253/tcp-tw-reuse-vs-tcp-tw-recycle-which-to-use-or-both
But there is no specific info on why reuse is safer than recycle. The basic answer is that tcp_tw_reuse will allow one to make use of the same socket if there is already one in TIME_WAIT with the same TCP parameters and that is in a state where no further traffic is expected (I believe it's when a FIN has been sent). tcp_tw_recycle, on the other hand, will just reuse the sockets that are in TIME_WAIT with the same parameters regardless of the state, which can confuse stateful firewalls that might be expecting different packets.
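Something similar can be requested selectively in code by setting the SO_REUSEADDR socket option. Note that this is not the same mechanism: SO_REUSEADDR relaxes the checks made by bind(2), whereas tcp_tw_reuse applies to new outgoing connections. It is documented in man 7 socket as such: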
SO_REUSEADDR
Indicates that the rules used in validating addresses supplied in a bind(2) call should allow reuse of local addresses. For AF_INET
sockets this means that a socket may bind, except when there is an active listening socket bound to the address. When the listening
socket is bound to INADDR_ANY with a specific port then it is not possible to bind to this port for any local address. Argument is
an integer boolean flag.
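As an illustration of setting the option per socket, here is a minimal sketch in Python, whose standard socket module exposes setsockopt and SO_REUSEADDR directly. The helper name make_listener and the loopback address are assumptions for the example. The option has to be set before bind() for it to have any effect on that call.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket with SO_REUSEADDR enabled.

    Port 0 lets the operating system pick a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(): allows binding even if old connections
    # on this address and port are still sitting in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

This is the usual way to let a restarted server bind its port immediately instead of failing with "Address already in use" while old connections drain.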