Why is our firewall (Ubuntu 8.04) rejecting the final packet (FIN, ACK, PSH) with a RST?
A similar situation is described at http://www.spinics.net/lists/netfilter/msg51408.html: some packets that should have been processed by NAT were somehow marked as INVALID instead of ESTABLISHED and went to the INPUT chain. To check for this, add some rules that match with -m state --state INVALID.
The answer at http://www.spinics.net/lists/netfilter/msg51409.html suggests that such INVALID packets should always be DROPped, because NAT is not performed on them properly, so the addresses in them may be wrong.
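As a rough sketch of such a check (the log prefix is an arbitrary label of my own choosing, not something taken from the linked thread), a LOG rule at the top of INPUT records matching packets in the kernel log, and the per-rule counters show whether anything is hitting it:

```shell
# Log every packet that conntrack classifies as INVALID on its way into INPUT.
# "INVALID-IN: " is only an illustrative prefix; use anything easy to grep for.
iptables -I INPUT -m state --state INVALID -j LOG --log-prefix "INVALID-IN: "

# List the INPUT chain with packet/byte counters to see whether the rule matches.
iptables -L INPUT -v -n
```

Both commands need root. LOG is a non-terminating target, so the packet continues down the chain after it has been logged.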
If your problematic packets really are marked as INVALID, adding iptables -I INPUT -m state --state INVALID -j DROP
will probably work around the problem (the broken packet will not reach the local process and so will not trigger the RST response; TCP will then recover from the lost packet after a timeout). You can then try to debug the problem further, as described in http://www.spinics.net/lists/netfilter/msg51411.html:
echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid
(In that particular case the problem was caused by some broken networking hardware along the path, probably combined with some TCP checksum offload brokenness.)
I have seen this behavior on other firewall types, and it looked so similar that I figured I'd throw it out there.
The issue I had was that the firewall was NAT'ing into the same port range as the ephemeral ports on the box. If the two collided, this would cause exactly this behavior, because the kernel would assume the connection was meant for the local machine. To this end, there are a couple of things you can check. First, are you specifying the outbound port range in iptables (using --to-ports)? Or have you modified the ephemeral port range on the machine:
$ cat /proc/sys/net/ipv4/ip_local_port_range
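To compare a port seen in your capture against that range, something like the following could be used. It is a minimal sketch: the function names port_in_range and check_nat_port are made up for illustration, and the port you pass in is whatever external port shows up in your own capture.

```shell
#!/bin/sh
# port_in_range PORT LOW HIGH
# Succeeds (exit status 0) when LOW <= PORT <= HIGH.
port_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# check_nat_port PORT
# Compare PORT with this machine's ephemeral port range.
# The proc file holds two whitespace-separated numbers: low and high.
check_nat_port() {
    read -r low high < /proc/sys/net/ipv4/ip_local_port_range || return 2
    if port_in_range "$1" "$low" "$high"; then
        echo "port $1 is inside the local ephemeral range $low-$high"
    else
        echo "port $1 is outside the local ephemeral range $low-$high"
    fi
}
```

For example, after sourcing the file, run check_nat_port with the external port from the failing connection. A port inside the range does not prove a collision, but it tells you the collision is possible.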
To diagnose, you can set up your capture and check whether any other requests use the same external firewall IP and port combination within 3*MSL before the RST (~180s, I think).
While I'm not confident that is the answer yet, if I were in this situation I would rule that out first and then look at a couple of other things.
Is this easy to reproduce? Is it possible to capture more diagnostics from the firewall box and see the problem occur? I would try to capture:
$ netstat -anp
$ cat /proc/net/ip_conntrack
every second or so while trying to reproduce, to see whether something is binding locally to the port and what the masquerade table looks like during the problem.
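The once-a-second capture could be scripted roughly like this. It is only a sketch: the function names snapshot and capture_loop, the file naming scheme, and the output directory are all my own choices. Run it as root so that netstat -p can show process names and the conntrack table is readable.

```shell
#!/bin/sh
# snapshot DIR
# Write one timestamped file containing the socket table and the conntrack
# table, then print the name of the file that was written.
snapshot() {
    out="$1/snap-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== netstat -anp =="
        netstat -anp 2>&1 || echo "(netstat unavailable)"
        echo "== /proc/net/ip_conntrack =="
        cat /proc/net/ip_conntrack 2>&1 || echo "(conntrack table unavailable)"
    } > "$out"
    echo "$out"
}

# capture_loop DIR COUNT
# Take COUNT snapshots into DIR, roughly one per second.
capture_loop() {
    dir=$1
    count=$2
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        snapshot "$dir" > /dev/null
        i=$((i + 1))
        sleep 1
    done
}
```

Calling capture_loop /tmp/fw-capture 300 would collect about five minutes of snapshots, which you can then grep for the port involved in the failing connection.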
If you firewall the outbound RST, does the eventual ACK from the internal client cause the connection to succeed?
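For that experiment, a temporary rule along these lines should keep locally generated RSTs from leaving the box. Treat it as a sketch: as written it drops every outbound TCP segment with the RST flag set, so you may want to narrow it to the affected client and remove it as soon as the test is done.

```shell
# Temporarily drop locally generated TCP segments that have the RST flag set.
iptables -I OUTPUT -p tcp --tcp-flags RST RST -j DROP

# ... reproduce the problem and watch the capture ...

# Remove the rule again once the test is finished.
iptables -D OUTPUT -p tcp --tcp-flags RST RST -j DROP
```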
Last thing: are you seeing all the logs? Have you already checked dmesg? Have you set up *.* in the syslog configuration on the firewall box to go to a file, to make sure?
Let me know what you find! I really appreciate the amount of information you provided in the question, thanks.