Linux: tracking the source of netstat -s "failed connection attempts"

I have several servers where the "failed connection attempts" metric reported by netstat -s (read from /proc/net/snmp) grows by roughly one per second, and I'd like to diagnose the source of these failures.

Using this iptables rule (on a different server):

-A OUTPUT -p tcp --dport 23 -j REJECT

I am blocking outgoing telnet, so I can run this loop:

while true ; do
    telnet www.google.co.uk
    netstat -s | grep "failed connection"
done

Trying 209.85.203.94...
telnet: Unable to connect to remote host: Connection refused
52 failed connection attempts
Trying 209.85.203.94... telnet: Unable to connect to remote host: Connection refused
53 failed connection attempts
Trying 209.85.203.94... telnet: Unable to connect to remote host: Connection refused
54 failed connection attempts

This proves that the counter is incremented by failed attempts to connect to remote sockets (although it doesn't prove that this is the only cause of increments, of course).
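For reference, the number that netstat -s prints as "failed connection attempts" is the AttemptFails field of the Tcp: lines in /proc/net/snmp. A minimal sketch for reading it directly, without grepping netstat's formatted text; the function name attempt_fails and its optional file argument are hypothetical, introduced only for illustration:

```shell
# Print the TCP AttemptFails counter from an snmp-style file.
# Assumption: the file has the /proc/net/snmp layout, i.e. a "Tcp:" header
# line naming the fields, followed by a "Tcp:" line holding the values.
attempt_fails() {
    awk '
        $1 == "Tcp:" && !col {
            # Header line: find the column named AttemptFails.
            for (i = 2; i <= NF; i++) if ($i == "AttemptFails") col = i
            next
        }
        # Value line: print the field under that column and stop.
        $1 == "Tcp:" && col { print $col; exit }
    ' "${1:-/proc/net/snmp}"
}
```

Called with no argument it reads /proc/net/snmp, so something like `while sleep 1; do attempt_fails; done` should show the counter once per second.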

The question is: how can I find the specific remote address and port combination (or combinations) that is failing, so that I can move on to the next step of investigating routing or firewall issues?

As an aside, if I run this:

watch -n1 'ss | grep "\<23\>"'

I was hoping to see sockets in the state SYN-SENT, but don't. Is this because I used REJECT, rather than DROP? Thanks


Solution 1:

Let's try to answer the question another way (the hard way). Reading the kernel source shows that there is only one place where this metric is incremented: the tcp_done function. As the code shows, the increment happens only for connections in the SYN_SENT or SYN_RECV state. Next, check where tcp_done can be called from. There are several places:

  1. tcp_reset - called when the connection is aborted (a reply packet with the RST flag was received). Yes, this can happen in the SYN_SENT and SYN_RECV states (and, theoretically, in other states).
  2. tcp_rcv_state_process - calls it in the TCP_FIN_WAIT1 and TCP_LAST_ACK states, so the metric isn't incremented; not our case.
  3. tcp_v4_err - calls it in the SYN_SENT or SYN_RECV state. tcp_v4_err is invoked by the ICMP handler.
  4. tcp_time_wait - called when the socket moves into the time-wait or fin-wait-2 state; not our case either.
  5. tcp_write_err - called from several places on timeouts and when the retransmit count is exceeded. It can be a suspect too.

Now open any TCP FSM diagram to check in which cases a connection can be in the SYN_SENT or SYN_RECV state.

On the client side it can only be the SYN_SENT state: the SYN packet has been transmitted, and the connection is aborted either because a rejection was received (a TCP RST or an ICMP error) or because no reply was received at all.

On the server side it can only be the SYN_RECV state (the SYN has already been received and the SYN+ACK has already been sent): the connection is aborted either because a rejection was received (the SYN+ACK was rejected somewhere) or because the wait for the reply timed out (no ACK was received).
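The rejection case leaves a trace on the wire, so one way to find the peers involved is to capture RST segments and count them per sender. A rough sketch, assuming tcpdump is installed and prints IPv4 lines in its usual form (... IP src.port > dst.port: Flags [R.], ...); the helper name summarize_rst is hypothetical:

```shell
# Count RST segments per sending address.port, busiest sender first.
# Reads tcpdump text output on stdin. Assumption: each line contains the
# token "IP" followed by the source as addr.port (tcpdump -nn, IPv4).
summarize_rst() {
    awk '
        /Flags \[R/ {
            for (i = 1; i < NF; i++)
                if ($i == "IP") { count[$(i + 1)]++; break }
        }
        END { for (src in count) print count[src], src }
    ' | sort -rn
}

# Example capture (needs root, so it is left commented out here).
# -c 200 stops after 200 matching packets so that the summary is printed.
# tcpdump -nn -l -c 200 'tcp[tcpflags] & tcp-rst != 0' | summarize_rst
```

On a client, the senders listed should be the remote address and port pairs that refused the connection. Attempts that fail through a timeout or an ICMP error will not appear in this summary and would need a separate capture.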

Now you know why this metric is updated and can check for the possible sources in your system. Modern kernels provide powerful tools for troubleshooting at the kernel level. Start with this brief tutorial from Brendan Gregg.