random CONNECTION_RESET on apache2.4 debian 9

I would check the size of the TCP packets going between the server and client. If they are close to 1500 bytes, they may be getting dropped, for one of these reasons:

  1. If the DF (Don't Fragment) bit is set on a packet and the packet would need to be fragmented somewhere along the path, it gets dropped instead.

  2. If the MTU is set to 1500 and the packets pass through tunnels, encryption layers, etc. that add extra headers, the packets can exceed the MTU of a link on the path and get dropped. Try setting the MTU on the interfaces you are using, on both ends, to something lower than 1500, such as 1420 or even lower.
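
To test whether full-size packets survive the path, one option is to ping with the Don't Fragment bit set and a payload that fills the MTU exactly. This is only a sketch: it assumes Linux iputils ping and IPv4 (20-byte IP header plus 8-byte ICMP header), and the function names, example.com and eth0 are placeholders, not something from the question.

```shell
# Payload size that makes an IPv4 ICMP echo exactly fill a given MTU:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three pings with the DF bit set (-M do). If these fail while a
# smaller MTU value works, something on the path cannot carry full-size packets.
probe_mtu() {
    host="$1"
    mtu="${2:-1500}"
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# Usage (placeholder host and interface):
#   probe_mtu example.com 1500
#   probe_mtu example.com 1420
# To lower the MTU on an interface until the next reboot:
#   ip link set dev eth0 mtu 1420
```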


Pretty sure I found the issue :-) as I just had the same thing happen to me.

1. Cause

I think you have TWO or more processes serving port 80 (or 443 if it's about SSL connections). You can check that as follows, here with the command for port 80 and the output from my system that had the problem:

# netstat -tupan | grep ":80.*LISTEN"

Proto Recv-Q Send-Q Local    Foreign  State   PID/Program name
                    Address  Address
tcp6       0      0 :::80    :::*     LISTEN  22718/apache2
tcp6       0      0 :::80    :::*     LISTEN  1794/apache2

Two processes serving the same IP addresses from the same port is indeed possible with the socket options SO_REUSEADDR and SO_REUSEPORT, see here and here (the section about "Linux >= 3.9").

What the kernel does with SO_REUSEPORT is to distribute incoming TCP connections to the processes serving that port, in a non-deterministic manner. One process is your Apache that serves the request properly, and one is "something else" that does not answer anything, ever. In my case, it was another Apache2 process.
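
The check from the "Cause" section can be wrapped in a small filter that lists each distinct process listening on a port. This is a sketch that assumes the column layout of netstat -tupan shown above (local address in column 4, state in column 6, PID/program in column 7); the function name is made up for illustration.

```shell
# Read `netstat -tupan` output on stdin and print every distinct
# PID/program that has a LISTEN socket on the given port.
listeners_on_port() {
    port="$1"
    awk -v port="$port" \
        '$6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }' | sort -u
}

# Usage:
#   netstat -tupan | listeners_on_port 80
# More than one line of output means more than one process is
# accepting connections on that port.
```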

2. Solution

  1. If you have two Apache processes, first find out which of them is the "zombie". For that, stop your regular Apache server (service apache2 stop) and check which one remains (netstat -tupan | grep ":80.*LISTEN"). That's the "zombie". Note its PID.

  2. To find out more about who or what started this "zombie" process:

    • Execute cat /proc/<pid>/loginuid with the PID of that "zombie" process. If it shows 4294967295 it means that the system started it and not a user (reason). Otherwise, it's the UID of a user that you can look up.

    • Execute ps auxf and determine the process uptime of your "zombie" process. If it matches the system uptime, it means that the process was started somehow at boot time.

  3. To (perhaps) find out more about what is happening inside this "zombie" process, you can attach to it with strace. This will create a lot of hard-to-read logs, but since reproducing this "zombie" process might not be easy, it seems good to at least collect some of these logs (esp. of HTTP requests going to that process) before killing it. Execute the following, with the PID of your process instead of $PID:

    strace -o strace.log -f -p $PID
    
  4. To solve the problem for the moment, kill the "zombie" process, supplying its PID for $PID: kill $PID or if needed kill -9 $PID.

  5. Check if that "zombie" process is up and running again after a reboot, and if yes, you'll have to investigate and fix the cause of that.
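
The inspection in step 2 can be collected into one helper. This is a sketch only: the function names are invented here, it assumes /proc/<pid>/loginuid exists on your kernel, and the user name lookup assumes getent is available.

```shell
# Interpret the content of /proc/<pid>/loginuid.
# 4294967295 is (uint32)-1, the "not set" value: the process was not
# started from a login session.
describe_loginuid() {
    if [ "$1" = "4294967295" ]; then
        echo "unset (started by the system, not by a logged-in user)"
    else
        # Resolve the UID to a user name if possible.
        name="$(getent passwd "$1" | cut -d: -f1)"
        echo "UID $1 (${name:-unknown user})"
    fi
}

# Print who started a process and how long it has been running,
# next to the system uptime for comparison.
inspect_pid() {
    pid="$1"
    echo "loginuid: $(describe_loginuid "$(cat "/proc/$pid/loginuid")")"
    echo "elapsed:  $(ps -o etime= -p "$pid")"
    echo "uptime:   $(uptime)"
}

# Usage, with the PID of the "zombie" process:
#   inspect_pid 1794
```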

3. Reproducing the cause

It is possible (but not trivial) to manually create an Apache2 "zombie" process that will run in parallel to the regular Apache server and just "answer nothing". Here are almost-but-not-quite complete instructions:

  1. Create copies of relevant config files:

    cp /etc/apache2/envvars /etc/apache2/envvars-zombie
    cp /etc/apache2/apache2.conf /etc/apache2/apache2-zombie.conf
    
  2. Edit /etc/apache2/envvars-zombie and at the beginning of the script statically set SUFFIX="-zombie", overriding the conditional assignment therein.

  3. Edit /etc/apache2/apache2-zombie.conf and prevent the inclusion of any VirtualHost configuration files. In my case, I'd modify the corresponding line to be:

    # IncludeOptional sites-enabled/
    
  4. Make sure that default listen ports are included in your apache2-zombie.conf file. In my case this already happened via Include ports.conf.

  5. Create the lockfile and log dirs needed for the new instance of Apache2, and make them accessible to the user that your new Apache2 will run as:

    mkdir /var/log/apache2-zombie
    chown www-data /var/log/apache2-zombie/
    
    mkdir /var/lock/apache2-zombie
    chown www-data /var/lock/apache2-zombie/
    
  6. Now you should be able to start your "zombie" Apache process as follows:

    cd /etc/apache2/
    source envvars-zombie
    /usr/sbin/apache2 -f apache2-zombie.conf -k start
    
  7. Confirm that there is now indeed a second process running on the Apache2 standard ports: netstat -tupan | grep ":80.*LISTEN".

  8. That second Apache2 server is not yet a "zombie" because it will still answer "404 Not Found" or (since we did not set up SSL) result in an SSL error when making a request on port 443. But you can already observe the effect that a few requests go to this new server and result in these errors, in a non-deterministic manner. (I got up to this point in practice …)

  9. To create a "proper" zombie Apache, set up a simple script that will accept an HTTP request and then do nothing (sleep()) for several minutes, so that the browser gives up or the TCP connection times out. Install it for the Apache default host. This way, it will be used for all HTTP requests to the port, since we disabled all VirtualHost configs so Apache cannot find a more suitable host for any request and will choose the default one.
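
To observe how requests are split between the two servers (steps 8 and 9), something like the following could be used. It is a sketch: the function names and the URL are placeholders, and it assumes curl is installed. Each curl call opens a new TCP connection, which matters because the kernel distributes connections, not individual requests.

```shell
# Summarise a stream of HTTP status codes (one per line) as "count code".
# curl reports 000 when it got no HTTP response at all, for example on
# a timeout or a connection reset.
tally_status_codes() {
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Request a URL n times and print one status code per line.
probe_url() {
    url="$1"
    n="${2:-50}"
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s -o /dev/null --max-time 10 -w '%{http_code}\n' "$url"
        i=$((i + 1))
    done
}

# Usage (placeholder URL):
#   probe_url http://localhost/ 50 | tally_status_codes
```

With only the regular server running, every request should report the same status code; with the second server in place, a share of the requests would be expected to show 404 (step 8) or 000 (step 9).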