104: Connection reset by peer while reading response header from upstream (Nginx)

I have a server which was working OK until 3rd Oct 2013 at 10:50 am, when it began to intermittently return "502 Bad Gateway" errors to the client.

Approximately 4 out of 5 browser requests succeed but about 1 in 5 fail with a 502.

The nginx error log contains many hundreds of these errors:

2013/10/05 06:28:17 [error] 3111#0: *54528 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 66.249.66.75, server: www.bec-components.co.uk, request: "GET /?_n=Fridgefreezer/Hotpoint/8591P;_i=x8078 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.bec-components.co.uk"

However the PHP error log does not contain any matching errors.

Is there a way to get PHP to give me more info about why it is resetting the connection?

This is nginx.conf:

user              www-data;
worker_processes  4;
error_log         /var/log/nginx/error.log;
pid               /var/run/nginx.pid;

events {
   worker_connections  1024;
}

http {
  include          /etc/nginx/mime.types;
  access_log       /var/log/nginx/access.log;

  sendfile               on;
  keepalive_timeout      30;
  tcp_nodelay            on;
  client_max_body_size   100m;

  gzip         on;
  gzip_types   text/plain application/xml text/javascript application/x-javascript text/css;
  gzip_disable "MSIE [1-6]\.(?!.*SV1)";

  include /gvol/sites/*/nginx.conf;

}

And this is the .conf for this site:

server {

  server_name   www.bec-components.co.uk bec3.uk.to bec4.uk.to bec.home;
  root          /gvol/sites/bec/www/;
  index         index.php index.html;

  location ~ \.(js|css|png|jpg|jpeg|gif|ico)$ {
    expires        2592000;   # 30 days
    log_not_found  off;
  }

  ## Trigger client to download instead of display '.xml' files.
  location ~ \.xml$ {
    add_header Content-disposition "attachment; filename=$1";
  }

  location ~ \.php$ {
    fastcgi_read_timeout  3600;
    include               /etc/nginx/fastcgi_params;
    keepalive_timeout     0;
    fastcgi_param         SCRIPT_FILENAME  $document_root$fastcgi_script_name;
    fastcgi_pass          127.0.0.1:9000;
    fastcgi_index         index.php;
  }
}

## bec-components.co.uk ##
server {
   server_name   bec-components.co.uk;
   rewrite       ^/(.*) http://www.bec-components.co.uk$1 permanent;
}

Solution 1:

I'd always trust my web server when it tells me: 502 Bad Gateway.

  • What is the uptime of your FastCGI / nginx process?
  • Do you monitor network connections?
  • Can you confirm or rule out a change in visitor numbers around that day?

What does it mean?

  • Your FastCGI process is not reachable by nginx; it is either too slow or not responding at all. "Bad gateway" means nginx cannot fastcgi_pass to the defined resource 127.0.0.1:9000 at that very specific moment.

  • Your initial error log tells it all:

recv() failed
    -> the recv() call in nginx failed

(104: Connection reset by peer) while reading response header from upstream,
    -> the upstream sent an incomplete answer, or no answer at all

upstream: "fastcgi://127.0.0.1:9000",
    -> and this is the peer that failed

From my limited point of view, I'd suggest:

  • Restart your FastCGI process / server
  • Check your access log
  • Enable the debug log (a minimal sketch follows below)
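
A minimal sketch of that restart-and-debug workflow, assuming a Debian-style setup with php5-fpm and an nginx binary built with debug support (the service names and log paths are assumptions, so adjust them to your system):

# restart the FastCGI backend (php5-fpm here; use your actual service name)
sudo service php5-fpm restart

# enable verbose nginx logging: in nginx.conf set
#   error_log  /var/log/nginx/error.log  debug;
# (the "debug" level needs nginx compiled with --with-debug)
sudo nginx -t && sudo service nginx reload

# then watch both logs while reproducing the 502s
sudo tail -f /var/log/nginx/error.log /var/log/nginx/access.log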

Solution 2:

I know this topic is old, but it still pops up occasionally, so, after looking for answers on the web, I came up with the following three possibilities:

  1. A programming error is sometimes segfaulting php-fpm, which in turn means that the connection with nginx will be severed. This will usually leave at least some log entries and/or core dumps around, which can be analysed further.
  2. For some reason, PHP is not able to write a session file (usually: session.save_path = "/var/lib/php/sessions"). This can be bad permissions, bad ownership, a bad user/group, or more esoteric/obscure issues like running out of inodes on that directory (or even a full disk!). This will usually not leave many core dumps around and possibly not even anything in the PHP error logs.
  3. Even more tricky to debug: an extension is misbehaving (occasionally hitting some kind of inner limit, or a bug which is not triggered all the time), segfaulting, and bringing the php-fpm process down with it, thus closing the connection with nginx. The usual culprits are APC, memcache/d, etc. (in my case it was the New Relic extension), so the idea here is to turn each extension off until the error disappears. (A few quick checks for all three cases are sketched below.)
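
A few quick diagnostics covering the three cases above; the session path, log locations and commands assume a typical Debian/Ubuntu setup, so adapt them to yours:

# 1. look for segfaulting PHP workers in the kernel log / syslog
dmesg | grep -i segfault
sudo grep -i 'php.*segfault' /var/log/syslog

# 2. check the session directory: ownership, permissions, free space and inodes
ls -ld /var/lib/php/sessions
df -h /var/lib/php/sessions
df -i /var/lib/php/sessions

# 3. list the loaded extensions, then disable them one at a time and retest
php -m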

Solution 3:

I kept getting this as well. I solved it by increasing the OPcache memory limit, if you use it (the replacement for APC). It seems PHP-FPM dropped connections whenever the cache got too full. This is also the reason why shgnInc's answer fixes it only for a short time.

So find the file /etc/php5/fpm/php.ini (or the equivalent in your distribution) and increase opcache.memory_consumption to whatever level your site needs. Disabling OPcache may also work.

[opcache]
opcache.memory_consumption = 196 
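
One way to apply that change, assuming a Debian/Ubuntu layout where php5-fpm reads /etc/php5/fpm/php.ini (the path and service name are assumptions):

# check the current setting, raise it in php.ini, then restart PHP-FPM
# so the new limit actually takes effect
grep -n 'opcache.memory_consumption' /etc/php5/fpm/php.ini
sudo sed -i 's/^;\?opcache\.memory_consumption.*/opcache.memory_consumption = 196/' /etc/php5/fpm/php.ini
sudo service php5-fpm restart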

Solution 4:

In my case of the same problem, I just restarted the php-fpm service and that solved it.

sudo service php5-fpm restart

Or sometimes this problem happens because of a huge number of requests. By default, pm.max_requests in php5-fpm may be 100 or below.

To solve it, increase its value depending on your site's request volume, for example to 500.

And after that you have to restart the service:
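
A short sketch of both steps; the pool file path /etc/php5/fpm/pool.d/www.conf is an assumption about where your pool is defined:

# in the pool configuration (e.g. /etc/php5/fpm/pool.d/www.conf) set:
#   pm.max_requests = 500
# then restart PHP-FPM so the workers pick up the new limit
sudo service php5-fpm restart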

Solution 5:

You may want to take a look at this gist on GitHub: https://gist.github.com/amichaelgrant/90d99d7d5d48bf8fd209

I encountered a similar situation. When I checked the error logs on my upstream servers, they were reporting a ulimit error, so I increased that limit to 1000000 (on both the upstream and the nginx boxes) and everything worked fine.
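
A minimal sketch of checking and raising the open-file limit; the limits.conf entries and the worker_rlimit_nofile directive are assumptions about where the limit is enforced on your system, so adapt them to your distro and init setup:

# check the current open-file limit for your shell and for the nginx master
ulimit -n
grep 'open files' /proc/$(pgrep -o nginx)/limits

# raise it persistently for all users (re-login / restart the services afterwards)
echo '* soft nofile 1000000' | sudo tee -a /etc/security/limits.conf
echo '* hard nofile 1000000' | sudo tee -a /etc/security/limits.conf

# nginx can also raise its own limit directly in nginx.conf:
#   worker_rlimit_nofile 1000000;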