104: Connection reset by peer while reading response header from upstream (Nginx)
I have a server that was working fine until 3 Oct 2013 at 10:50 am, when it began intermittently returning "502 Bad Gateway" errors to clients.
Approximately 4 out of 5 browser requests succeed, but about 1 in 5 fail with a 502.
The nginx error log contains many hundreds of these errors:
2013/10/05 06:28:17 [error] 3111#0: *54528 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 66.249.66.75, server: www.bec-components.co.uk, request: "GET /?_n=Fridgefreezer/Hotpoint/8591P;_i=x8078 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.bec-components.co.uk"
However the PHP error log does not contain any matching errors.
Is there a way to get PHP to give me more info about why it is resetting the connection?
This is nginx.conf:
user www-data;
worker_processes 4;
error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    access_log /var/log/nginx/access.log;
    sendfile on;
    keepalive_timeout 30;
    tcp_nodelay on;
    client_max_body_size 100m;
    gzip on;
    gzip_types text/plain application/xml text/javascript application/x-javascript text/css;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
    include /gvol/sites/*/nginx.conf;
}
And this is the .conf for this site:
server {
    server_name www.bec-components.co.uk bec3.uk.to bec4.uk.to bec.home;
    root /gvol/sites/bec/www/;
    index index.php index.html;

    location ~ \.(js|css|png|jpg|jpeg|gif|ico)$ {
        expires 2592000; # 30 days
        log_not_found off;
    }
    ## Trigger client to download instead of display '.xml' files.
    ## The capture group supplies $1 (the file name) to add_header.
    location ~ ([^/]+\.xml)$ {
        add_header Content-disposition "attachment; filename=$1";
    }
    location ~ \.php$ {
        fastcgi_read_timeout 3600;
        include /etc/nginx/fastcgi_params;
        keepalive_timeout 0;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
    }
}
## bec-components.co.uk ##
server {
    server_name bec-components.co.uk;
    rewrite ^/(.*)$ http://www.bec-components.co.uk/$1 permanent;
}
Solution 1:
I'd always trust my web server when it tells me: 502 Bad Gateway.
- What is the uptime of your FastCGI/nginx process?
- Do you monitor network connections?
- Can you confirm or rule out a change in visitor count around that day?
What it means:
Your FastCGI process is not reachable by nginx: it is either too slow or not responding at all. Bad Gateway means nginx could not get a valid response through fastcgi_pass from the defined resource, 127.0.0.1:9000, at that very specific moment.
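One cheap way to catch "not reachable at that very moment" is to connect to the FastCGI port repeatedly and note which attempts fail. This is a minimal sketch, assuming php-fpm listens on TCP 127.0.0.1:9000 as in the question's fastcgi_pass; probe_fastcgi is a hypothetical helper, and it only checks that a TCP connection is accepted, not that PHP actually answers a FastCGI request.

```python
import socket
import time


def probe_fastcgi(host="127.0.0.1", port=9000, attempts=5, timeout=2.0, pause=0.0):
    """Open and immediately close a TCP connection to the FastCGI port.

    Returns a list of (attempt_number, ok, error_text) tuples. A refused or
    timed-out connection suggests nginx would hit the same failure. This does
    NOT speak the FastCGI protocol; it only tests the TCP listener.
    """
    results = []
    for i in range(1, attempts + 1):
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((i, True, None))
        except OSError as exc:
            results.append((i, False, str(exc)))
        if pause:
            time.sleep(pause)
    return results
```

For example, `probe_fastcgi(attempts=100, pause=0.5)` run during a period of 502s shows whether the listener itself is intermittently unavailable or whether connections are accepted and then reset later (which points back at PHP).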
Your initial error log tells it all:
recv() failed
-> nginx failed to read from the upstream
(104: Connection reset by peer) while reading response header from upstream,
-> it got an incomplete answer, or no answer at all
upstream: "fastcgi://127.0.0.1:9000",
-> who failed: the FastCGI backend on 127.0.0.1:9000
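The same breakdown can be automated to see how often each upstream resets the connection. A minimal sketch, assuming error-log lines shaped like the one in the question (other nginx versions or custom setups may need a different pattern); parse_recv_failure and count_by_upstream are hypothetical helpers.

```python
import re
from collections import Counter

# Pattern derived from the sample "recv() failed" line in the question.
ERROR_RE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] "
    r".*?recv\(\) failed \((?P<errno>\d+): (?P<reason>[^)]*)\) "
    r"while (?P<context>[^,]+), client: (?P<client>[^,]+),"
    r".*?upstream: \"(?P<upstream>[^\"]+)\""
)


def parse_recv_failure(line):
    """Return the fields of a recv() failure line as a dict, or None."""
    m = ERROR_RE.search(line)
    return m.groupdict() if m else None


def count_by_upstream(lines):
    """Count recv() failures per (upstream, errno) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_recv_failure(line)
        if fields:
            counts[(fields["upstream"], fields["errno"])] += 1
    return counts
```

Usage: `with open("/var/log/nginx/error.log") as f: print(count_by_upstream(f).most_common())`.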
From my limited point of view, I'd suggest:
- restart your FastCGI process / server
- check your access log
- enable the debug log
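When checking the access log, it helps to see whether the roughly 1-in-5 failure rate is constant or clustered in time. A minimal sketch, assuming nginx's default "combined" access-log format; bad_gateway_ratio_by_hour is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumes the default "combined" format:
# addr - user [05/Oct/2013:06:28:17 +0100] "GET / HTTP/1.1" 502 172 "ref" "ua"
ACCESS_RE = re.compile(
    r'\[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] "[^"]*" (?P<status>\d{3}) '
)


def bad_gateway_ratio_by_hour(lines):
    """Return {"05/Oct/2013 06": (count_of_502, total_requests), ...}."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        m = ACCESS_RE.search(line)
        if not m:
            continue  # skip lines in an unexpected format
        key = f"{m['day']} {m['hour']}"
        stats[key][1] += 1
        if m["status"] == "502":
            stats[key][0] += 1
    return {k: tuple(v) for k, v in stats.items()}
```

A ratio that jumps at particular hours points towards load or resource exhaustion; a flat ratio fits better with something like a misbehaving extension.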
Solution 2:
I know this topic is old, but it still pops up occasionally, so after looking for answers on the web I came up with the following three possibilities:
- A programming error is sometimes segfaulting php-fpm, which severs the connection with nginx. This usually leaves at least some log entries and/or core dumps behind, which can be analysed further.
- For some reason, PHP is unable to write a session file (usually to session.save_path = "/var/lib/php/sessions"). This can be caused by bad permissions, ownership, or user/group settings, or by more obscure issues such as running out of inodes on the filesystem holding that directory (or even a full disk!). This will usually not leave many core dumps around, and possibly nothing in the PHP error logs either.
- Even more tricky to debug: an extension misbehaves (occasionally hitting some internal limit, or a bug that is not triggered every time), segfaults, and brings the php-fpm process down with it, thus closing the connection with nginx. The usual culprits are APC, memcache/memcached, etc. (in my case it was the New Relic extension), so the idea here is to turn the extensions off one at a time until the error disappears.
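The session-file case can be checked directly. A minimal sketch, assuming the save path above; check_session_dir is a hypothetical helper. It tests permissions as the user running the script, so run it as the php-fpm pool user (for example `sudo -u www-data python3 check.py`, assuming the pool runs as www-data like the question's nginx).

```python
import os
import tempfile


def check_session_dir(path="/var/lib/php/sessions"):
    """Return a list of problems that would stop PHP writing session files."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # Actually creating a file catches permission/ownership problems
    # and a full disk in one go.
    try:
        fd, probe = tempfile.mkstemp(prefix="sess_probe_", dir=path)
        os.close(fd)
        os.remove(probe)
    except OSError as exc:
        problems.append(f"cannot create a file in {path}: {exc}")
    st = os.statvfs(path)
    # Some filesystems report f_files == 0 (no fixed inode table); skip those.
    if st.f_files and st.f_favail == 0:
        problems.append(f"no free inodes on the filesystem holding {path}")
    if st.f_bavail == 0:
        problems.append(f"no free blocks on the filesystem holding {path}")
    return problems
```

An empty list means the directory looked writable at that moment; since the 502s are intermittent, it may be worth running the check on a schedule.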
Solution 3:
I kept getting this as well. I solved it by increasing the opcache memory limit, if you use opcache (the replacement for APC). It seems PHP-FPM dropped connections whenever the cache got too full. This is also why shgnInc's answer fixes it only for a short time.
So find the file /etc/php5/fpm/php.ini (or the equivalent in your distribution) and increase opcache.memory_consumption (in megabytes) to whatever level your site needs. Disabling opcache may also work.
[opcache]
opcache.memory_consumption = 196
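To confirm which value is actually set before and after the change, the ini file can be read programmatically. A rough sketch, assuming the php.ini is close enough to INI syntax for Python's configparser (it is not a full php.ini parser, and it does not see values overridden elsewhere, such as in conf.d files); opcache_memory_mb is a hypothetical helper.

```python
import configparser


def opcache_memory_mb(ini_text):
    """Return opcache.memory_consumption (in MB) from php.ini text, or None."""
    parser = configparser.ConfigParser(
        strict=False,             # php.ini repeats keys such as extension=
        interpolation=None,       # php.ini values may contain '%'
        allow_no_value=True,
        delimiters=("=",),        # values like paths may contain ':'
        comment_prefixes=(";", "#"),
        inline_comment_prefixes=(";",),
    )
    parser.read_string(ini_text)
    for section in parser.sections():
        value = parser.get(section, "opcache.memory_consumption", fallback=None)
        if value is not None:
            return int(value.strip().strip('"'))
    return None
```

Usage: `opcache_memory_mb(open("/etc/php5/fpm/php.ini").read())`.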
Solution 4:
In my case of the same problem, simply restarting the php-fpm service solved it:
sudo service php5-fpm restart
Sometimes this problem also happens because of a large number of requests. pm.max_requests sets how many requests each php-fpm child process serves before it is respawned, and by default in php5-fpm it may be 100 or lower.
To solve it, increase its value according to your site's traffic, for example to 500.
After that, restart the service.
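If you change pm.max_requests on several servers, a small script can make the edit repeatable. A minimal sketch, assuming a single pool per file and a path such as /etc/php5/fpm/pool.d/www.conf (typical on Debian/Ubuntu, but check your distribution); set_pool_option is a hypothetical helper.

```python
import re


def set_pool_option(conf_text, key, value):
    """Return conf_text with `key = value` set in a php-fpm pool config.

    Prefers an existing active line, then a ';'-commented example line,
    and otherwise appends the setting. Assumes one pool per file.
    """
    k = re.escape(key)
    active = re.compile(rf"^[ \t]*{k}[ \t]*=.*$", re.MULTILINE)
    commented = re.compile(rf"^[ \t]*;[ \t]*{k}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key} = {value}"
    for pattern in (active, commented):
        if pattern.search(conf_text):
            # A function replacement avoids backslash escapes in new_line.
            return pattern.sub(lambda _m: new_line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"
```

After writing the result back to the pool file, restart php-fpm as above so the new value takes effect.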
Solution 5:
You may want to consider this gist on GitHub: https://gist.github.com/amichaelgrant/90d99d7d5d48bf8fd209
I encountered a similar situation. When I checked the error logs on my upstream servers, they were reporting a ulimit error, so I raised that limit to 1000000 (on both the upstream and the nginx boxes) and everything worked fine.
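The answer does not say which ulimit was hit; assuming it was the open-files limit (nofile), which is a common one for busy proxies, this sketch shows how a process can inspect it and raise its own soft limit up to the hard limit. raise_nofile_soft_limit is a hypothetical helper, and it only affects the calling process and its children; for nginx and php-fpm themselves the limit is normally set in their own configuration (for example nginx's worker_rlimit_nofile and php-fpm's rlimit_files) or in the service manager.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise this process's soft RLIMIT_NOFILE towards target.

    Never lowers the current soft limit, and clamps to the hard limit,
    because an unprivileged process cannot exceed it.
    Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Usage: `print(raise_nofile_soft_limit(65536))`. Values above a kernel-imposed ceiling may still be rejected with an exception.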