Possible reason for NGINX 499 error codes
Solution 1:
HTTP 499 in Nginx means that the client closed the connection before the server answered the request. In my experience is usually caused by client side timeout. As I know it's an Nginx specific error code.
Solution 2:
In my case, I was impatient and ended up misinterpreting the log.
In fact, the real problem was the communication between nginx and uwsgi, and not between the browser and nginx. If I had loaded the site in my browser and had waited long enough I would have gotten a "504 - Bad Gateway". But it took so long, that I kept trying stuff, and then refresh in the browser. So I never waited long enough to see the 504 error. When refreshing in the browser, that is when the previous request is closed, and Nginx writes that in the log as 499.
Elaboration
Here I will assume that the reader knows as little as I did when I started playing around.
My setup was a reverse proxy, the nginx server, and an application server, the uWSGI server behind it. All requests from the client would go to the nginx server, then forwarded to the uWSGI server, and then response was sent the same way back. I think this is how everyone uses nginx/uwsgi and are supposed to use it.
My nginx worked as it should, but something was wrong with the uwsgi server. There are two ways (maybe more) in which the uwsgi server can fail to respond to the nginx server.
1) uWSGI says, "I'm processing, just wait and you will soon get a response". nginx has a certain period of time, that it is willing to wait, fx 20 seconds. After that, it will respond to the client, with a 504 error.
2) uWSGI is dead, or uWSGi dies while nginx is waiting for it. nginx sees that right away and in that case, it returns a 499 error.
I was testing my setup by making requests in the client (browser). In the browser nothing happened, it just kept hanging. After maybe 10 seconds (less than the timeout) I concluded that something was not right (which was true), and closed the uWSGI server from the command line. Then I would go to the uWSGI settings, try something new, and then restart the uWSGI server. The moment I closed the uWSGI server, the nginx server would return a 499 error.
So I kept debugging with the 499 erroe, which means googling for the 499 error. But if I had waited long enough, I would have gotten the 504 error. If I had gotten the 504 error, I would have been able to understand the problem better, and then be able to debug.
So the conclusion is, that the problem was with uWGSI, which kept hanging ("Wait a little longer, just a little longer, then I will have an answer for you...").
How I fixed that problem, I don't remember. I guess it could be caused by a lot of things.
Solution 3:
The "client" in "client closed the connection" isn't necessarily the Web browser!
You may find 499 errors in an Nginx log file if you have a load balancing service between your users and your Nginx -- using AWS or haproxy. In this configuration the load balancer service will act as a client to the Nginx server and as a server to the Web browser, proxying data back and forth.
For haproxy the default values for certain applicable timeouts are some 60 seconds for connecting to upstream and for reading from upstream (Nginx) or downstream (Web browser).
Meaning that if after some 60 seconds the proxy hasn't connected to the upstream for writing, or if it hasn't received any data from the downstream (Web browser) or upstream (Nginx) as part of a HTTP request or response, respectively, it will close the corresponding connection, which will be treated as an error by the Nginx, at least, if the latter has been processing the request at the time (taking too long).
Timeouts might happen for busy websites or scripts that need more time for execution. You may need to find a timeout value that will work for you. For example extending it to a larger number, like 180 seconds. That may fix it for you.
Depending on your setup you might see a 504 Gateway Timeout
HTTP error in your browser which may indicate that something is wrong with php-fpm. That won't be the case, however, with 499 errors in your log files.
Solution 4:
As you point 499
a connection abortion logged by the nginx. But usually this is produced when your backend server is being too slow, and another proxy timeouts first or the user software aborts the connection. So check if uWSGI is answering fast or not of if there is any load on uWSGI / Database server.
In many cases there are some other proxies between the user and nginx. Some can be in your infrastructure like maybe a CDN, Load Balacer, a Varnish cache etc. Others can be in user side like a caching proxy etc.
If there are proxies on your side like a LoadBalancer / CDN ... you should set the timeouts to timeout first your backend and progressively the other proxies to the user.
If you have:
user >>> CDN >>> Load Balancer >>> Nginx >>> uWSGI
I'll recommend you to set:
-
n
seconds to uWSGI timeout -
n+1
seconds to nginx timeout -
n+2
senconds to timeout to Load Balancer -
n+3
seconds of timeout to the CDN.
If you can't set some of the timeouts (like CDN) find whats is its timeout and adjust the others according to it (n
, n-1
...).
This provides a correct chain of timeouts. and you'll find really whose giving the timeout and return the right response code to the user.
Solution 5:
Turns out 499's really does mean "client interrupted connection."
I had a client "read timeout" setting of 60s (and nginx also has a default proxy_read_timeout of 60s). So what was happening in my case is that nginx would error.log an upstream timed out (110: Connection timed out) while reading upstream
and then nginx retries "the next proxy server in the backend server group you configured." That's if you have more than one.
Then it tries the next and next till (by default) it has exhausted all of them. As each one times out, it removes them from the list of "live" backend servers, as well. After all are exhausted, it returns a 504 gateway timeout.
So in my case nginx marked the server as "unavailable", re-tried it on the next server, then my client's 60s
timeout (immediately) occurred, so I'd see a upstream timed out (110: Connection timed out) while reading upstream
log, immediately followed by a 499 log. But it was just timing coincidence.
Related:
If all servers in the group are marked as currently unavailable, then it returns a 502 Bad Gateway.
for 10s as well. See here max_fails
and fail_timeout. Inn the logs it will say no live upstreams while connecting to upstream.
If you only have one proxy backend in your server group, it just try's the one server, and returns a 504 Gateway Time-out
and doesn't remove the single server from the list of "live" servers, if proxy_read_timeout
is surpassed. See here "If there is only a single server in a group, max_fails, fail_timeout and slow_start parameters are ignored, and such a server will never be considered unavailable."
The really tricky part is that if you specify proxy_pass to "localhost" and your box happens to also have ipv6 and ipv4 "versions of location" on it at the same time (most boxes do by default), it will count as if you had a "list" of multiple servers in your server group, which means you can get into the situation above of having it return "502 for 10s" even though you list only one server. See here "If a domain name resolves to several addresses, all of them will be used in a round-robin fashion."
One workaround is to declare it as proxy_pass http://127.0.0.1:5001;
(its ipv4 address) to avoid it being both ipv6 and ipv4. Then it counts as "only a single server" behavior.
There's a few different settings you can tweak to make this "less" of a problem. Like increasing timeouts or making it so it doesn't mark servers as "disabled" when they timeout...or fixing the list so it's only size 1, see above :)
See also: https://serverfault.com/a/783624/27813