Nginx proxy_read_timeout vs. proxy_connect_timeout

I've started using Nginx as a reverse proxy for a set of servers that provide some sort of service.

The service can be rather slow at times (it's running on Java, and the JVM sometimes gets stuck in a "full garbage collection" that can take several seconds), so I've set proxy_connect_timeout to 2 seconds. That gives Nginx enough time to figure out that the service is stuck in GC and will not respond in time, so it can pass the request to a different server.

I've also set proxy_read_timeout to prevent the reverse proxy from getting stuck if the service itself takes too long to compute the response - again, Nginx should move the request to another server that is free enough to return a timely response.
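For reference, a minimal sketch of that kind of setup (the upstream name, addresses, and the proxy_next_upstream line are my illustration, not the asker's actual config):

```nginx
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_connect_timeout 2s;   # give up quickly on a GC-stuck backend
        proxy_read_timeout    10s;  # abort if the backend responds too slowly
        proxy_next_upstream error timeout;  # retry the next server on errors/timeouts
    }
}
```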

I've run some benchmarks, and I can see clearly that proxy_connect_timeout works properly: some requests return exactly at the connection timeout when the service is stuck and doesn't accept incoming connections (the service uses Jetty as an embedded servlet container). proxy_read_timeout also works, as I can see requests that return after the timeout specified there.

The problem is that I would have expected to see some requests time out after roughly proxy_read_timeout + proxy_connect_timeout: if the service is stuck and won't accept connections when Nginx tries to connect, but gets released just before Nginx gives up, it starts processing - too slowly, so Nginx aborts because of the read timeout. I believe the service hits such cases, but after running several benchmarks totaling several million requests, I failed to see a single request that returned in anything above proxy_read_timeout (the larger of the two timeouts).

I would appreciate any comment on this issue. I think it could be due to a bug in Nginx (I have yet to look at the code, so this is just an assumption): the timeout counter doesn't get reset after the connection succeeds if Nginx hasn't yet read anything from the upstream server.


I was actually unable to reproduce this on:

2011/08/20 20:08:43 [notice] 8925#0: nginx/0.8.53
2011/08/20 20:08:43 [notice] 8925#0: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-48)
2011/08/20 20:08:43 [notice] 8925#0: OS: Linux 2.6.39.1-x86_64-linode19

I set this up in my nginx.conf:

proxy_connect_timeout   10;
proxy_send_timeout      15;
proxy_read_timeout      20;

I then set up two test servers: one that would just time out on the SYN, and one that would accept connections but never respond:

upstream dev_edge {
  server 127.0.0.1:2280 max_fails=0 fail_timeout=0s; # SYN timeout
  server 10.4.1.1:22 max_fails=0 fail_timeout=0s; # accept but never responds
}
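The "accepts but never responds" behavior is easy to simulate. A sketch of such a test server (this is my illustration, not the answerer's actual setup; the SYN-timeout case needs a firewall rule or an unroutable address instead):

```python
import socket
import threading

def silent_server(host="127.0.0.1", port=0):
    """Accept TCP connections but never send a byte, so a client's
    read times out rather than its connect. Returns the bound port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)

    conns = []  # keep accepted sockets open; never reply on them

    def loop():
        while True:
            conn, _ = srv.accept()  # complete the handshake...
            conns.append(conn)      # ...then hold the connection silently

    threading.Thread(target=loop, daemon=True).start()
    return srv.getsockname()[1]
```

Pointing an upstream `server` entry at this port exercises proxy_read_timeout: the connect succeeds immediately, and only the read ever times out.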

Then I sent in one test connection:

[m4@ben conf]$ telnet localhost 2480
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost

HTTP/1.1 504 Gateway Time-out
Server: nginx
Date: Sun, 21 Aug 2011 03:12:03 GMT
Content-Type: text/html
Content-Length: 176
Connection: keep-alive

Then I watched the error_log, which showed this:

2011/08/20 20:11:43 [error] 8927#0: *1 upstream timed out (110: Connection timed out) while connecting to upstream, client: 127.0.0.1, server: ben.dev.b0.lt, request: "GET / HTTP/1.1", upstream: "http://10.4.1.1:22/", host: "localhost"

then:

2011/08/20 20:12:03 [error] 8927#0: *1 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 127.0.0.1, server: ben.dev.b0.lt, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:2280/", host: "localhost"

And then the access.log, which shows the expected 30 s total (10 + 20):

504:32.931:10.003, 20.008:.:176 1 127.0.0.1 localrhost - [20/Aug/2011:20:12:03 -0700] "GET / HTTP/1.1" "-" "-" "-" dev_edge 10.4.1.1:22, 127.0.0.1:2280 -

Here is the log format I'm using, which includes the individual upstream timeouts:

log_format  edge  '$status:$request_time:$upstream_response_time:$pipe:$body_bytes_sent $connection $remote_addr $host $remote_user [$time_local] "$request" "$http_referer" "$http_user_agent" "$http_x_forwarded_for" $edge $upstream_addr $upstream_cache_status';
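When a request is retried across upstreams, nginx joins the per-attempt values of $upstream_response_time with commas (and with colons across internal redirects), which is why the access-log line above shows "10.003, 20.008" - one value per upstream tried. A small parsing sketch (it assumes the field has already been extracted from the log line):

```python
def parse_upstream_times(field):
    """Split an nginx $upstream_response_time field into floats.

    nginx separates multiple upstream attempts with ", " and stages
    of an internal redirect with " : "; "-" markers are skipped.
    """
    times = []
    for part in field.replace(":", ",").split(","):
        part = part.strip()
        if part and part != "-":
            times.append(float(part))
    return times
```

Summing the parsed values for the request above gives 10.003 + 20.008 ≈ 30 s, matching the 32.9 s $request_time minus client-side overhead.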

The problem is that I would have expected to see some requests time out after roughly proxy_read_timeout + proxy_connect_timeout: if the service is stuck and won't accept connections when Nginx tries to connect, but gets released just before Nginx gives up, it starts processing - too slowly, so Nginx aborts because of the read timeout.

A connect timeout means the TCP handshake stalled (e.g., no SYN-ACK ever arrived). TCP would retransmit the SYN, but you've given Nginx only 2 seconds before it moves on to another server, so it simply has no time to retransmit SYNs.

UPD: I couldn't find this in the docs, but tcpdump shows a 3-second delay between the first SYN sent and the second attempt to send a SYN.