Nginx - What does the nodelay option do when limiting requests?

Solution 1:

The documentation here has an explanation that sounds like what you want to know:

The directive specifies the zone (zone) and the maximum possible bursts of requests (burst). If the rate exceeds the demands outlined in the zone, the request is delayed, so that queries are processed at a given speed

From what I understand, requests that exceed the rate but fit within the burst will be delayed (they take more time and wait until they can be served at the configured rate); with the nodelay option the delay is not applied, and requests that exceed the burst are denied with a 503 error.

This blog post (archive.org) gives a good explanation of how rate limiting works in nginx:

If you’re like me, you’re probably wondering what the heck burst really means. Here is the trick: replace the word ‘burst’ with ‘bucket’, and assume that every user is given a bucket with 5 tokens. Every time that they exceed the rate of 1 request per second, they have to pay a token. Once they’ve spent all of their tokens, they are given an HTTP 503 error message, which has essentially become the standard for ‘back off, man!’.
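The bucket analogy from the quote can be sketched as a toy token bucket in Python. This is a hypothetical illustration of the analogy only, not nginx's actual implementation (names like `TokenBucket` and `allow` are made up for this sketch):

```python
import time

class TokenBucket:
    """Toy model of the analogy: rate tokens/second, a bucket of `burst` tokens.
    Each request that exceeds the rate spends a token; an empty bucket -> 503."""

    def __init__(self, rate=1.0, burst=5):
        self.rate = rate                  # tokens refilled per second
        self.burst = burst                # bucket capacity
        self.tokens = float(burst)        # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill for the elapsed time, capped at the bucket size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True                   # request served
        return False                      # out of tokens -> HTTP 503
```

With a full bucket of 5 tokens, a sudden burst of 7 requests spends the 5 tokens and the remaining 2 requests are rejected until the bucket refills.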

Solution 2:

TL;DR: The nodelay option is useful if you want to impose a rate limit without constraining the allowed spacing between requests.

I had a hard time digesting the other answers, and then I discovered new documentation from Nginx with examples that answers this: https://www.nginx.com/blog/rate-limiting-nginx/

Here's the pertinent part. Given:

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

location /login/ {
  limit_req zone=mylimit burst=20;
  ...
}

The burst parameter defines how many requests a client can make in excess of the rate specified by the zone (with our sample mylimit zone, the rate limit is 10 requests per second, or 1 every 100 milliseconds). A request that arrives sooner than 100 milliseconds after the previous one is put in a queue, and here we are setting the queue size to 20.

That means if 21 requests arrive from a given IP address simultaneously, NGINX forwards the first one to the upstream server group immediately and puts the remaining 20 in the queue. It then forwards a queued request every 100 milliseconds, and returns 503 to the client only if an incoming request makes the number of queued requests go over 20.
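The forwarding schedule described above can be modeled with a short simulation. This is a sketch of the described behavior under simplified assumptions (the function name and return format are made up), not nginx's code:

```python
def forward_times(arrivals, rate=10.0, burst=20):
    """Model of limit_req with burst but without nodelay: each request is
    forwarded at a computed time, or rejected with 503 when the queue is full."""
    interval = 1.0 / rate        # 100 ms between forwarded requests at 10r/s
    next_slot = 0.0              # earliest time the next forward may happen
    results = []
    for t in arrivals:           # arrival times in seconds, ascending
        if t >= next_slot:
            results.append((t, "forwarded"))             # rate not exceeded
            next_slot = t + interval
        else:
            queued = round((next_slot - t) / interval)   # requests waiting ahead
            if queued <= burst:
                results.append((next_slot, "forwarded (delayed)"))
                next_slot += interval
            else:
                results.append((t, "503"))               # queue already full
    return results
```

Feeding it 22 simultaneous arrivals reproduces the example: the first request goes out immediately, the next 20 are spaced 100 ms apart (the last at the 2-second mark), and the 22nd is rejected.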

If you add nodelay:

location /login/ {
  limit_req zone=mylimit burst=20 nodelay;
  ...
}

With the nodelay parameter, NGINX still allocates slots in the queue according to the burst parameter and imposes the configured rate limit, but not by spacing out the forwarding of queued requests. Instead, when a request arrives “too soon”, NGINX forwards it immediately as long as there is a slot available for it in the queue. It marks that slot as “taken” and does not free it for use by another request until the appropriate time has passed (in our example, after 100 milliseconds).
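The slot-marking behavior with nodelay can be sketched the same way. Again a hypothetical model of the described behavior, not nginx's implementation:

```python
def allow_nodelay(arrivals, rate=10.0, burst=20):
    """Model of limit_req with burst and nodelay: requests are forwarded
    immediately while a slot is free; taken slots free up at the configured
    rate (one per 1/rate seconds)."""
    excess = 0.0              # number of slots currently marked "taken"
    last = None
    results = []
    for t in arrivals:        # arrival times in seconds, ascending
        if last is not None:
            # Slots are released as time passes
            excess = max(0.0, excess - (t - last) * rate)
        last = t
        if excess > burst:
            results.append((t, "503"))          # no free slot left
        else:
            results.append((t, "forwarded"))    # forwarded right away
            excess += 1                         # mark one slot as taken
    return results
```

With 22 simultaneous arrivals, 21 are forwarded immediately (one "free" request plus the 20 burst slots) and the 22nd gets a 503; a slot frees up 100 ms later, so a request arriving then is forwarded again.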

Solution 3:

The way I see it is as follows:

  1. Requests will be served as fast as possible until the zone rate is exceeded. The zone rate is enforced "on average", so with rate=1r/s and burst=10 you can have 10 requests served within a 10-second window.

  2. After the zone rate is exceeded:

    a. Without nodelay, further requests up to burst will be delayed.

    b. With nodelay, further requests up to burst will be served as fast as possible.

  3. After the burst is exceeded, the server will return an error response (503 by default) until enough time has passed for the burst allowance to recover. E.g. for rate=1r/s and burst=10, the client may need to wait up to 10 seconds before the next request is accepted.
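The three cases in the list above can be combined into one toy function. This is a hypothetical model for illustration (the function name and return values are made up), not nginx source:

```python
def limit_req(arrivals, rate=1.0, burst=10, nodelay=False):
    """Toy model of the three cases: serve immediately, delay (skipped with
    nodelay), or reject with 503 once the burst is exhausted."""
    excess, last, out = 0.0, None, []
    for t in arrivals:            # arrival times in seconds, ascending
        if last is not None:
            # Allowance recovers at `rate` per second
            excess = max(0.0, excess - (t - last) * rate)
        last = t
        if excess > burst:
            out.append("503")     # case 3: burst exceeded
        elif nodelay or excess < 1:
            out.append("now")     # case 1 / 2b: served immediately
            excess += 1
        else:
            out.append("delayed") # case 2a: queued and spaced out
            excess += 1
    return out
```

For 12 simultaneous requests at rate=1r/s, burst=10: without nodelay, one is served immediately, 10 are delayed, and the 12th is rejected; with nodelay, 11 are served immediately and the 12th is rejected.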