How to suspend Nginx requests during backend upgrades
I'd like to have Nginx suspend (hold) HTTP requests for short periods of time while I upgrade or restart the backend service, run a database migration, or perform some other administrative task, without causing errors for end users.
Basically I'd like to do the following sequence of operations:
- tell Nginx to stop forwarding requests to my backend, instead holding them in its asynchronous queues;
- wait for and be notified when all pending requests to the backend are completed;
- upgrade, restart, or otherwise operate on the idle backend service;
- quickly verify that the backend service is operational, using its private address;
- open Nginx's flood gates and let all the pending requests through.
Ideally, this would let me perform administrative tasks that need exclusive access to the entire backend server, such as restarts, upgrades, or migrations, without causing end users anything more serious than a delay, hopefully of less than a minute.
I have found this solution, but it doesn't address point 2. Moreover, it requires the Lua interpreter compiled into Nginx, with whatever memory leaks and security issues that might imply.
Is there any configuration trick or Nginx module specifically targeting this issue? Can it be done with stock Nginx, maybe by testing for the existence of a control file?
How are other admins addressing this issue at large?
(I know that the all-purpose and somewhat sprawling uWSGI application server has this feature, among hundreds of others, but I'd rather avoid introducing yet another element between Nginx and my backend.)
While searching for a solution to this problem, I came across a small GitHub repository called insomnia that illustrates a simple and elegant approach using Lua. Assuming you've got the Lua module installed, you'll first want to enable it at the top of your nginx.conf:
load_module /usr/lib/nginx/modules/ndk_http_module.so;
load_module /usr/lib/nginx/modules/ngx_http_lua_module.so;
Then, in your http block, set up a shared Lua variable to track the suspend/resume state:
http {
    lua_shared_dict state 12k;

    init_by_lua_block {
        ngx.shared.state:set("suspend", false)
    }

    # rest of your http block
}
Next, in your server block, set up a secret location for telling the server to suspend/resume:
location = /suspend/MySuperSecretMagicString {
    if ($request_method = PUT) {
        content_by_lua_block {
            ngx.req.read_body()
            local content = ngx.req.get_body_data()
            if content == "go2sleep" then
                ngx.shared.state:set("suspend", true)
            else
                ngx.shared.state:set("suspend", false)
            end
        }
    }
}
And in your main location block, add one more bit of Lua:
location / {
    access_by_lua_block {
        while ngx.shared.state:get("suspend") == true do
            ngx.sleep(0.2)
        end
    }
    proxy_pass http://my-backend;
}
Now you can fire off a request to enter suspend mode:
curl -X PUT -d go2sleep http://localhost/suspend/MySuperSecretMagicString
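While traffic is held, you can quickly check that the backend is healthy over its private address before resuming; the 10.0.0.5:8080 address and /health path below are just placeholders for whatever your backend actually exposes:
curl -f http://10.0.0.5:8080/health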
And to release all of the buffered requests to the backend, just replace go2sleep with anything else:
curl -X PUT -d UnleashTheHounds http://localhost/suspend/MySuperSecretMagicString
Note that each suspended request occupies a connection while it waits (ngx.sleep is non-blocking, so it doesn't tie up a worker process), so you'll need enough worker_connections to handle the anticipated backlog. See the insomnia repository for further commentary and some additional functionality. A similar but more elaborate approach can be found in Basecamp's intermission repository.
Again, I did not come up with this technique. All credit rightfully belongs to the GitHub user "solso".
I guess you are handling a maintenance scenario and want minimal downtime.
I would recommend using a backup server to hold the request for the required number of seconds (20, or even 100) and redirect to the original URI once the application has restarted.
You can follow the Nginx forum thread below, where a solution is shared:
http://forum.nginx.org/read.php?2,177,177#msg-177
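To illustrate the idea, here is a minimal sketch; the port numbers and the 20-second retry interval are assumptions, not values from the thread. A backup server answers only while the primary is down and tells the client to retry the original URI after a delay:
upstream my-backend {
    server 127.0.0.1:8080;          # primary application server
    server 127.0.0.1:8081 backup;   # only used while the primary is down
}

server {
    listen 127.0.0.1:8081;
    location / {
        # Ask the client to come back to the same URI in 20 seconds.
        add_header Refresh "20; url=$request_uri" always;
        return 503 "Maintenance in progress, please retry shortly.";
    }
}
A browser-driven retry is not as seamless as truly holding the request open, but it avoids showing users a hard error page.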
You could configure Nginx to exec a script that massages the HTTP call and pushes it into a queue process such as beanstalkd, using error handling on 502 Bad Gateway and/or 503 Service Unavailable (the errors you get when your backend service is not available).
Then, after the backend upgrade, pop the requests off of beanstalkd and replay them against your backend service.
Additionally, if your backend service ever goes down unintentionally, this may double as an HA solution so you don't lose API calls: set up a Jenkins job or cron to automatically check and process any beanstalkd queue.
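A minimal sketch of the Nginx side of this, assuming a hypothetical local enqueue service on port 9999 that writes the request into beanstalkd and answers 202 Accepted (the replay consumer would be a separate script, not shown here):
location / {
    proxy_pass http://my-backend;
    # Also intercept 502/503 responses the backend itself returns.
    proxy_intercept_errors on;
    error_page 502 503 = @queue;
}

location @queue {
    # Hypothetical local service that stores the request (method and
    # body are preserved on redirect to a named location) in beanstalkd
    # for later replay against the backend.
    proxy_pass http://127.0.0.1:9999;
}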
I have never done it, but if I ever tried, I think I would use a firewall. I'd probably need to script the solution, which would go like this (a rough sketch follows the list):
- Tell the firewall to allow existing connections, but to filter new connections on port 80 (or 443).
- Wait for pending requests to the backend to complete (even if nginx connections to clients are still open).
- Upgrade, restart, whatever.
- Tell the firewall to allow connections again.
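As a rough sketch of those steps, assuming iptables with conntrack, a backend listening locally on port 8080, and a hypothetical my-backend systemd service:
# 1. Drop NEW connections on port 80; established ones keep flowing.
#    (DROP rather than REJECT, so clients' TCP stacks silently retry the SYN.)
iptables -I INPUT -p tcp --dport 80 -m conntrack --ctstate NEW -j DROP

# 2. Wait until there are no more in-flight connections to the backend.
while [ "$(ss -Htn state established '( sport = :8080 )' | wc -l)" -gt 0 ]; do
    sleep 1
done

# 3. Upgrade/restart the backend.
systemctl restart my-backend

# 4. Allow new connections again.
iptables -D INPUT -p tcp --dport 80 -m conntrack --ctstate NEW -j DROP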
If steps 2 and 3 don't take too long, clients will retry and eventually manage to connect in step 4. If they do take too long, clients will time out, but that is not a problem, since the users' patience will have timed out earlier, no?
The solution has some catches. A client could manage to get the main page, and then it might not be able to proceed to get the static files (whereas with what you have in mind, it could). This will not be a problem, however, if you serve your static files from another machine or CDN.
In addition, I believe that someone would normally worry about what you are worrying only after they would already have set up some high availability solution, e.g. two servers plus an IP address that can be moved from one server to the other. When you move the IP address from one server to the other, the users that have a connection open are disconnected. I think that this is acceptable, because these users get something like a blank page on their browser, wonder what the heck happened, click reload, this time it works, and they don't bother any more; they don't even remember the incident a few minutes later. Using the firewall trick to avoid this disconnection would create more problems than it solves, because step 2 would need to be modified to wait for nginx to finish serving requests to clients, and this might take too long if a user has a slow connection. In any case, I think that high availability has so many issues that this would get such a low priority that it would never be done (unless you are Google or something).