Unicorn exit timeout on Heroku after trapping TERM and sending QUIT
I am receiving R12 Exit Timeout errors for a Heroku app running unicorn and sidekiq. These errors occur 1-2 times a day and whenever I deploy. I understand that I need to convert the shutdown signals from Heroku for unicorn to respond correctly, but thought that I had done so in the below unicorn config:
worker_processes 3
timeout 30
preload_app true
before_fork do |server, worker|
Signal.trap 'TERM' do
puts "Unicorn master intercepting TERM and sending myself QUIT instead. My PID is #{Process.pid}"
Process.kill 'QUIT', Process.pid
end
if defined?(ActiveRecord::Base)
ActiveRecord::Base.connection.disconnect!
Rails.logger.info('Disconnected from ActiveRecord')
end
end
after_fork do |server, worker|
Signal.trap 'TERM' do
puts "Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is #{Process.pid}"
end
if defined?(ActiveRecord::Base)
ActiveRecord::Base.establish_connection
Rails.logger.info('Connected to ActiveRecord')
end
Sidekiq.configure_client do |config|
config.redis = { :size => 1 }
end
end
My logs surrounding the error look like this:
Stopping all processes with SIGTERM
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 7
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 11
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 15
Unicorn master intercepting TERM and sending myself QUIT instead. My PID is 2
Started GET "/manage"
reaped #<Process::Status: pid 11 exit 0> worker=1
reaped #<Process::Status: pid 7 exit 0> worker=0
reaped #<Process::Status: pid 15 exit 0> worker=2
master complete
Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Stopping remaining processes with SIGKILL
Process exited with status 137
It appears that all of the child processes were successfully reaped before the timeout. Is it possible master is still alive? Also, should the router still be sending web requests to the dyno during shut down, as shown in the logs?
FWIW, I'm using Heroku's zero downtime deployment plugin (https://devcenter.heroku.com/articles/labs-preboot/).
I think your custom signal handling is what's causing the timeouts here.
EDIT: I'm getting downvoted for disagreeing with Heroku's documentation and I'd like to address this.
Configuring your Unicorn application to catch and swallow the TERM signal is the most likely cause of your application hanging and not shutting down correctly.
Heroku seems to argue that catching and transforming a TERM signal into a QUIT signal is the right behavior to turn a hard shutdown into a graceful shutdown.
However, doing this seems to introduce the risk of no shutdown at all in some cases - the root of this bug. Users experiencing hanging dynos running Unicorn should consider the evidence and make their own decision based on first principles, not just documentation.