kill -9 a postgres process

A postgres SELECT query ran out of control on our DB server and started eating up tons of memory and swap until the server ran out of memory. I found the particular process via ps aux | grep postgres and ran kill -9 pid. This killed the process and the memory freed up as expected. The rest of the system and postgres queries appeared to be unaffected. This server is running postgres 9.1.3 on SLES 9 SP4.

However, one of our developers chewed me out for killing a postgres process with kill -9, saying that it will take down the entire postgres service. In reality, it did not. I've done this before a handful of times and have not seen any negative side effects.

With that said, and after further reading, it looks like kill pid without the flags is the preferred way to kill a runaway postgres process, but per other users in the postgres community, it also sounds like postgres has "gotten better" over the years such that kill -9 on an individual query process/thread is no longer a death sentence.

Can someone enlighten me on the proper way to kill a runaway postgres process as well as the how disastrous (or benign) using kill -9 is with Postgres these days? Thanks for the insight.

Solution 1:

voretaq7's answer covers the key points, including the correct way to terminate backends but I'd like to add a little more explanation.

kill -9 (ie SIGKILL) should never, ever, ever be your first-choice default. It should be your last resort when the process doesn't respond to its normal shutdown requests and a SIGTERM (kill -15) has had no effect. That's true of Pg and pretty much everything else.

kill -9 gives the killed process no chance to do any cleanup at all.

When it comes to PostgreSQL, Pg sees a backed that's terminated by kill -9 as a backed crash. It knows the backend might have corrupted shared memory - because you could've interrupted it half way through writing a page into shm or modifying one, for example - so it terminates and restarts all the other backends when it notices that a backend has suddenly vanished and exited with a non-zero error code.

You'll see this reported in the logs.

If it appears to do no harm, that because Pg is restarting everything after the crash and your application is recovering from the lost connections cleanly. That doesn't make it a good idea. If nothing else backend crashes are less well tested than the normal-functioning parts of Pg and are much more complicated/varied, so the chances of a bug lurking in backend crash handling and recovery are higher.

BTW, if you kill -9 the postmaster then remove postmaster.pid and start it again without making sure every postgres backend is gone, very bad things can happen. This could easily happen if you accidentally killed the postmaster instead of a backend, saw the database had gone down, tried to restart it, removed the "stale" .pid file when the restart failed, and tried to restart it again. That's one of the reasons you should avoid waving kill -9 around Pg, and shouldn't delete postmaster.pid.

A demonstration:

To see exactly what happens when you kill -9 a backend, try these simple steps. Open two terminals, open psql in each, and in each run SELECT pg_backend_pid();. In another terminal kill -9 one of the PIDs. Now run SELECT pg_backend_pid(); in both psql sessions again. Notice how they both lost their connections?

Session 1, which we killed:

$ psql regress
psql (9.1.4)
Type "help" for help.

regress=# select pg_backend_pid();
 pg_backend_pid 
----------------
           6357
(1 row)

[kill -9 of session one happens at this point]

regress=# select pg_backend_pid();
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
regress=# select pg_backend_pid();
 pg_backend_pid 
----------------
           6463
(1 row)

Session 2, which was collateral damage:

$ psql regress
psql (9.1.4)
Type "help" for help.

regress=# select pg_backend_pid();
 pg_backend_pid 
----------------
           6283
(1 row)

[kill -9 of session one happens at this point]

regress=# select pg_backend_pid();
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
regress=# select pg_backend_pid();
 pg_backend_pid 
----------------
           6464
(1 row)

See how both sessions were broken? That's why you don't kill -9 a backend.

Solution 2:

I found the particular process via ps aux | grep postgres and ran kill -9 pid.
NO! BAD! STEP AWAY FROM THE BACKEND!

Seriously -- Don't kill Postgres backends like that -- TERRIBLE things can happen (even with all the stability enhancements that have been made since the 7.x days) which can trash your whole DB, and your developer is quite right to chew you out for doing this.

There is, in fact, a blessed and approved way of doing this from within Postgres -- It's even in the Postgres manual though that SO post does a better job of explaining it...

SELECT pg_cancel_backend(pid)
Sends a cancel (SIGINT) signal to the specified backend, which cancels the currently running query.

select pg_terminate_backend(pid)
Sends a terminate (SIGTERM) signal to the specified backend, which cancels the query and aborts the backend (dropping its connection).

Backend IDs can be obtained from the pg_stat_activity table (or ps)

kill -9 a postgres process

Solution 1:

Solution 2:

Related

Recent Posts