How to debug a PostgreSQL segmentation fault?
Jump to a frame that contains a queryDesc variable, e.g. frame 12:
(gdb) frame 12
#12 0x00007f0ec05ae09d in pgss_ExecutorRun (queryDesc=0x557cc302b7d0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:1045
1045 in ./build/../contrib/pg_stat_statements/pg_stat_statements.c
Print that variable:
(gdb) p queryDesc
$1 = (QueryDesc *) 0x557cc302b7d0
Now copy the part of the line above that follows the equals sign and dereference it using *:
(gdb) p *(QueryDesc *) 0x557cc302b7d0
$6 = {operation = CMD_SELECT, plannedstmt = 0x557cc300e218,
sourceText = 0x557cc302b370 "\n", ' ' <repeats 12 times>, "DECLARE \"categoryPagePhotoUrl_image_urls\" CURSOR WITH HOLD FOR\n", ' ' <repeats 12 times>, "SELECT di.itemId, image_number, filename FROM (SELECT *\n", ' ' <repeats 12 times>, "FROM downl"..., snapshot = 0x557cc2e9b188, crosscheck_snapshot = 0x0, dest = 0x557cc302b860, params = 0x0, queryEnv = 0x0, instrument_options = 0, tupDesc = 0x557cc2f7bff8,
estate = 0x557cc2cf8d08, planstate = 0x557cc2cf8f68, already_executed = true, totaltime = 0x0}
It doesn't give you the whole query, but it at least gives you an idea of which table the query is executed on.
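The trailing "... after FROM downl is gdb's own output limit (by default gdb stops printing a string after 200 elements), not missing data. The same inspection can be run non-interactively with that limit lifted via set print elements 0, so the full sourceText is printed. A minimal sketch, assuming a Debian/Ubuntu binary path and a core file on disk; the function name dump_query_desc is hypothetical, and frame 12 is simply the frame from the session above.

```shell
# Hypothetical helper: print the QueryDesc of a crashed backend from a core dump
# without an interactive gdb session.
# Assumptions: the binary path follows the Debian/Ubuntu layout, and the frame
# number (default 12) is the one found earlier with `bt`; adjust both as needed.
dump_query_desc() {
    binary=$1          # e.g. /usr/lib/postgresql/13/bin/postgres (assumed path)
    core=$2            # path to the core file
    frame=${3:-12}     # frame whose scope contains queryDesc
    # "set print elements 0" removes gdb's default 200-element limit, so the
    # whole sourceText string is printed instead of ending in "...".
    gdb -batch \
        -ex 'set print elements 0' \
        -ex "frame $frame" \
        -ex 'print *queryDesc' \
        -ex 'print queryDesc->sourceText' \
        "$binary" "$core"
}
```

For example: dump_query_desc /usr/lib/postgresql/13/bin/postgres /path/to/core 12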
Based on the gdb output, I managed to isolate the clients that were executing such a query.
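One generic way to map a query fragment (such as the cursor name in sourceText) back to clients is to search pg_stat_activity; this is not necessarily how it was done here. A sketch, assuming connection settings come from the usual PG* environment variables; the function name find_clients is hypothetical. Keep in mind that after a backend segfaults the postmaster terminates all other sessions, so this only helps while the suspect clients are connected.

```shell
# Hypothetical helper: list sessions whose current (or, for idle sessions, most
# recent) statement contains a given text fragment, e.g. a cursor name.
# The fragment is passed as a psql variable and expanded with :'fragment',
# which psql turns into a properly quoted SQL literal. psql does not
# interpolate variables in -c strings, so the query is fed on stdin.
find_clients() {
    fragment=$1
    psql -X -v ON_ERROR_STOP=1 -v fragment="$fragment" <<'SQL'
SELECT pid, usename, application_name, client_addr, state, query_start
FROM pg_stat_activity
WHERE strpos(query, :'fragment') > 0
  AND pid <> pg_backend_pid();
SQL
}
```

The pid <> pg_backend_pid() filter excludes the monitoring session itself, whose own query text contains the fragment after interpolation.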
I tried running VACUUM FULL on the affected table, rebuilding the table and its indexes, switching to a replica, and copying the whole database using pg_dump. Nonetheless, the issue persisted even on the database copies.
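The in-place rebuild and the database copy can be scripted roughly as follows. A minimal sketch, assuming the database, table, and copy names are supplied by you (the function name rebuild_and_clone is hypothetical); the table name is interpolated unquoted, so it must be a trusted, optionally schema-qualified name.

```shell
# Hypothetical helper: rewrite the suspect table in place, then clone the whole
# database with pg_dump so the crash can be retried on the copy.
# Assumes $table is a trusted name: it is interpolated into SQL unquoted.
rebuild_and_clone() {
    src_db=$1
    table=$2
    copy_db=$3
    # VACUUM FULL rewrites the table and rebuilds its indexes under an ACCESS
    # EXCLUSIVE lock; it cannot run inside a transaction block, so it is sent
    # as its own -c command.
    psql -X -v ON_ERROR_STOP=1 -d "$src_db" -c "VACUUM FULL $table" || return 1
    createdb "$copy_db" || return 1
    # Plain-format dump piped straight into the new database.
    pg_dump "$src_db" | psql -X -q -v ON_ERROR_STOP=1 -d "$copy_db"
}
```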
Finally, I managed to isolate a minimal SQL script that reproduces the issue:
$ pg_createcluster 13 main
$ createdb testdb
$ psql -d testdb -f postgresql-segfault.sql
CREATE SCHEMA
CREATE TABLE
COPY 1
ALTER TABLE
BEGIN
CREATE TABLE
DECLARE CURSOR
itemid
---------
1190300
(1 row)
psql:postgresql-segfault:34: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:postgresql-segfault:34: fatal: connection to server was lost
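When rerunning the script against different clusters or minor versions, the crash can be detected from psql's exit status instead of by reading the output: psql documents 0 for a normal finish, 1 for its own fatal error, 2 when the connection to the server went bad in a non-interactive session, and 3 for a script error with ON_ERROR_STOP set. A sketch; check_repro is a hypothetical name, and status 2 only says the connection was lost, so the server log is still what confirms a segfault.

```shell
# Hypothetical helper: rerun the reproduction script and classify the outcome
# from psql's exit status (2 = connection to the server lost).
check_repro() {
    db=$1
    script=$2
    status=0
    psql -X -q -d "$db" -f "$script" >/dev/null 2>&1 || status=$?
    case $status in
        0) echo "no crash" ;;
        2) echo "crash: connection to server lost" ;;
        *) echo "psql exited with status $status" ;;
    esac
}
```

For example: check_repro testdb postgresql-segfault.sql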
Having code that reproduces the crash was enough to report a bug to the pgsql-bugs mailing list (there's also a web form). It turned out to be a bug with re-executing a plan that had already reached completion on a non-stable cursor; the fix was included in PostgreSQL 13.4 and 12.8 (and possibly other versions).
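Since the post names 13.4 and 12.8 as the relevant releases, a quick check of whether a 12.x or 13.x server is at least that minor version can be based on server_version_num, which for version 10 and later encodes major * 10000 + minor (13.4 is 130004). A sketch; fix_status is a hypothetical name, and other major versions are deliberately reported as unknown rather than guessed.

```shell
# Hypothetical helper: given a server_version_num, report whether a 12.x/13.x
# server is at or above the minor release named in the post (13.4 / 12.8).
fix_status() {
    num=$1
    major=$((num / 10000))
    case $major in
        13) fixed=130004 ;;
        12) fixed=120008 ;;
        *) echo "unknown: check the release notes for major version $major"; return 0 ;;
    esac
    if [ "$num" -ge "$fixed" ]; then
        echo "fixed"
    else
        echo "affected"
    fi
}
```

For example: fix_status "$(psql -X -At -c 'SHOW server_version_num')"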