Postgres crash loop caused by a "tuple concurrently updated" error
Solution 1:
To fix the issue:
- Find the name of the Postgres pod that is in the crash loop.
- Start an `oc debug` session with the pod.
- Scale the associated Postgres deployment to zero pods.
- From the command line of the debug session:
  - Run `run-postgresql`. This is the `CMD` for the Docker image. As part of the start-up process, the script creates a number of files that won't otherwise exist in the pod, namely `/var/lib/pgsql/openshift-custom-postgresql.conf` and `/var/lib/pgsql/passwd`; without them you cannot run any of the `pg_ctl` commands. When you run the command, you should see the same error output listed above.
  - Run `pg_ctl stop -D /var/lib/pgsql/data/userdata` to cleanly shut down Postgres. You should see `waiting for server to shut down.... done` followed by `server stopped`.
  - Run `pg_ctl start -D /var/lib/pgsql/data/userdata` to start Postgres. You should see the following output, and it should wait there indefinitely with no errors: `server starting`, then `sh-4.2$ LOG: redirecting log output to logging collector process` and `HINT: Future log output will appear in directory "pg_log".` Press Enter a couple of times to get back to the command prompt.
  - Run `pg_ctl stop -D /var/lib/pgsql/data/userdata` and wait for Postgres to stop. This ensures a clean shutdown. You should again see `waiting for server to shut down.... done` followed by `server stopped`.
- Exit the debug session.
- Scale the deployment to 1 pod. Postgres should start normally now.
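The procedure above can be summarised as a small shell sketch. It is only an illustration, not a tested recovery tool: the pod name `postgresql-1-abcde` and the deployment resource `dc/postgresql` are hypothetical placeholders, and the helper names `crashlooping_pods` and `print_recovery_plan` are made up for this example. The first helper filters the table printed by `oc get pods` for pods in `CrashLoopBackOff`; the second only prints the commands in order so they can be reviewed before being typed by hand, because the `oc debug` session is interactive.

```shell
#!/bin/sh
# Sketch only: pod and deployment names used with these helpers are placeholders.

# Print the names of pods whose STATUS column is CrashLoopBackOff.
# Expects the default `oc get pods` table on stdin
# (columns: NAME READY STATUS RESTARTS AGE); the header row is skipped.
crashlooping_pods() {
    awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Print the recovery commands in order without running any of them.
#   $1 = pod name (hypothetical, e.g. postgresql-1-abcde)
#   $2 = deployment resource to scale (assumed, e.g. dc/postgresql)
print_recovery_plan() {
    pod=$1
    deployment=$2
    datadir=/var/lib/pgsql/data/userdata

    printf '%s\n' "oc debug pod/$pod"
    printf '%s\n' "# from a second terminal, while the debug session is open:"
    printf '%s\n' "oc scale $deployment --replicas=0"
    printf '%s\n' "# inside the debug session:"
    printf '%s\n' "run-postgresql"
    printf '%s\n' "pg_ctl stop -D $datadir"
    printf '%s\n' "pg_ctl start -D $datadir"
    printf '%s\n' "pg_ctl stop -D $datadir"
    printf '%s\n' "exit"
    printf '%s\n' "# back outside the debug session:"
    printf '%s\n' "oc scale $deployment --replicas=1"
}

# Example usage (commented out so that sourcing this file has no side effects):
#   oc get pods | crashlooping_pods
#   print_recovery_plan postgresql-1-abcde dc/postgresql
```

Printing the plan instead of executing it is deliberate: each `pg_ctl` step should be checked against the expected output described above before moving on to the next one.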
Solution found after a long fight at: https://pathfinder-faq-ocio-pathfinder-prod.pathfinder.gov.bc.ca/DB/PostgresqlCrashLoopTupleError.html. Credit goes to the author, Wade Barnes.