Postgres crash loop caused by a "tuple concurrently updated" error

Solution 1:

To fix the issue:

  1. Find the name of the postgres pod that is in the crash loop.
  2. Start an oc debug session with the pod.
  3. Scale the associated Postgres deployment to zero pods.
  4. From the command line of the debug session:

    • Run run-postgresql. This is the CMD for the Docker image. As part of its start-up process the script creates a number of files that would not otherwise exist in the debug pod, namely /var/lib/pgsql/openshift-custom-postgresql.conf and /var/lib/pgsql/passwd. Without these files you cannot run any of the pg_ctl commands. When you run the command you should see the same error output that the crash-looping pod reports.
    • Run pg_ctl stop -D /var/lib/pgsql/data/userdata to cleanly shut down Postgres. You should see:

      waiting for server to shut down.... done
      server stopped

    • Run pg_ctl start -D /var/lib/pgsql/data/userdata to start Postgres. You should see the following output, after which the session should sit there indefinitely with no errors:

      server starting
      sh-4.2$ LOG: redirecting log output to logging collector process
      HINT: Future log output will appear in directory "pg_log".

    • Press Enter a couple of times to get back to the command prompt.

    • Run pg_ctl stop -D /var/lib/pgsql/data/userdata again and wait for Postgres to stop. This ensures a clean shutdown. You should see:

      waiting for server to shut down.... done
      server stopped

    • Exit the debug session.

    • Scale the deployment to 1 pod. Postgres should start normally now.
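
The steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: it assumes the Postgres workload is a DeploymentConfig (hence dc/...), that the oc client is logged in to the right project, and that the scale-down is issued from a second terminal while the debug session stays open. The function names crashloop_pods and recovery_commands, and the placeholders POSTGRES_POD and POSTGRES_DC, are hypothetical. The script prints the commands rather than running them, so each one can be checked before use.

```shell
#!/bin/sh
# Data directory used in the steps above.
PGDATA_DIR=/var/lib/pgsql/data/userdata

# Step 1 helper: print the names of pods whose STATUS column is
# CrashLoopBackOff. Expects `oc get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
crashloop_pods() {
    awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Print the recovery commands for a given pod and deployment.
# $1 = crash-looping pod name, $2 = deployment name (assumed to be a
# DeploymentConfig; use deployment/NAME if yours is a Deployment).
recovery_commands() {
    pod=$1
    dc=$2
    cat <<EOF
# Terminal A: start the debug session (step 2)
oc debug $pod
# Terminal B: scale the deployment to zero pods (step 3)
oc scale dc/$dc --replicas=0
# Terminal A, inside the debug session (step 4)
run-postgresql
pg_ctl stop -D $PGDATA_DIR
pg_ctl start -D $PGDATA_DIR
pg_ctl stop -D $PGDATA_DIR
exit
# Any terminal: scale back up once the debug session has exited
oc scale dc/$dc --replicas=1
EOF
}

# Example: print the plan using placeholder names unless arguments are given.
recovery_commands "${1:-POSTGRES_POD}" "${2:-POSTGRES_DC}"
```

To find the pod name for step 1, the helper can be fed live output, for example: oc get pods --no-headers | crashloop_pods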

Solution found after a long fight at: https://pathfinder-faq-ocio-pathfinder-prod.pathfinder.gov.bc.ca/DB/PostgresqlCrashLoopTupleError.html
Credits go to the author: Wade Barnes