Upgrade Postgres on a production server without downtime

I have a production server running Postgres 9.4. The database is > 10 GB. Is it possible to upgrade to Postgres 9.5 without downtime and without losing data?

Upgrade tutorials recommend stopping Postgres while executing sudo pg_upgradecluster 9.4 main, but this can take a long time: upgrading a 10 GB cluster can take several hours!

I also tried pg_dump mydb > db.sql. Dropping the database and restoring the dump into PG 9.4 again (psql -d mydb -f db.sql) took about 50 minutes.

But restoring the dump into PG 9.5 finished only after more than 7 hours. Creating the indexes in particular was really slow...

2016-07-18 00:13:55 CEST [60358-5] ERROR:  canceling autovacuum task
2016-07-18 00:13:55 CEST [60358-6] CONTEXT:  automatic analyze of table ...
2016-07-18 00:36:20 CEST [60366-1] ERROR:  canceling autovacuum task
2016-07-18 00:36:20 CEST [60366-2] CONTEXT:  automatic analyze of table ...
2016-07-18 04:21:40 CEST [60361-1] ERROR:  canceling autovacuum task
2016-07-18 04:21:40 CEST [60361-2] CONTEXT:  automatic analyze of table ...
2016-07-18 07:55:19 CEST [61316-1] ERROR:  canceling autovacuum task
2016-07-18 07:55:19 CEST [61316-2] CONTEXT:  automatic analyze of table ...

So neither pg_upgradecluster nor pg_dump is an acceptable solution. Even with PG 9.4 you would have a downtime of at least 50 minutes. Therefore: how can databases be upgraded on production servers or big master-slave clusters without downtime and data loss?


Solution 1:

No downtime at all is not possible without some clustering magic.

Some other possibilities:

  1. use pg_upgrade with the --link option. With this option the original DB files are not copied; instead they are hard-linked into the new directory, which greatly speeds up the process. Be aware that this permanently alters the source DB files (see the first sketch after this list).
  2. use pg_dump and restore into the new database. You can shorten the required time considerably by disabling synchronous writes in the new database (fsync = off in the new PG instance's config file) and re-enabling them after the restore (see the second sketch below).
  3. side-install a new PG instance and let it run on a different port. Then use pg_dump to load the dump, over the network, into the new instance. When done, swap the ports and use the new instance (see the third sketch below).
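
For option 1, a minimal sketch of what the pg_upgrade call could look like, assuming a Debian-style layout and an already created, empty 9.5 cluster; all paths are assumptions, adjust them to your installation:

# both the old 9.4 and the new 9.5 cluster must be stopped first
sudo pg_ctlcluster 9.4 main stop
sudo pg_ctlcluster 9.5 main stop
cd /tmp   # pg_upgrade writes its log files to the current directory
sudo -u postgres /usr/lib/postgresql/9.5/bin/pg_upgrade \
  --old-bindir=/usr/lib/postgresql/9.4/bin \
  --new-bindir=/usr/lib/postgresql/9.5/bin \
  --old-datadir=/var/lib/postgresql/9.4/main \
  --new-datadir=/var/lib/postgresql/9.5/main \
  --old-options=' -c config_file=/etc/postgresql/9.4/main/postgresql.conf' \
  --new-options=' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' \
  --link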
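
For option 2, a sketch of a restore with synchronous writes disabled only for the duration of the load; the config path, database name and the new instance's port are assumptions:

# turn off synchronous writes only while loading, then turn them back on
echo "fsync = off" | sudo tee -a /etc/postgresql/9.5/main/postgresql.conf
sudo pg_ctlcluster 9.5 main restart
createdb -p 5433 mydb                                  # new instance assumed on port 5433
pg_dump -Fc -p 5432 mydb | pg_restore -p 5433 -d mydb
# re-enable fsync and restart once the restore has finished
sudo sed -i 's/^fsync = off/fsync = on/' /etc/postgresql/9.5/main/postgresql.conf
sudo pg_ctlcluster 9.5 main restart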
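
For option 3, a sketch of a side-installed 9.5 cluster on a second port that is filled over the network and then swapped in; ports and the database name are assumptions:

# create a new 9.5 cluster next to the old one, listening on 5433
sudo pg_createcluster 9.5 main --port 5433 --start
sudo -u postgres createdb -p 5433 mydb
# stream the dump from the old instance (5432) straight into the new one (5433)
pg_dump -p 5432 mydb | psql -p 5433 -d mydb
# when done: swap the "port" settings in both postgresql.conf files and restart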

Solution 2:

I think you have already solved this, so this is mainly for others with similar problems.

After several years of working with PostgreSQL, from version 8.4 up to the newest 9.6, my recommendation for these cases is: do not "upgrade" in place. If possible, create a new machine or a new cloud instance with the latest version of the OS you use (very important, it prevents many problems) and the newest PG version, and duplicate the data.

How to duplicate the data depends on your application, your PostgreSQL version and the surrounding environment. A database of ~10 GB is not that big, which works in your favour. I work with DBs > 400 GB, so imagine the multiplicity of problems there...

  • pg_dump in 9.4 already allows you to dump into the directory format with multiple jobs, using multiple CPU cores, which can lower the dump time very significantly - unless you have everything in one big table :-) (see the first sketch after this list)
  • Or for PG 9.4+ you can use the pglogical extension, as mentioned before, which is a really great solution. But be aware: to get pglogical running on the master you must restart Postgres, because the extension has to be added to shared_preload_libraries = 'pglogical' in postgresql.conf - see here: http://postgresql.freeideas.cz/pglogical-postgresql-9-6-small-hints-debian/ So I highly recommend testing it first on another instance with the same version! To switch to the new instance, schedule a short maintenance window for moving clients over to the new database - assuming the connect string is not hardcoded into the application :-) - but in that case you can prepare pgbouncer on the old machine with its connections configured to point at the new machine, stop the old database, move the PG port (presumably 5432) over to pgbouncer, and deal with the connect strings later if possible... (see the second sketch after this list)
  • Or maybe your application does not have that much new data coming in, and you can restore the latest backup and fork inserts/updates in your application to both machines? And switch clients over once you are sure everything is working?
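
A sketch of the parallel directory-format dump and restore from the first point; the job count, paths and the target host are assumptions, tune -j to your CPU cores:

# dump into directory format with 4 parallel jobs
pg_dump -Fd -j 4 -f /backup/mydb.dir mydb
# restore into the new machine, again in parallel
createdb -h newhost mydb
pg_restore -j 4 -h newhost -d mydb /backup/mydb.dir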
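
And a rough sketch of the two steps from the second point: preloading pglogical (the restart you cannot avoid) and, for the hardcoded-connect-string case, letting pgbouncer on the old machine answer on port 5432 and forward to the new one. All paths, hosts and the auth setup are assumptions:

# on the master: preload pglogical, then restart (conf path is an assumption)
echo "shared_preload_libraries = 'pglogical'" | sudo tee -a /etc/postgresql/9.4/main/postgresql.conf
sudo pg_ctlcluster 9.4 main restart

# later: stop the old DB and let pgbouncer take over the old port 5432
sudo pg_ctlcluster 9.4 main stop
sudo tee /etc/pgbouncer/pgbouncer.ini > /dev/null <<'EOF'
[databases]
mydb = host=newhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 5432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
EOF
sudo systemctl restart pgbouncer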

I have seen variants of all these scenarios in real life. So have fun! I'll keep my fingers crossed :-)

Solution 3:

An upgrade (almost) without downtime should be possible with pglogical. It should work at least for PostgreSQL >= 9.4.

It's a relatively new project (2016) which is based on code from Bi-Directional Replication for PostgreSQL. For installation you'll need the 2ndQuadrant repository.

Usage is described in the README; a DB restart is needed (you have to update the replication configuration), but at least it shouldn't cause several hours of downtime.
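
The restart is needed because pglogical has to be preloaded and logical decoding enabled. A minimal sketch of the settings, roughly following the pglogical README; the conf path (which differs per node/version) and the exact numbers are assumptions, size them to your setup:

# on both provider and subscriber, then restart Postgres
sudo tee -a /etc/postgresql/9.4/main/postgresql.conf > /dev/null <<'EOF'
wal_level = 'logical'
max_worker_processes = 10
max_replication_slots = 10
max_wal_senders = 10
shared_preload_libraries = 'pglogical'
EOF
sudo pg_ctlcluster 9.4 main restart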

Unlike repmgr, pglogical is meant for a one-time DB replication, which will take much longer than copying binary WAL files.

First, enable the extension for each DB which needs to be copied (upgraded):

CREATE EXTENSION pglogical;

Currently all commands need to be executed as the superuser (postgres). Start with creating a "master node" (provider):

SELECT pglogical.create_node(
    node_name := 'provider1',
    dsn := 'host=providerhost port=5432 dbname=db'
);

and mark schema(s) for replication:

SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

and add the sequences for replication as well:

SELECT pglogical.replication_set_add_all_sequences('default', ARRAY['public']);

Note that each table requires a primary key, otherwise replication won't start. Unlike with the other backup methods mentioned here, table consistency matters.

Continue with the "standby node" (subscriber):

SELECT pglogical.create_node(
    node_name := 'subscriber1',
    dsn := 'host=thishost port=5432 dbname=db'
);

and finally start replication:

SELECT pglogical.create_subscription(
    subscription_name := 'subscription1',
    provider_dsn := 'host=providerhost port=5432 dbname=db'
);
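
After that the initial copy and ongoing replication run in the background. A hedged way to check that the subscription is actually replicating before switching clients over (database name as in the examples above):

# expect status = 'replicating' once the initial copy has finished
psql -d db -c "SELECT subscription_name, status FROM pglogical.show_subscription_status();"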