We constantly bat this around the office, and the question continues to come up. How do you deal with PostgreSQL replication? I'm not even necessarily talking about advanced clusters, just keeping it simple with Master-Slave, Master-MultiSlave, and Master-Master. I find that setting it up for MySQL is typically pretty simple. Failover is straightforward if not perfect, especially for how easy it is to configure. We've played with Slony, but it's a bit too hands on (schema changes require intervention, new databases require intervention, etc). PGPool2 was pretty nice, until a node went down and we couldn't find a graceful way (other than bringing everything down and reseeding the fallen node) to get replication back in sync. Basically here's what I'm typically looking for:

  • Easy setup (I'll settle for difficult setup, but easy to expand)
  • Simplistic failover
  • Bringing a fallen node back in just requires time (i.e. like mysql. Server goes down, you bring it up, and wait for replication to catch up)
  • Schema changes don't break replication
  • Adding a new database to the server is seamless (i.e. like mysql, you can replicate a whole DB server, so a new database is created on the master, it automatically propagates to the slave)

MySQL handles most of these fairly well, but I hold a certain fondness for PostgreSQL. Besides, we have some situations where it's our only option, and we'd like to add replication to the mix. What are you using currently, and how do you feel about your solution? This isn't a MySQL versus PostgreSQL post, I promise, because that's not what I'm trying to start. :)


Solution 1:

Short answer - there's no such solution yet for PostgreSQL if you need online readonly slaves.

There're two major development projects currently going on in this area which are included in PostgreSQL 9.0 (Spring/Summer 2010), namely:

  • Synchronous Replication:

http://wiki.postgresql.org/wiki/NTT's_Development_Projects

  • Read only hot standby slaves:

http://wiki.postgresql.org/wiki/Hot_Standby

which in combination aim to achieve the ease of use of MySQL-style replication minus the bugs/issues MySQL has plus the reliability users know from PostgreSQL.

All of this was kicked off by a manifest from the PostgreSQL Core Team in 2008:

http://archives.postgresql.org/pgsql-hackers/2008-05/msg00913.php

The PostgreSQL replication solutions to this day with the largest user base are Slony-I (more expensive for writes, makes schema changes fiddly), WAL shipping/walmgr (Slaves can't be used online) and pgQ/londiste from Skype/Skytools (more tools/building blocks than a finished solution).

I've written a few things on Log Shipping, walmgr and Slony-I, see

http://blogs.amd.co.at/mt/mt-search.cgi?blog_id=1&tag=pgrep&limit=20 for more information.

Solution 2:

And to throw another solution into the ring: rubyrep.

To compare with your requirements:

  • Easy setup
    Yes, that's actually the primary focus of rubyrep.
  • Simplistic failover
    Yes. In fact rubyrep does master-master replication - to fail over, no action is necessary at all. Just start using the other database.
  • Schema changes don't break replication
    Yes.
    For non-primary key changes replication doesn't even have to stop (but make sure the schema is changes on both sides at the same time)
    To add / remove tables, simply restart the replication daemon. Only changing the primary key column of a table takes a bit of effort.
  • Adding a new database to the server is seamless (i.e. like mysql, you can replicate a whole DB server, so a new database is created on the master, it automatically propagates to the slave)
    This is only supported in a limited way: each rubyrep setup replicates only one database at a time. (But it is very easy to set up replication for more than one database.)

Solution 3:

You didn't mention having a hot read-slave as a requirement, so I'm going to propose using Heartbeat with either shared storage or DRBD. It just does the right thing and administration is a breeze. It's the Linux equivalent of older Microsoft SQL Server clustering. One node is active and the other node is passive while the data is shared between the two. You don't have to worry about SQL-based replication because it's all handled lower down at the block level.

Seriously, it's by far the best solution if you don't need read slaves. The WAL archive stuff was hokey at best and you must set everything up again if you ever disrupt the shipping process for a server reboot. slony and londiste don't cut the mustard. If you want to stay on the main source tree and not go commercial, Heartbeat is your best bet.

Solution 4:

From your requirements it seems that PITR is easiest way to solve your problem:

On-line backup and point-in-time recovery (PITR)

You didn't say that you need to query slave server, so PITR might be just right.

It's standard part of PostgreSQL from version 8.0 so you probably already have everything needed to get it up and running.

If you find instructions too verbose, take a look at SkyTools WalMgr which will make process of creating/failover to hot-standby data single command task.

For more complex replication scenarios, I had good experience Slony-1, but PostgreSQL has many good replication/HA options available.