PostgreSQL failover cluster on Windows Server

We are looking for advice on how to setup a basic failover cluster for our application:

  • We will be using 4 machines running Microsoft Windows Server (most probably 2003).
  • All four will always run our application, which is essentially a web service.
  • Load balancing is "outsourced" - somebody else handles the distribution of the web requests among the servers.
  • Only one of the servers will be running the PostgreSQL server actively at any given time. Another server (of the four) also has the DB installed, but is on standby/passive.
  • The DB data is stored on shared storage. No copying data between servers.
    • Reads are done very frequently by many end-users, and in rather small chunks of data.
    • Writes are done much less frequently, by less users, and in very large bulks of data.

Now, how can one configure Microsoft Cluster Service to keep only one instance of the DB server and 4 instances (1 per server) of our application at all times? And does PostgreSQL integrate neatly with MSCS at all?

Update: Instead of keeping the data on shared storage, I also consider using log shipping to replicate data on a couple of DB servers. There are two issues with this option:

  • Log shipping only makes sure that I have a second server that gets all of the data and is ready to take over. How do I implement the actual failure detection and failover switch?
  • Switching back: Suppose the master fails and the system automatically fails over to the slave, and later the master comes back online. I understand that with WAL shipping this will require to reconfigure the log shipping once again, and that switching back is far from seamless. Is that so?

PostgreSQL works fine with MSCS, using the generic service mode. It is recommended that you install the PostgreSQL binaries on the shared drive as well, and point the service there - that way you are sure you won't get a version mismatch.

Lars has a good point in that it's not recommended to run the db and appserver on the same machines. You might want to consider using 2 machines for the db and 2 for the appserver instead - or at least have MSCS take down the appserver on the node that PostgreSQL runs on (i'm not sure if it can do that, but I would imagine it should)