Do Pacemaker, Heartbeat, etc. make sense for EC2?

Does EC2 monitor the health of services inside the guests?

If not, and that is something you want, then Pacemaker would be relevant here. Corosync probably isn't an option yet, since it only supports multicast and broadcast, so it would be a Pacemaker + Heartbeat setup.

Here's a guide to how people do it with Linode instances; much of it is likely to be relevant on EC2 as well: http://library.linode.com/linux-ha/

To answer the question of what the pieces are, Pacemaker is the thing that starts and stops services and contains logic for ensuring both that they're running, and that they're running in only one location (to avoid data corruption).
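To make the "running, and in only one location" idea concrete, here is a minimal toy sketch in Python. It is not how Pacemaker is implemented; the `ToyCluster` class, its fields and the node names are all made up for illustration, and it simply assumes that the set of healthy nodes is handed to it from somewhere else.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model only (hypothetical, not Pacemaker's API)."""
    online: set = field(default_factory=set)       # nodes currently believed healthy
    placement: dict = field(default_factory=dict)  # service name -> node running it

    def reconcile(self, services):
        """Ensure each service is assigned to exactly one online node."""
        for service in services:
            node = self.placement.get(service)
            if node in self.online:
                continue  # already running in one healthy place; leave it alone
            if not self.online:
                self.placement.pop(service, None)  # nowhere safe to run it
                continue
            # Pick deterministically so repeated runs give the same answer.
            self.placement[service] = min(self.online)
        return dict(self.placement)


cluster = ToyCluster(online={"node-a", "node-b"})
cluster.reconcile(["db"])         # "db" is assigned to one node
cluster.online.discard("node-a")  # pretend node-a has failed
cluster.reconcile(["db"])         # "db" is reassigned to a surviving node
```

The sketch skips the hard part: before starting a service elsewhere, a real cluster manager has to be sure the service is really stopped on the node it thinks has failed.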

But it can't do that without the ability to talk to itself on the other node(s), which is where heartbeat and/or corosync come in.

Think of heartbeat and corosync as a bus that any node can throw messages on and know that they'll be received by all its peers. The bus also ensures that everyone agrees who is (and is not) connected to the bus and tells you when that list changes.
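As a rough picture of that bus, here is a toy, single-process Python sketch. It is only an illustration of the two guarantees described above (messages reach all peers, and everyone is told when membership changes); `ToyBus` and its methods are hypothetical names, and nothing here reflects the real Heartbeat or Corosync APIs or wire protocols.

```python
class ToyBus:
    """In-process stand-in for a cluster messaging layer (illustration only)."""

    def __init__(self):
        self.members = {}  # node name -> that node's inbox (list of events)

    def _notify_membership(self):
        # Every connected node receives the same, agreed membership list.
        view = sorted(self.members)
        for inbox in self.members.values():
            inbox.append(("membership", view))

    def join(self, node):
        self.members[node] = []
        self._notify_membership()

    def leave(self, node):
        self.members.pop(node, None)
        self._notify_membership()

    def broadcast(self, sender, payload):
        if sender not in self.members:
            raise ValueError(f"{sender} is not connected to the bus")
        # Deliver to every peer except the sender.
        for node, inbox in self.members.items():
            if node != sender:
                inbox.append(("message", sender, payload))


bus = ToyBus()
for name in ("node-a", "node-b", "node-c"):
    bus.join(name)
bus.broadcast("node-a", "db started")  # node-b and node-c both receive it
bus.leave("node-c")                    # node-a and node-b are told who remains
```

A real messaging layer has to provide the same guarantees across an unreliable network, which is where most of the difficulty lies.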

For two nodes, Pacemaker could just as easily use plain sockets, but beyond that the complexity grows quickly and is very hard to get right, so it really makes sense to use existing components that have proven to be reliable.


My gut instinct is to say no, those are really not the right tools for cluster management on EC2. I've used them on standalone hardware and found that you need a very specific set of requirements and failure cases for them to make sense even there. I can't think of a use case that would demand those tools over cloud-specific monitoring systems and tooling, such as messaging, developed with the platform in mind.

That said, I don't consider my answer authoritative here. I'm really hoping somebody with a little more experience running that tool set on EC2 chimes in.