corosync/pacemaker/fencing - active/passive cluster with 2 nodes

I'm configuring a 2-node cluster with pacemaker/corosync, and I have some questions about it (and maybe about best practices: I'm far from being a specialist).

**OS:** Red Hat Enterprise Linux 7.6

I configured the cluster with the following properties:

 - **stonith-enabled:** true

 - **symmetric-cluster:** true (even though that's the default value, I think)


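For reference, I set those properties roughly like this with pcs (a sketch of the commands, nothing exotic):

```
# Cluster-wide properties mentioned above (pcs syntax on RHEL 7)
pcs property set stonith-enabled=true
pcs property set symmetric-cluster=true
```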
In corosync.conf I also added:

 - **wait_for_all:** 0 (I want a node to be able to start/work even if its twin is down)

 - **two_node:** 1


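The quorum section of my corosync.conf therefore looks roughly like this (a sketch showing only the options mentioned above):

```
quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 0
}
```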
Regarding fencing:

- Using the iLO of the HP blades (iLO1 for Node1, iLO2 for Node2)

I read that it is sometimes good practice to prevent a node from fencing itself ("suicide"), so I added location constraints:

- ILO1-fence cannot be located on Node1

- ILO2-fence cannot be located on Node2

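For context, the fence devices and those anti-suicide constraints were created along these lines (a sketch; the fence agent, iLO addresses and credentials below are placeholders, not my real values):

```
# One fence device per node, pointing at that node's iLO
# (fence_ilo4 and the addresses/credentials are placeholders)
pcs stonith create ILO1-fence fence_ilo4 pcmk_host_list="node1" ipaddr="10.0.0.101" login="admin" passwd="secret"
pcs stonith create ILO2-fence fence_ilo4 pcmk_host_list="node2" ipaddr="10.0.0.102" login="admin" passwd="secret"

# "Anti-suicide" constraints: keep each fence device off the node it fences
pcs constraint location ILO1-fence avoids node1
pcs constraint location ILO2-fence avoids node2
```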
The problem I have is the following, and it happens when starting Node2 while Node1 is shut down:

  - pacemaker/corosync can't start ILO2-fence on Node1 (of course, because Node1 is down), so it doesn't start the other resources either, and my cluster doesn't work at all >:[

I am wondering if I am missing something in my configuration, or if I don't properly understand how such a cluster is supposed to work.

I'd expect Node2 to start, the cluster to see that Node1 is KO, and the resources to simply be started so that Node2 works on its own.

But it's true that since ILO2-fence can only be located on Node1 (because of the anti-suicide constraint), this resource will always fail... (when I tried without those "anti-suicide" constraints, if Node2 had some service failures it shut itself down right after starting, which I don't want)

I would appreciate some feedback and enlightenment :)

Thank you :)


You have, let's say, 4 votes in your cluster: 2 nodes and 2 iLO fence devices. The cluster can run if more than 2 (i.e. 3) of them are accessible. ILO2-fence is tied to node1 only, so if node1 is down, quorum is lost. Using iLO fencing is not recommended:

"A common mistake people make when choosing a STONITH device is to use a remote power switch (such as many on-board IPMI controllers) that shares power with the node it controls. If the power fails in such a case, the cluster cannot be sure whether the node is really offline, or active and suffering from a network fault, so the cluster will stop all resources to avoid a possible split-brain situation." link

You have 2 options for a 2-node cluster:

  1. Use one external tiebreaker/witness device (a witness node, voting VSA, or SMB2/3 file share); see the qdevice sketch after this list.

  2. Use a solution developed for 2-node clusters out of the box (such as Microsoft Storage Spaces Direct (S2D), a.k.a. Azure Stack HCI, VMware Virtual SAN (vSAN), especially the ROBO edition, or StarWind Virtual SAN (VSAN)).
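If you go with option 1 on RHEL 7, one common way is a corosync-qdevice arbitrator on a third machine. A minimal sketch, assuming a witness host named "witness" and that the pcs/corosync-qdevice packages are installed:

```
# On the witness host: set up and start the qnetd arbitrator
pcs qdevice setup model net --enable --start

# On one of the cluster nodes: add the quorum device to the existing cluster
pcs quorum device add model net host=witness algorithm=ffsplit
```

pcs should take care of adjusting the two_node setting when a quorum device is added, but it is worth double-checking corosync.conf afterwards.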