GlusterFS split brain with no path, what does it mean?
What is Split-Brain?
As mentioned in the official documentation on Managing Split-Brain provided by Red Hat, split-brain is a state of data or availability inconsistencies originating from the maintenance of two separate data sets with overlap in scope, either because of servers in a network design, or a failure condition based on servers not communicating and synchronizing their data with each other. The term applies to replicate configurations.
Pay attention to the phrase "a failure condition based on servers not communicating and synchronizing their data to each other": whatever the cause, this does not necessarily mean that your nodes have lost the connection. A peer may still be in the cluster and connected.
Split-Brain Types:
There are three different types of split-brain, and as far as I can see yours is an entry split-brain. The three types are:
Data split-brain: The contents of the file under split-brain are different on the different replica pairs, and automatic healing is not possible.
Metadata split-brain: The metadata of the file (for example, user-defined extended attributes) differs, and automatic healing is not possible.
Entry split-brain: It happens when a file has different GFIDs on each of the replica pairs.
What is GFID?
The GlusterFS internal file identifier (GFID) is a UUID that is unique to each file across the entire cluster. This is analogous to an inode number in a normal filesystem. The GFID of a file is stored in an xattr named trusted.gfid. To find the path from a GFID, I highly recommend you read this official article provided by GlusterFS.
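To illustrate, you can compare the trusted.gfid xattr of the same file on each brick; differing values for the same path confirm an entry split-brain. A minimal sketch, assuming /data/brick1 is your brick path (a placeholder, substitute your own, and run it against the brick directory rather than the client mount):
# getfattr -n trusted.gfid -e hex /data/brick1/path/to/suspect-file
Run this on every replica node and compare the hex values.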
How to resolve entry split-brain?
There are multiple methods to prevent split-brain from occurring, but to resolve it, the corresponding gfid-link files must be removed. The gfid-link files are present in the .glusterfs directory in the top-level directory of the brick. Beware: before deleting the gfid-links, you must ensure that there are no hard links to the files present on that brick. If hard links exist, you must delete them as well (see the sketch below). Then you can use the self-healing process by running the following commands.
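For illustration, here is a minimal sketch of that removal on one brick, assuming /data/brick1 as the brick path and 0a1b2c3d-4e5f-6789-abcd-ef0123456789 as the offending GFID (both are placeholders for your own values). The gfid-link lives under .glusterfs, bucketed by the first two pairs of hex characters of the GFID:
# stat -c %h /data/brick1/path/to/suspect-file
# rm /data/brick1/path/to/suspect-file
# rm /data/brick1/.glusterfs/0a/1b/0a1b2c3d-4e5f-6789-abcd-ef0123456789
The stat call prints the hard-link count: a regular file on a brick normally shows 2 (the file itself plus its gfid-link), so anything higher means extra hard links that have to be found and removed first.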
In the meantime, to view the list of files on a volume that are in a split-brain state you can use:
# gluster volume heal VOLNAME info split-brain
You should also be aware that for replicated volumes, when a brick goes offline and comes back online, self-healing is required to resync all the replicas.
To check the healing status of volumes and files you can use:
# gluster volume heal VOLNAME info
Since you are using version 3.5, you don't have auto healing. So after doing the steps mentioned earlier, you need to trigger self-healing. To do so:
- Only on the files which require healing:
# gluster volume heal VOLNAME
- On all the files:
# gluster volume heal VOLNAME full
I hope this helps you fix your problem. Please read the official docs for further information. Cheers.
Split-brain occurs when two nodes of a cluster are disconnected. Each node thinks the other one is not working.
To fix it, you have to understand why your two nodes are not talking to each other anymore.
I think the documentation is clear enough; it even gives you a similar example.
And for GlusterFS's healing commands, such as:
gluster volume heal VOLNAME split-brain latest-mtime FILE
FILE can be either the full file name as seen from the root of the volume, or the gfid-string representation of the file.
So you don't need to worry about that.
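For example, a hedged sketch with placeholder values (myvol and the GFID below are not from your setup):
gluster volume heal myvol split-brain latest-mtime /dir/file.txt
gluster volume heal myvol split-brain latest-mtime gfid:0a1b2c3d-4e5f-6789-abcd-ef0123456789
Both forms pick the copy with the newest modification time as the healing source.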
And as the convert GFID to path article says:
GlusterFS internal file identifier (GFID) is a uuid that is unique to each file across the entire cluster.
This script may tell you which filename belongs to which GFID, but since a split-brain happened, the file may not have a filename.
You're running 3.5 and don't have a semi-automatic heal command, so you may need to fix the conflict yourself manually, which normally means deciding which GFID file needs to be deleted.
How do I fix it?
Split-brain resolution is described here. In case that would not help much, the manual how-to here should do the job. For this case, I find the article helpful as well.
How to avoid split-brain?
Protection against network partitions is done through a quorum voting algorithm. In case a host fails, or there is a split-brain scenario where the nodes continue to run but can no longer communicate with each other, the remaining node or nodes in the cluster race to place a SCSI reservation on the witness drive. In the case of a split-brain, the witness helps to decide which of the hosts holding a copy of the data should take over control.
Some examples:
VMware VSAN allows running a 2-node cluster with the witness drive running on a 3rd host or in the cloud. Source
StarWind Virtual SAN runs in just a 2-node setup using the Microsoft Failover Cluster service, which also contains a quorum voting mechanism to avoid the split-brain issue. Source
For both, a heartbeat network is used to monitor the communication between the nodes and the quorum. In order to avoid a split-brain, I see it as mandatory to go with redundant heartbeat channels.
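Coming back to GlusterFS: there, quorum is enabled with volume options rather than a witness drive. A minimal sketch, assuming a replica 3 volume named VOLNAME (the name is a placeholder):
# gluster volume set VOLNAME cluster.quorum-type auto
# gluster volume set VOLNAME cluster.server-quorum-type server
# gluster volume set all cluster.server-quorum-ratio 51%
With cluster.quorum-type set to auto, clients refuse writes unless roughly more than half of the replica bricks are reachable, so divergent writes, the root cause of split-brain, are rejected up front.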