Can someone explain this GlusterFS setup?

After digging around to understand how to set up replication using Gluster, I came across this question: Can Apache Read The GlusterFS Brick Directly But Write To The GlusterFS Mount?

I've also found a how-to that seems to explain the same thing. I thought I understood it, but now I don't think I do.

So in order to get this kind of replication, do I need to have both machines function as servers and clients at once? I don't understand how the relationship works: isn't B, for example, a client of A?

Is there more than one level of client-server relationship involved? Is A a client of A and B a client of B, each mounting a volume from the same machine in a folder, with those two volumes somehow kept in sync (from A to B) in a third layer of relationships?

Why is the question above asking about writing to the file system versus the mounted volume? When I make B a client of A, with A exporting a folder and B mounting it as a remote volume in a folder, I never ask myself what I am writing to: I write into the original folder on A and into the mounted volume on B. Isn't this how it's supposed to work?


Solution 1:

Let's say you have two machines, A and B. On each machine, you export /opt/files as a Gluster brick and set up client-side replication. You then mount the resulting volume as /mnt/gluster-files on both machines. This is important: all normal access goes through that mount point, not through the brick directory.

Using that mount point, we now have a highly available file system across the two machines.

When you write a file, say /mnt/gluster-files/example on machine A, two things happen:

  1. A copy is written to /opt/files on machine A.
  2. A copy is sent over the network to be written to /opt/files on machine B.
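
To make the write path concrete, here is a toy model in Python. It is not Gluster code: two temporary directories stand in for /opt/files on machines A and B, and the hypothetical replicated_write function plays the role of the replicating client behind the mount point.

```python
import tempfile
from pathlib import Path


def replicated_write(bricks, name, data):
    """Toy stand-in for a replicating client: write the same bytes to every brick."""
    for brick in bricks:
        (brick / name).write_bytes(data)


# Two temporary directories stand in for /opt/files on machines A and B.
with tempfile.TemporaryDirectory() as brick_a, tempfile.TemporaryDirectory() as brick_b:
    bricks = [Path(brick_a), Path(brick_b)]
    replicated_write(bricks, "example", b"hello")
    # Read the file back from each brick to show both hold a copy.
    copies = [(brick / "example").read_bytes() for brick in bricks]

print(copies)
```

The point of the sketch is only that one write through the client results in one copy per brick. A write made straight into one of the directories would skip the loop entirely.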

This is good, because we want to have redundancy, which means we have to have more than one copy of the data.

Next up, let's say we want to read the same file. Again on machine A:

  1. You issue a read for /mnt/gluster-files/example
  2. GlusterFS says "I need to check all the replica nodes to find out who has the most recent version of this file"
  3. GlusterFS checks every node
  4. It turns out that all copies are the same, because replication is working nicely
  5. You are returned the file from your local disk. §

(§ There is a read-subvolume client option, and it is sensible to set it to the local volume on any machine that is a Gluster client and server, as in this case. Otherwise, step 5 could be 'you are sent the file from a random node'.)
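
The read steps above can be sketched the same way. This is a simplified model, assuming each replica carries a plain version number. That number is made up for illustration and is not how Gluster itself tracks freshness. The hypothetical preferred argument plays the role of the read-subvolume option.

```python
def replicated_read(replicas, preferred):
    """Toy model of a replicated read.

    replicas maps a node name to a (version, data) pair. Every node is
    checked, and the preferred node is used only if it is up to date.
    """
    newest = max(version for version, _ in replicas.values())
    up_to_date = [node for node, (version, _) in replicas.items() if version == newest]
    source = preferred if preferred in up_to_date else up_to_date[0]
    return source, replicas[source][1]


# All copies agree, so the local node (A) serves the read.
in_sync = {"A": (2, b"current"), "B": (2, b"current")}
print(replicated_read(in_sync, preferred="A"))

# The local copy is stale, so the read falls through to B.
stale_local = {"A": (1, b"old"), "B": (2, b"current")}
print(replicated_read(stale_local, preferred="A"))
```

Note that the check touches every replica even when the local copy turns out to be fine, which is where the per-file overhead comes from.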


Behind the scenes, GlusterFS keeps /opt/files on both machines in sync. Checking every node on each read, especially for a large number of small files, adds a noticeable performance penalty.

The question is therefore raised: if I am running a process on one of these two machines, and I know the files are in sync, why can't I just read the files from the local brick, /opt/files?

It's not recommended, but you can do this. Read the files from /opt/files. You then have to keep track yourself of whether the copies get out of sync, and if they do, run something like ls -laR in /mnt/gluster-files, which will trigger a synchronization.
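
If you would rather do that walk from a script, a rough Python equivalent of ls -laR is below. It only stats every entry under a directory. Whether that is enough to trigger a sync depends on your Gluster version, so treat it as a sketch. The function name walk_and_stat is made up for this example.

```python
import os


def walk_and_stat(root):
    """Stat every file and directory under root, like ls -laR, and count them."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat looks the entry up without following symlinks.
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count


if __name__ == "__main__":
    # Run this against the Gluster mount, not the brick directory.
    print(walk_and_stat("/mnt/gluster-files"))
```

The walk has to go through /mnt/gluster-files. Walking /opt/files only touches the local disk and Gluster never sees it.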

So, what happens if you write to /opt/files on machine A?

The file sits there unnoticed by GlusterFS. Gluster doesn't work that way. It doesn't get onto machine B unless you happen to do something which makes Gluster notice it on machine A.

Therefore, you can't just tell Apache to read and write to /opt/files. What seems like a good compromise is telling it to read from /opt/files but write to /mnt/gluster-files. This is only possible if your application lets you specify a different path for reading and writing files, which not many do.
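
For an application you control, the split can be as small as the sketch below. SplitPathStore is a hypothetical helper, not part of Gluster or Apache. It assumes read_root is the local brick (/opt/files) and write_root is the Gluster mount (/mnt/gluster-files).

```python
from pathlib import Path


class SplitPathStore:
    """Read from one directory and write to another (hypothetical helper)."""

    def __init__(self, read_root, write_root):
        self.read_root = Path(read_root)    # e.g. the local brick, /opt/files
        self.write_root = Path(write_root)  # e.g. the mount, /mnt/gluster-files

    def read(self, name):
        # Fast local read that bypasses Gluster's consistency check.
        return (self.read_root / name).read_bytes()

    def write(self, name, data):
        # Writes go through the mount so Gluster can replicate them.
        (self.write_root / name).write_bytes(data)


# Typical wiring on a machine that is both Gluster server and client.
store = SplitPathStore("/opt/files", "/mnt/gluster-files")
```

The trade-off is the one described above: reads are fast, but nothing in this class notices when the local brick is stale.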