What are some shared-disk filesystems people have successfully used with iSCSI?

The setup looks something like this.

[iSCSI setup diagram]

The goal is to have multiple computers mount a single LUN from the iSCSI target, ideally read/write and with ACL support.

The servers are running GNU/Linux, so preferably a filesystem available in the vanilla kernel. Though I can deal with compiling third-party modules if necessary.

I'm currently looking into GFS2 and OCFS2. Has anyone run a deployment like this successfully? Any gotchas I should look out for?


Solution 1:

As long as the iSCSI stack you use has very strong SCSI-3 persistent reservation support, you should be fine; the implementation details of the filesystem matter less. The biggest concern is generally the quality of the iSCSI target software, since bugs there will lead to faults in the filesystem and, ultimately, failures of the cluster as a whole. You want something that fails cleanly in those cases, and all of the generally accepted clustered filesystems do that.
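If you want to sanity-check reservation behavior on your target, the `sg_persist` utility from the sg3_utils package can query SCSI-3 persistent reservations directly. A hypothetical example (`/dev/sdX` is a placeholder for your iSCSI-backed device):

```shell
# Requires sg3_utils and a real device; /dev/sdX is a placeholder.

# List the reservation keys registered on the LUN:
sg_persist --in --read-keys --device=/dev/sdX

# Show the active reservation (type and holder), if any:
sg_persist --in --read-reservation --device=/dev/sdX
```

If the target rejects these commands or returns inconsistent results across initiators, that's a red flag for running a clustered filesystem on top of it.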

Solution 2:

I have successfully tested GFS on Red Hat Cluster Suite in a lab. The fencing device used SNMP (IF-MIB) to shut down the node's switch port, and you can also add a quorum disk for extra safety. You can find some tech notes here:

http://honglus.blogspot.com.au/2011/05/passed-25-rhca-ex436-clustering-and.html

GFS depends on Red Hat Cluster Suite, which is not easy to implement. If you don't really need cluster features and just need to write to a shared block device concurrently, check out IBM GPFS; it's much easier to implement, and the license fee is only a few hundred bucks.
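For reference, once the cluster is up, formatting and mounting a GFS2 filesystem looks roughly like this. Everything here is a placeholder: `mycluster` must match the cluster name in your cluster configuration, `webdata` is an arbitrary filesystem name, and `-j` must provide at least one journal per node that will mount the filesystem.

```shell
# Illustrative placeholders throughout; needs a working cluster with
# DLM running before the mount will succeed.
mkfs.gfs2 -p lock_dlm -t mycluster:webdata -j 4 /dev/mapper/iscsi-lun

# Mount on each node:
mount -t gfs2 /dev/mapper/iscsi-lun /srv/webdata
```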

Solution 3:

I have 8 web/app servers mounting multiple OCFS2 filesystems. The common stuff (Apache configs, logs, vhosts, etc.) and the app code are stored there. I wasn't around for the initial implementation, so I can't speak much to that, but from what I remember off the top of my head, it's pretty standard with some modifications.

As for gotchas, it can be somewhat fragile. By this I mean we had many instances where, when a server left and rejoined the cluster, the load on all the servers would shoot through the roof for a few minutes. I think this was a bug, though; IIRC the behavior was resolved after some updates.

The cluster configuration file must be kept identical on every node, which can be a pain.
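To illustrate what has to stay in sync, here is a hypothetical two-node O2CB config. All the names and addresses are made up, and the real file lives at `/etc/ocfs2/cluster.conf` (written to `/tmp` here just for illustration; note the real parser is whitespace-sensitive and expects tab indentation):

```shell
# Hypothetical two-node example; every node must carry an identical copy.
cat > /tmp/cluster.conf <<'EOF'
cluster:
	node_count = 2
	name = webcluster

node:
	ip_port = 7777
	ip_address = 10.0.0.11
	number = 0
	name = web1
	cluster = webcluster

node:
	ip_port = 7777
	ip_address = 10.0.0.12
	number = 1
	name = web2
	cluster = webcluster
EOF

# Quick sanity check: two node stanzas defined.
grep -c '^node:' /tmp/cluster.conf   # → 2
```

Any change (adding a node, renaming the cluster) has to be propagated everywhere before it takes effect, which is exactly the maintenance pain mentioned above.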

For us, updates are going to be a problem. We are on 1.4, I think, and an update to 1.6 means downtime for the entire cluster, since the versions aren't compatible and you can't mix the two. Maybe on a new deployment this isn't such a big deal.

Also, keep in mind that a cluster filesystem incurs more overhead than a conventional one, and the more nodes you add, the larger the problem. The solution is more hardware (and careful tuning).