Linux stretch cluster: MD replication, DRBD or Veritas?

Solution 1:

I'd go with GlusterFS. The latest 3.x releases support geo-replication (asynchronous replication over long-latency links) as well as synchronous LAN replication. There are plenty of docs on how to replicate and spread data across the cluster.
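A minimal sketch of the two modes on the command line, assuming two bricks on hosts server1/server2 and a remote site for geo-replication; all names are illustrative and the exact geo-replication syntax varies between 3.x minor releases:

    # Synchronous LAN replication: two-way replicated volume
    gluster volume create datavol replica 2 server1:/export/brick1 server2:/export/brick1
    gluster volume start datavol

    # Asynchronous geo-replication to a remote site over the slow link
    gluster volume geo-replication datavol remotehost:/export/slavevol start
    gluster volume geo-replication datavol remotehost:/export/slavevol status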

I don't like DRBD because it limits the number of nodes you can use. I think GlusterFS on decent hardware, with a decent bit of network tuning, might be just what you're after. Definitely worth a test session.

Solution 2:

I am currently testing a "stretch cluster" using Red Hat Cluster Suite and DRBD. I am typing this at a hotel near the Red Hat Summit in Boston, which just ended. I talked with the Red Hat Cluster Suite developers and they said stretch clusters are not supported at this time.

This won't stop me from working on it for fun, though. My setup is four HP blades in a single cluster. Two blades are in one datacenter about 15 miles from the other datacenter, which houses the other two blades. To get the cluster to even join together, I needed the network team to configure the routers between the sites to pass multicast traffic. In addition, since Red Hat hard-codes a TTL of 1 on the multicast heartbeat packets, I had to use iptables to mangle packets to that multicast address and raise the TTL.
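For reference, this is roughly the kind of mangle rule I mean; the multicast group and TTL are example values, not my exact ones:

    # Raise the TTL on the cluster heartbeat multicast so it survives the routed hop
    # between sites (239.192.x.x is a typical cman/openais group; substitute your own)
    iptables -t mangle -A OUTPUT -d 239.192.1.1 -j TTL --ttl-set 4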

After that was done, I was able to get a four-node cluster with my blades. For storage, I have a 3Par LUN shared at each site between each of its two local nodes. These are the block devices I use for my DRBD devices. I should add here that I have a dedicated 1G WAN link just for my DRBD traffic. I was able to get DRBD running fairly easily between the sites and use the DRBD device as a PV in a clustered LVM volume with GFS2 running on it. I do occasionally hit split-brain conditions on my DRBD setup that I have to recover from manually, and I am trying to isolate that problem.
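Roughly, the layering looks like the sketch below; resource, host, and device names are placeholders, and the initial-sync command differs slightly between DRBD 8.3 and 8.4:

    # /etc/drbd.d/r0.res (sketch): one 3Par LUN per site backs the DRBD device
    #   resource r0 {
    #     protocol C;
    #     device /dev/drbd0;
    #     on site-a-node { disk /dev/mapper/lun_a; address 10.0.0.1:7788; meta-disk internal; }
    #     on site-b-node { disk /dev/mapper/lun_b; address 10.0.1.1:7788; meta-disk internal; }
    #   }

    drbdadm create-md r0
    drbdadm up r0
    drbdadm primary --force r0        # first node only, kicks off the initial sync (8.4 syntax)

    # Clustered LVM and GFS2 on top of the replicated device
    pvcreate /dev/drbd0
    vgcreate -cy vg_stretch /dev/drbd0
    lvcreate -l 100%FREE -n lv_gfs2 vg_stretch
    mkfs.gfs2 -p lock_dlm -t mycluster:gfs0 -j 4 /dev/vg_stretch/lv_gfs2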

The next step has been the hardest. I want to be able to fail over my GFS2 mount to the other node in case the primary fails. My GFS2 service consists of a floating IP -> DRBD -> LVM -> GFS2. The drbd.sh resource script that ships in the cluster source code doesn't work at all, so I have been testing with the regular DRBD init script in /etc/init.d. It seems to work "sometimes", so I will need to tweak it.
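For what it's worth, the rgmanager side of that chain looks roughly like the fragment below; the nested script resource pointing at the stock init script is the flaky part, and every name and path here is a placeholder:

    <!-- fragment of /etc/cluster/cluster.conf; nesting expresses start order -->
    <service name="gfs2_svc" autostart="1" recovery="relocate">
      <ip address="10.0.0.100" monitor_link="1">
        <script name="drbd" file="/etc/init.d/drbd">
          <lvm name="havg" vg_name="vg_stretch" lv_name="lv_gfs2">
            <clusterfs name="gfs0" mountpoint="/mnt/gfs0"
                       device="/dev/vg_stretch/lv_gfs2" fstype="gfs2"/>
          </lvm>
        </script>
      </ip>
    </service>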

I was dismayed to discover that none of this is supported in Red Hat Cluster Suite, so any dream I had of moving this to production is dashed. And where else would you need this kind of setup? Pretty much only very important production stuff.

I did talk with Symantec here, and they told me they absolutely support active-active stretch clusters with shared storage. I will believe that when I actually see it, though.

Solution 3:

DRBD is dead slow, as everybody knows; you can't use it for high-load enterprise purposes. It uses 128 KiB hashing functions, which limit I/O requests to a maximum of 128 KiB instead of the 512 KiB a regular HDD can provide. Furthermore, there is a crude I/O request size detection that only works while connected to the other host; if you lose the connection, the size is reset to 4 KiB on your local disks. Versions 8.4.1 and 8.3.11 have the same issues.
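If you want to check this on your own setup, the advertised request size limit is visible in sysfs; device names here are examples:

    # Compare what DRBD advertises with what the backing disk can actually take
    cat /sys/block/drbd0/queue/max_hw_sectors_kb
    cat /sys/block/sdb/queue/max_hw_sectors_kb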

Here are some more details: http://www.gossamer-threads.com/lists/drbd/users/24104

This is why real enterprises use $$$ stuff like Veritas.

MD RAID 1 is much better if you need performance at a low price. It also provides a "write-mostly" mode so that you can avoid reading from a slow device.
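A sketch of what that looks like with mdadm; device names are illustrative, with /dev/sdc1 standing in for the slow (e.g. remote or iSCSI) leg:

    # RAID 1 across a fast local disk and a slow remote one; --write-mostly marks
    # the slow leg so reads are served from the fast local disk
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/sdb1 --write-mostly /dev/sdc1

    # A write-intent bitmap makes resyncs after a dropped leg much cheaper
    mdadm --grow /dev/md0 --bitmap=internal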

Solution 4:

If you've got a SAN backend, then a shared-storage filesystem (GFS?) makes a lot more sense than replicated storage.
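As a sketch, putting GFS2 straight on the shared LUN is a one-liner per filesystem; the cluster name, journal count, and device path are placeholders:

    # One journal (-j) per node that will mount the filesystem concurrently
    mkfs.gfs2 -p lock_dlm -t mycluster:shared0 -j 4 /dev/mapper/san_lun
    mount -t gfs2 /dev/mapper/san_lun /mnt/shared0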