Why are GlusterFS replicated volumes not recommended for Hosts in different datacenters?

This issue applies to many types of data stores, not just GlusterFS, because increased distance increases latency. The recommendation to keep servers on the same subnet is likewise meant to reduce latency from network hops.

To keep data synchronized, every server must have the same view of the data. For reads, the latency effect is usually not an issue. However, serious data corruption can occur if multiple servers write the same block before they are synchronized. Changes can also be lost while a block is being updated: if a server read the block before an update made on a different server reached it, writing its own change back will likely overwrite that other update.
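The lost update described above can be illustrated with a minimal sketch. This is not GlusterFS code: the `append_line` helper and the whole-block, last-write-wins replication are assumptions chosen only to show how two unsynchronized read-modify-write cycles can drop a change.

```python
# Hypothetical illustration of a lost update; none of these names are GlusterFS APIs.

def append_line(block: str, line: str) -> str:
    """Return a copy of the block with one line appended (read-modify-write)."""
    return block + line + "\n"

shared_block = "header\n"

# Both servers read the block before either write has been replicated.
view_a = shared_block
view_b = shared_block

# Each server updates its own, now stale, copy.
write_a = append_line(view_a, "update from A")
write_b = append_line(view_b, "update from B")

# Assumed replication model: whole-block writes applied in arrival order,
# so the last write wins.
shared_block = write_a
shared_block = write_b

lost = "update from A" not in shared_block
print(shared_block)
print("update from A lost:", lost)
```

The longer it takes for a write to reach the other server, the wider the window in which both servers can read the same stale block.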

Locking mechanisms can be used to reduce the risk of corruption. However, distributed locks take longer to obtain and release as latency increases. In this case, latency is the time to complete a round trip between servers. Three factors contribute to it when communicating between data centers: distance, switching time, and network congestion.

Mail data stores tend to be read-mostly. Normally, it is unlikely that multiple clients attached to different servers would be updating the same file or directory. There may be some contention between incoming email messages and clients reading them, but the latency should not be a significant issue. Maildir format stores should have lower contention than other formats. However, they have relatively high rename and move activity, which may cause issues if your nodes become disconnected.

Distance: Data travels at most about 30 cm in a nanosecond, 300 meters in a microsecond, or 300 kilometers in a millisecond. That is the speed of light in a vacuum; signals in fiber or copper travel at roughly two-thirds of that speed. This adds significant latency as distance increases.
Switching time: Each switch a packet passes through needs to examine, route, queue, and transmit the packet. This adds latency, which increases as the switch gets busier.
Network congestion: Networks can get congested, causing additional delays as traffic is queued longer and possibly re-routed. If congestion is bad, the delays may be long enough to trigger packet retransmission.
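
As a rough worked example of the distance factor, the sketch below estimates a lower bound on round-trip time and the resulting ceiling on writes that must each wait for a distributed lock. The two-thirds velocity factor, the two round trips per locked write, and the sample distances are assumptions for illustration, not measured GlusterFS figures. Switching time and congestion only add to these numbers.

```python
# Rough propagation-delay estimate; every constant here is an assumption.
C_KM_PER_MS = 300.0       # speed of light in a vacuum, about 300 km per millisecond
VELOCITY_FACTOR = 2 / 3   # assumed signal speed in fiber or copper relative to c

def round_trip_ms(distance_km: float, hops: int = 0, per_hop_ms: float = 0.0) -> float:
    """Estimate round-trip time: propagation plus per-hop delay, both directions."""
    one_way = distance_km / (C_KM_PER_MS * VELOCITY_FACTOR) + hops * per_hop_ms
    return 2 * one_way

def max_locked_writes_per_second(rtt_ms: float, round_trips_per_write: int = 2) -> float:
    """Upper bound on serialized writes when each one needs lock round trips."""
    return 1000.0 / (rtt_ms * round_trips_per_write)

# Hypothetical distances: same room, same city, different regions.
for label, km in [("same room", 0.01), ("same city", 50.0), ("different regions", 1500.0)]:
    rtt = round_trip_ms(km)
    limit = max_locked_writes_per_second(rtt)
    print(f"{label}: RTT at least {rtt:.4f} ms, at most {limit:.0f} locked writes/s")
```

Under these assumptions, servers 1500 km apart cannot complete a round trip in less than about 15 ms, so writes that serialize on a lock are limited to a few dozen per second regardless of bandwidth.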
