Geographically distributed file system with preferred locality

Shame about the Linux requirement. This is exactly what Windows DFS does. Since Windows Server 2003 R2, it replicates at the block level, too.


Some questions:

  • How many "server" nodes are you thinking about having participate in this thing?

  • What's the WAN connectivity topology like: hub and spoke, or full mesh? How reliable is it?

  • Do you expect clients to fail over to a geographically non-local server in the event the local server fails?

Windows DFS-R would certainly do what you're looking for, albeit at some potentially hefty licensing cost.

You say that collisions aren't a problem and you don't need a distributed lock manager, so you could do this with userland tools like rsync or Unison and just export the resulting corpus of files over NFS to the local clients. It's ugly, and you'd have to knock together some kind of system to generate a replication topology and actually run the userland tools, but it would certainly be cheap as far as licensing costs go.
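To give an idea of what "knocking together" the topology part might look like, here is a rough Python sketch. It only builds the list of (source, destination) pairs and the rsync command line for each; it doesn't run anything. The site hostnames and the /export/shared/ path are made up for illustration, and I'm assuming each command is run on the source host and pushes to the destination, since rsync can't copy between two remote hosts in one invocation.

```python
from itertools import permutations


def replication_pairs(sites, topology="full-mesh", hub=None):
    """Return (source, destination) host pairs for one replication pass."""
    if topology == "full-mesh":
        # Every site pushes to every other site.
        return list(permutations(sites, 2))
    if topology == "hub-and-spoke":
        if hub not in sites:
            raise ValueError("hub must be one of the sites")
        spokes = [s for s in sites if s != hub]
        # Spokes push to the hub first, then the hub pushes the merged set back out.
        return [(s, hub) for s in spokes] + [(hub, s) for s in spokes]
    raise ValueError(f"unknown topology: {topology}")


def rsync_command(dst_host, path="/export/shared/"):
    """Build the rsync argv to run ON THE SOURCE HOST, pushing to dst_host.

    -a        archive mode (recursive, keeps permissions and times)
    -z        compress data in transit
    --update  skip files that are newer on the receiving side
    """
    return ["rsync", "-az", "--update", path, f"{dst_host}:{path}"]


if __name__ == "__main__":
    # Hypothetical site hostnames.
    sites = ["nyc-fs", "lon-fs", "tok-fs"]
    for src, dst in replication_pairs(sites, "hub-and-spoke", hub="nyc-fs"):
        print(f"on {src}: {' '.join(rsync_command(dst))}")
```

You'd still need something (cron plus ssh, say) to run each command on its source host. Note that without --delete, deletions never propagate, and with it you'd need to think hard about which side is authoritative, so treat this as a starting point only.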


Have you considered AFS?

The Andrew File System (AFS) is a distributed networked file system which uses a set of trusted servers to present a homogeneous, location-transparent file name space to all the client workstations.

As I understand it, most of the recent development has been behind the OpenAFS project.

I can't pretend to be familiar enough with the project to know if the "preferred locality" feature is available, but otherwise it sounds like a good fit.


Have you looked at OST pools in Lustre?

It won't be automatic, but with OST pools you can assign directories and files to specific OSTs/OSSes. It's basically policy-based storage allocation, rather than the default round-robin striping across OSTs.

So you could set up a directory per site and assign that directory to the local OSTs for that site, which will direct all I/O to the local OSTs. It will still be a global namespace.
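As a rough illustration of the per-site setup, the Python sketch below just builds the commands you would run for one site; it doesn't execute them. The filesystem name, site name, OST indices and mount point are all made up. I'm assuming the usual conventions here: `lctl pool_new` and `lctl pool_add` are run on the MGS, OSTs are named `<fsname>-OST<4-digit hex index>`, and `lfs setstripe --pool` is run on a client against the site directory. Check the Lustre manual for your version before relying on any of this.

```python
def ost_pool_commands(fsname, site, ost_indices, mountpoint):
    """Build the commands that pin a per-site directory to that site's OSTs.

    Returns a list of argv lists:
      1. create the pool            (run on the MGS)
      2. add the site's OSTs to it  (run on the MGS)
      3. create the site directory  (run on a client)
      4. bind the directory to the pool (run on a client)
    """
    pool = f"{fsname}.{site}"
    # Assumed OST naming convention: <fsname>-OST<index as 4 hex digits>.
    osts = [f"{fsname}-OST{i:04x}" for i in ost_indices]
    site_dir = f"{mountpoint.rstrip('/')}/{site}"
    return [
        ["lctl", "pool_new", pool],
        ["lctl", "pool_add", pool, *osts],
        ["mkdir", "-p", site_dir],
        # New files created under site_dir are allocated from the pool's OSTs.
        ["lfs", "setstripe", "--pool", site, site_dir],
    ]


if __name__ == "__main__":
    # Hypothetical values for one site.
    for argv in ost_pool_commands("testfs", "london", [0, 1], "/mnt/testfs"):
        print(" ".join(argv))
```

You'd repeat this once per site, each with its own pool and its own local OSTs. Only files created after the pool is set on the directory are affected; existing files stay where they are.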

There's a lot of work going into improving Lustre over WAN connections (local caching servers and things like that) but it's all still under heavy development AFAIK.