Is there a Distributed SAN/Storage System out there? [closed]

Like many other places, we ask our users not to save files to their local machines. Instead, we encourage that they be put on a file server so that others (with appropriate permissions) can use them and that the files are backed up properly.

The result of this is that most users have large hard drives that are sitting mainly empty. It's 2010 now. Surely there is a system out there that lets you turn that empty space into a virtual SAN or document library?

What I envision is a client program, pushed out to users' PCs, that coordinates with a central server. To users the server looks just like a normal file server, but instead of holding entire file contents it merely keeps a record of where those files can be found among the various user PCs, and coordinates with the right clients to serve up file requests. The client software would respond to such requests directly, and would be smart enough to cache recently used files locally. For redundancy, the server could make sure each file is copied to multiple PCs, perhaps letting you define groups in different locations so that a complete instance of the repository lives in each group; a disaster in one building then can't take down everything else.
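
To make the idea concrete, here's a minimal sketch of the bookkeeping such a coordinator would do, written in Python. Every name in it (the class, the hostnames, the replica count) is hypothetical; this isn't any existing product, just the shape of the metadata server described above:

```python
import random
from collections import defaultdict

class PlacementServer:
    """Toy metadata server: knows where file replicas live, not their contents."""

    def __init__(self, replicas=3):
        self.replicas = replicas
        self.clients = set()                 # client PCs currently online
        self.locations = defaultdict(set)    # path -> set of client hostnames

    def register_client(self, hostname):
        self.clients.add(hostname)

    def store(self, path):
        """Pick several online clients to hold copies of a new file."""
        targets = random.sample(sorted(self.clients),
                                min(self.replicas, len(self.clients)))
        self.locations[path].update(targets)
        return targets                       # caller pushes the bytes to these PCs

    def locate(self, path):
        """Return the online clients that can serve this file right now."""
        return [h for h in self.locations[path] if h in self.clients]

# Example: three lab machines register, a file is stored, then looked up.
server = PlacementServer(replicas=2)
for host in ("lab1-pc01", "lab1-pc02", "lab2-pc01"):
    server.register_client(host)
holders = server.store("/shares/staff/budget.xlsx")
print("replicas on:", holders)
print("serve from:", server.locate("/shares/staff/budget.xlsx"))
```

In practice, placement would also have to weigh which machines are actually switched on and how full their disks are, but the division of labour is the point: the server only tracks locations, the clients move the bytes.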

Obviously you wouldn't point your database server here, but for simpler things I see several advantages:

  • Files can often be transferred from a nearer (or the local) machine.
  • Network load is spread across many links, rather than crowding all file transfers onto a single connection.
  • Disk space grows automatically as your company does.
  • It should ultimately be cheaper, as you don't need to keep a separate set of disks.

I can see a few downsides as well:

  • Occasional degradation of user PC performance, if a machine has to serve or accept a large file transfer during a busy period.
  • Writes have to be propagated around the network several times (though I suspect this isn't much of a problem, since most files are read far more often than they're written).
  • You'd still need a way to send a complete copy of the data offsite occasionally, and this arrangement would make differential backups very hard.

Think of this like a cloud storage system that lives entirely within your corporate LAN and makes use of your existing user equipment.

Our old main file server is due for retirement in about 2 years, and I'm looking into replacing it with a small SAN. Our current file server uses about 400GB of a 1TB share; we've only kept it that small because our backup space was limited. I'm looking to expand to at least 4TB of usable space when it's replaced, maybe much more if prices come down as much as I expect. I'm thinking something like this would be a better fit. As a school, we have a couple of computer labs I can leave running that would be perfect for adding a little extra redundancy to such a system.

With very few exceptions, our users are filling less than 40GB of their 120GB hard drives, meaning I could easily reserve 65GB per machine. And that's only going to increase, as newer machines are coming in with 250GB drives and even those will likely grow soon. By the time the file server is replaced, given our desktop replacement schedule, I'd expect such a system to allow for 5TB of usable storage, even allowing for redundancy and history.
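
For what it's worth, here's the back-of-the-envelope arithmetic behind an estimate like that. The machine count and replication factor below are illustrative assumptions, not measured figures:

```python
# Back-of-the-envelope capacity for a desktop-backed storage pool.
# These inputs are illustrative assumptions, not measured values.
machines = 250        # desktops enrolled in the pool (assumed)
reserved_gb = 65      # space reserved on each machine, per the estimate above
replication = 3       # copies of every file kept for redundancy (assumed)

raw_gb = machines * reserved_gb
usable_gb = raw_gb / replication

print(f"raw pool : {raw_gb / 1000:.1f} TB")
print(f"usable   : {usable_gb / 1000:.1f} TB at {replication}x replication")
# ~16 TB raw and ~5.4 TB usable with these numbers, which is roughly
# where a 5TB figure comes from.
```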

Unfortunately, the closest thing I can find is Dienst, and it's just a paper that dates back to 1994. Am I just using the wrong buzzwords in my searches, or does this really not exist? If not, is there a big downside that I'm missing?


Solution 1:

It sounds to me like you're describing AFS, the most common implementation of which is OpenAFS. The key OpenAFS concepts are described here: http://docs.openafs.org/UserGuide/ch01.html#HDRWQ3.

AFS is:

  • Distributed. The filesystem spans multiple machines, but presents a unified namespace, so the distributed nature is transparent to the client machine.
  • Redundant. Files can exist on multiple server nodes at once, so the loss of several server nodes does not make any data inaccessible.
  • Scalable. Apparently some "Enterprise" implementations span as many as 25,000 nodes.

Solution 2:

Yeah, the large disks in end-user desktop systems are tragically going unused when you're properly using centralized storage. Oh well. Some downsides of using a hypothetical desktop-network-distributed NAS:

  1. It would have to handle degradation caused by user machines going off-line. Someone didn't come in today and their machine is off? Better hope the documents on it are replicated onto machine(s) that are turned on (see the rough availability sketch after this list). Someone is working late tonight and their machine is the only one that's on? Tough luck, sorry. Unless you also have everything copied to a real fileserver - and then, what did you gain?

  2. Everything would have to have good encryption - otherwise, the boss's documents that contain his plan to cash out, or the HR doc that shows everyone's salary, are replicated to Jimmy the mail-clerk's machine. On which he runs LimeWire. See where this is going?
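
To put a rough number on the first point: if each desktop is switched on independently with some probability, the chance that every replica of a given file is offline falls off quickly with the replica count, but it never reaches zero. The probabilities and replica counts here are made-up illustration values:

```python
# Chance a file is unreachable because every PC holding a replica is powered off.
# Assumes machines are online independently with the same probability (a big
# simplification -- whole labs tend to go dark together at night).
def p_unreachable(p_online: float, replicas: int) -> float:
    return (1 - p_online) ** replicas

p_online = 0.5  # made-up off-hours availability
for replicas in (1, 2, 3, 5):
    print(f"{replicas} replica(s): {p_unreachable(p_online, replicas):.1%} unreachable")
# At 50% uptime you still lose access to ~12.5% of files even with 3 copies,
# which is why you end up wanting an always-on node holding one replica anyway.
```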

Solution 3:

Something like CleverSafe (which has both open source and commercial versions) can mostly do what you want, but managing very unreliable nodes might be a problem. CleverSafe handles multiple node outages, but perhaps not quickly enough for the sort of "constant churn" of nodes you would see using desktops as the storage nodes.

I think there are similar solutions from academic papers I've read in the past, but CleverSafe seems to be a real working product and not just a prototype. The company has been around since 2004.

Solution 4:

SANsymphony 7.0 Storage Virtualization Software

Everything below is quoted from their website:

Main Features

Device-independent virtual disk pooling, synchronous mirroring (HA), high-speed caching, asynchronous remote replication, thin provisioning, auto-tiering, online snapshots, non-disruptive disk migration, continuous data protection (CDP)

Access Type

Block disk I/O over a physical or virtual SAN. File system access is provided via NFS/CIFS protocols from the underlying Windows Server operating system. The two access methods may be combined to meet high availability, unified storage (SAN/NAS) requirements.

Host Environments Supported

Computer systems running standard Windows operating systems (including Windows Server 2000, 2003, 2008, Hyper-V, Windows XP, and Windows 7), UNIX, HP-UX, Sun Solaris, IBM AIX, Red Hat Linux, SUSE Linux, Apple Mac OS, VMware ESX / vSphere, and Citrix XenServer.

Disks Supported (back-end)

Any internal drives, external drives, external disk arrays, JBODs, Solid State Disks (SSD), and intelligent storage systems supported on Windows Server 2008 may be attached to the DataCore node(s). They may be direct-attached or SAN-connected.


It's what you're after, yes?