What is the use case for NAS on a mid-to-high-end storage array?
Many large storage vendors (EMC, NetApp, etc.) provide storage equipment that does NAS in addition to FCP, FCoE, or iSCSI.
When would it be appropriate to use the NAS functionality of these boxes? In an SMB setting, a NAS is a cheaper replacement with lower management overhead for a file server, but why would someone who can afford a VNX not just map block-level storage to a file server and share it out that way? What's the advantage of using the NAS components of these devices directly?
Solution 1:
The advantage comes when you're supporting solutions that may not work well with block storage, or when the cost of a proper FC infrastructure is prohibitive.
Think of a large distributed application in a high-performance computing environment, say 1,000 compute nodes. NFS may be ideal for application data because its per-port cost is low, it scales well, and it's reliable. iSCSI would add overhead and management effort. Fibre Channel would require a dedicated infrastructure and carries a high per-port cost. Yet the application could still benefit from the IOPS, capacity, or scale-out capabilities of a mid-tier or high-end array.
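As a rough illustration of the per-port cost argument, here is a back-of-envelope comparison for a 1,000-node farm. Every unit cost below is a made-up placeholder, not a vendor quote; the point is only that FC multiplies a high per-node cost by the node count, while NFS mostly rides on Ethernet the nodes already have.

```python
# Back-of-envelope connectivity cost for a large compute farm.
# All unit costs are hypothetical placeholders -- substitute real quotes.

NODES = 1_000

# NFS: nodes already have Ethernet NICs, so the incremental cost is
# mostly one more switch port per node.
nfs_per_node = 50

# iSCSI: same Ethernet hardware, plus some per-node initiator/multipath
# management overhead (expressed here as a cost, purely for comparison).
iscsi_per_node = 50 + 20

# Fibre Channel: a dedicated HBA plus a dedicated FC switch port per node.
fc_per_node = 800 + 400

for name, per_node in (("NFS", nfs_per_node),
                       ("iSCSI", iscsi_per_node),
                       ("FC", fc_per_node)):
    print(f"{name:6s} ~${per_node * NODES:>9,} for {NODES} nodes")
```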
I use VMware with NFS about 80% of the time versus other protocols. Native thin provisioning, visibility/transparency, and the absence of datastore size limits are the advantages over presenting block storage to the same hosts. Performance differences are negligible these days with the right design. Sometimes I present both block and NAS storage in the same environment (on separate networks). Flexibility matters.
Other examples include organizations that want to serve files natively over the protocols their client machines already speak, while using the array's snapshotting, replication, or backup facilities directly: CIFS on the Windows side, NFS for Unix, and I've recently seen Macintosh/AFP additions for NexentaStor storage.
Solution 2:
The first thing to think about is that if you run a file system and file sharing protocol on your storage device (making it a NAS), you won't have to run it on the server. That's a little bit of work avoided. If the file system will need to be shared among other servers and users, this might represent quite a bit of work avoided.
If the server using the data is the only server accessing it, it's probably best to stick with block-level protocols (FC, SAS, iSCSI, or FCoE). The main reason is that while managing the actual file system is fairly easy for a system, the protocol used to access that file system over the network can be hairy and inefficient. CIFS is extremely inefficient, and NFS, while better than CIFS, is still much less efficient than anything based on SCSI.
If the data will be shared among many servers, your only block-level choice is a cluster-type environment where all the nodes have access to the same SCSI volumes. That may not be a possibility, and even when it is (for VMware, for example), there are often advantages to having the file system and file-sharing protocol handled by something central that's not a server.
Specific workloads where it makes a lot of sense to use a NAS:
- VMware: VMFS, the file system you install on a SCSI volume in everything before the latest VMware release, is not very well designed and introduces all kinds of limitations in your environment. VMware bent over backwards to make sure VMDKs can be hosted on NFS, and that eliminates the need for VMFS for a lot of shops. That said, there may be some things you can't do with NFS that you can with VMFS; I'm not a VMware expert.
- Hyper-V: the newest beta of Microsoft's hypervisor adds NAS support, and it seems like making it work well will be a priority for them.
- File servers: users' home directories and network shares are a perfect application for a central NAS. It's one less computer in your domain, and you can typically consolidate many file servers into a single NAS.
Solution 3:
The advantage is that you do not add another layer, which helps in several ways:
- The storage system attempts to optimize for the observed access pattern; if you add another system in the middle that introduces its own caches, the access pattern now changes to whatever the caching algorithm there produces, which may be more difficult to optimize for.
- A proper storage manager has a battery-backed write cache, which allows the journal to be kept in RAM only. If another system handles the file system layer, it has to write the journal blocks, wait for the journal to be acknowledged, and only then write the data blocks. Keeping the file system on the NAS skips that extra round trip (see the sketch after this list).
- The storage manager is optimized for high throughput. It is doubtful that a regular file server could ever saturate a 10 Gbps link, so putting one in front of the array makes it the bottleneck.
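To make the journaling point concrete, here is a minimal sketch of the extra round trip described above; it is not the implementation of any particular file system or array, and the latency figures are assumed purely for illustration. A separate file server must commit its journal record and wait for the acknowledgement before writing the data blocks, while a NAS head with a battery-backed cache can acknowledge as soon as the write reaches NVRAM.

```python
import time

# Simplified model of the two write paths described in the list above.
# The latency figures are illustrative placeholders, not measurements.

DISK_COMMIT_LATENCY = 0.005    # assume ~5 ms to commit a block to disk
NVRAM_COMMIT_LATENCY = 0.0001  # assume ~0.1 ms to land in battery-backed cache


def write_via_separate_file_server() -> float:
    """Journaling file system on a server in front of the array: commit the
    journal record, wait for the acknowledgement, then write the data blocks."""
    start = time.perf_counter()
    time.sleep(DISK_COMMIT_LATENCY)  # journal write + wait for ack
    time.sleep(DISK_COMMIT_LATENCY)  # data block write
    return time.perf_counter() - start


def write_via_nas_head() -> float:
    """File system on the NAS head itself: the write is acknowledged as soon
    as it reaches the battery-backed cache; destaging to disk happens later,
    off the client's critical path."""
    start = time.perf_counter()
    time.sleep(NVRAM_COMMIT_LATENCY)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"separate file server: {write_via_separate_file_server() * 1000:.2f} ms per write")
    print(f"NAS head with NVRAM:  {write_via_nas_head() * 1000:.2f} ms per write")
```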
A system I helped build for a customer has 6×10 Gbps uplinks going to six different switches, five of which distribute 1 Gbps links to 30 machines each doing rendering work. Each of those machines drops off a 200 MB file every twenty seconds, which gives an average rate of 1,500 MB/s being written to storage (in reality the write load is a bit bursty and not entirely predictable, as differing scene complexity leads to varying rendering times).
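The quoted average follows directly from those numbers; a quick sanity check, using only the figures given above:

```python
# Sanity-check of the write load described above, using only the numbers
# from that description.

switches_with_render_nodes = 5
nodes_per_switch = 30
file_size_mb = 200
interval_s = 20

render_nodes = switches_with_render_nodes * nodes_per_switch  # 150 machines
write_rate_mb_s = render_nodes * file_size_mb / interval_s    # average MB/s

print(f"{render_nodes} render nodes -> {write_rate_mb_s:.0f} MB/s average write load")

# 1,500 MB/s of writes alone already exceeds what a single 10 Gbps link
# (~1,250 MB/s) can carry, before counting any read-back traffic.
print(f"one 10 Gbps link carries at most {10_000 / 8:.0f} MB/s")
```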
In between, other machines read these files and concatenate them into a continuous video stream that is fed to a hardware encoder, and the resulting stream is written back to the storage, into a separate partition.
The total I/O load was higher than the memory bus bandwidth of the fastest mainboard available back then, so a file server placed in the middle would have been an obvious bottleneck. :)