Recommendations: configuring a 10GbE NAS stack for virtualisation storage
I'll try as hard as I can to word this so it is not considered a shopping list.
We have been successfully running a dev/test ESXi environment for some time, with a couple of Dell PE2950III servers over an HP MSA2012fc Starter Kit (with the Brocade-based HP Class B SAN switch). This has worked very well for us, but being in dev/test, it comes with various caveats with regards to uptime/performance.
In any case, the perceived success of the dev/test platform has led to calls for a more 'production-ready' virtualisation platform. We are drafting the recommendations at the moment.
However, one of the complaints levelled at the existing stack is a lack of support for other virtualisation technologies (Hyper-V, Xen, etc.), as the SAN LUNs are fully allocated and formatted as VMFS. We have been told to overcome this but, as is typical, there is no indication of the likely uptake of Hyper-V/Xen (and we don't particularly want to waste the 'expensive' storage resource by allocating LUNs to something that may never be used).
As such, our current line of thinking is to forgo the traditional fibre SAN in favour of a straightforward CentOS box (probably a higher-end HP ProLiant DL380p Gen8), running NFS and Samba/CIFS daemons, with a 10GbE switch (probably a Cisco Nexus 5000/5500-series).
The reasoning is that the ESXi heads could talk NFS and the Hyper-V heads could talk CIFS, but both would ultimately be pointing at the same XFS/RAID1+0 volumes.
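To make the intention concrete, this is roughly the sort of thing we have in mind (the paths, subnet and group name below are purely illustrative):

# /etc/exports -- the XFS volume exported to the ESXi heads over NFS
/srv/vmstore  10.0.10.0/24(rw,sync,no_root_squash,no_subtree_check)

# /etc/samba/smb.conf -- the same volume offered to the Hyper-V heads over CIFS
[vmstore]
    path = /srv/vmstore
    read only = no
    valid users = @hyperv-admins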
Now, I'm not green enough to think that 10GbE is going to allow me to get true 10 gigabits of I/O throughput between the heads and the disks, but I don't know the kinds of overheads I can expect to see from the NFS and CIFS implementations (and any other bits that might interfere when more than one host tries to talk to it).
I am hoping to at least get near to the sustained disk read/write speeds of direct-attached disks, though, for as many hosts as I can. Looking at various drive manufacturer websites, I'm roughly anticipating this to be somewhere around the 140-160MB/s mark (if I am way off, please let me know).
What recommendations/guidelines/further reading can anyone offer with regards to Linux/NFS/Samba or 10GbE switch configuration that might help attain this?
I understand the desire to move away from pure block storage to something more flexible.
However, I would avoid using a straight-up Linux storage stack for this when several storage appliance software offerings are available right now. A Linux approach could work, but the lack of management features/support, the XFS tuning needed (here and here) and the fact that it's not a purpose-built storage OS are downsides.
Add to that, some nagging issues with the XFS/RHEL code maintainer and a nasty kernel bug that's impacting system load average, and the Linux combination you describe becomes less-appealing.
A pure Linux setup could be made to work well for this purpose, but it would certainly be outside the norm and might rely on esoteric solutions like ZFS on Linux or the not-so-ready-for-primetime Btrfs. More details on those later.
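To give a flavour of the XFS tuning mentioned above, here's a minimal sketch; the device name and stripe geometry are illustrative and would need to match the actual RAID layout:

# Align XFS to the RAID stripe: su = per-disk chunk size, sw = number of data spindles
mkfs.xfs -d su=256k,sw=8 -l size=128m /dev/sdb1

# /etc/fstab entry with mount options commonly suggested for VM/NFS workloads
/dev/sdb1  /srv/vmstore  xfs  noatime,nodiratime,inode64,logbufs=8  0 0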
I do this often, opting to go with NFS on ZFS-based storage for most of my VMware deployments versus an entry-level SAN like the HP P2000 array. I augment the ZFS installation with L2ARC (read) and ZIL (write) SSD and DRAM cache devices. In addition, I've been using 10GbE with this type of setup for four years.
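For context, adding those cache devices to an existing pool is a one-liner each (the pool and device names below are made up):

# SSD read cache (L2ARC) and a mirrored SSD log device (ZIL/SLOG) added to pool 'tank'
zpool add tank cache c1t4d0
zpool add tank log mirror c1t5d0 c1t6d0

# Verify the resulting layout
zpool status tank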
I'll focus on NexentaStor for the moment, as that's the appliance software I use most of the time...
I've built numerous HP ProLiant-based systems for ZFS storage, from all-in-one VMware hosts to standalone DL380 storage "appliances" to full multipath SAS connections to cascaded JBOD storage units (front and rear).
NexentaStor and NFS/CIFS.
Nexenta supports the presentation of file AND block storage to external systems. I can take a pool of 24 disks and provide iSCSI storage to hosts that need native block storage, NFS to my VMware ESXi infrastructure and CIFS to a handful of Windows clients. The space is used efficiently and is carved out of the pool's storage, i.e. there are no artificial caps. Compression is transparent and helps tremendously in VM scenarios (less to move over the wire).
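Under the hood this is plain ZFS, so the same pool hands out file and block storage side by side. A rough sketch using standard ZFS commands (the pool, dataset and zvol names are made up; NexentaStor wraps all of this in its own management interface):

# NFS dataset for the VMware ESXi hosts
zfs create tank/vmware
zfs set sharenfs=on tank/vmware

# SMB/CIFS share for the Windows clients
zfs create tank/winshare
zfs set sharesmb=on tank/winshare

# Thin-provisioned zvol, later presented over iSCSI to hosts needing block storage
zfs create -s -V 500G tank/iscsi-lun0

# Transparent compression across the whole pool
zfs set compression=on tank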
10GbE helps but it depends on what you're presenting to your virtualization hosts. Will they be 1GbE or 10GbE as well?
Benchmarks:
I'll run a quick test of a guest virtual machine running on an ESXi host connected via 10GbE to a NexentaStor SAN.
This is going to a 6-disk array (600GB 15k SAS in an HP D2600 enclosure).
[root@Test_VM /data]# iozone -t1 -i0 -i1 -i2 -r1m -s6g
Iozone: Performance Test of File I/O
Run began: Mon Feb 11 18:25:14 2013
Record Size 1024 KB
File size set to 6291456 KB
Command line used: iozone -t1 -i0 -i1 -i2 -r1m -s6g
Output is in Kbytes/sec
Children see throughput for 1 initial writers = 128225.65 KB/sec
Children see throughput for 1 readers = 343696.31 KB/sec
Children see throughput for 1 random readers = 239020.91 KB/sec
Children see throughput for 1 random writers = 160520.39 KB/sec
This is going to a busy 16-disk array (in an HP D2700 enclosure - 300GB 10k SAS).
[root@Test_VM2 /data]# iozone -t1 -i0 -i1 -i2 -r1m -s4g
Iozone: Performance Test of File I/O
Run began: Mon Feb 11 16:33:53 2013
Record Size 1024 KB
File size set to 4194304 KB
Command line used: iozone -t1 -i0 -i1 -i2 -r1m -s4g
Output is in Kbytes/sec
Children see throughput for 1 initial writers = 172846.52 KB/sec
Children see throughput for 1 readers = 366484.00 KB/sec
Children see throughput for 1 random readers = 261205.91 KB/sec
Children see throughput for 1 random writers = 152305.39 KB/sec
The I/O graphs from the same run... Kilobytes/second and IOPS measures.
Using a Linux host to provide CIFS storage for Hyper-V hosts is not reasonable, and it is definitely not supported by Microsoft. When you're talking about something as important as virtualization for business-critical infrastructure, you definitely want vendor support.
You will either need to provide more traditional iSCSI or Fibre Channel storage to your Hyper-V servers, or if you plan on running Windows 2012 you could use Windows 2012 storage services to provide iSCSI to your hosts.
Another possibility is running Windows 2012 or something like Nexenta as a virtual guest in your VMware infrastructure to provide iSCSI for your Hyper-V guests. It's not the most performant configuration, but it's also not bad. Since your Hyper-V footprint is small to nonexistent, this could be a good compromise for maximum flexibility without dedicating a LUN.
Otherwise you'll need to go with something that completely virtualizes your LUNs, like an HP LeftHand SAN. With LeftHand, disks are not dedicated to a LUN; instead, all LUNs are striped across all disks. It sounds a bit strange, but it's a good product.