Choosing a SAN technology for 100s of VM Web Servers

The key to a good VMware storage platform is understanding what kind of load VMware generates.

  • First, since you host a lot of servers, the workload is typically random. There are many IO streams going at the same time, and not many of them can be successfully pre-cached.
  • Second, it's variable. During normal operations you may see 70% random reads; the instant you decide to move a VM to a new datastore, however, you'll see a massive 60GB sequential write. If you're not careful about architecture, this can cripple your storage's ability to handle normal I/O.
  • Third, a small portion of your environment will usually generate a large portion of the storage workload.
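
To illustrate that last point, here is a minimal sketch that ranks VMs by their share of total IOPS. The per-VM names and numbers are invented purely for illustration; in practice you would feed it figures from esxtop, vCenter performance charts, or your array's per-LUN statistics.

    # Rank VMs by their share of total IOPS to find the hot minority.
    # The numbers below are made up; pull real per-VM stats from esxtop,
    # vCenter performance charts, or the array's per-LUN counters.
    vm_iops = {
        "web-app-01": 1800, "web-app-02": 1650, "db-reporting": 950,
        "monitoring": 60, "web-static-01": 45, "web-static-02": 40,
        "build-agent": 30, "wiki": 25, "staging-01": 15, "staging-02": 10,
    }

    total = sum(vm_iops.values())
    running = 0
    for name, iops in sorted(vm_iops.items(), key=lambda kv: kv[1], reverse=True):
        running += iops
        print(f"{name:15s} {iops:6d} IOPS  cumulative {running / total:5.1%}")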

The best way to approach building storage for a VMware platform is to start with the fundamentals.

  • You need the ability to service a large random-read workload, which means smaller, faster drives and possibly SSD. Most modern storage systems can move data around automatically based on how it is accessed; if you are going to use SSD, this is how you want to use it, as a way of gradually reducing hot spots rather than as a tier you manage by hand. Whether you use SSD or not, it helps to spread the work across all of the drives, so something with a form of storage pooling is beneficial (rough spindle-count math is sketched after this list).
  • You need the ability to service intermittent large writes. That depends less on the spindle speed of the underlying drives and more on the efficiency of the controller stack and the size of the write cache. If you have mirrored caching (which is not optional unless you're willing to fall back to backups every time a controller fails), the bandwidth of the interconnect used to mirror the caches will usually be your bottleneck for large sequential writes, so ensure that whatever you get has a high-speed controller (or cluster) interconnect for write caching (the sketch after this list shows why). Likewise, get the fastest front-end network with as many ports as you can while remaining realistic on price; the key to good front-end performance is spreading the storage load across as many front-end resources as possible.
  • You can seriously reduce costs with a low-priority storage tier and thin provisioning. If your system won't automatically migrate individual blocks to cheap, large, slow drives (nearline SAS or SATA at 7200 RPM and 2 TB+), try to do it manually. Large slow drives are excellent targets for archives, backups, some file systems, and low-usage servers.
  • Insist that the storage is VAAI-integrated so that VMware can de-allocate (UNMAP) unused space in the VMs as well as the datastores.
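
To put rough numbers on the first two bullets, here's a back-of-envelope sketch. The per-drive IOPS figures, the 20,000-IOPS target, and the interconnect speeds are generic rules of thumb I've assumed for illustration, not vendor specifications, and the math ignores RAID overhead and cache hits, so treat the output as orders of magnitude only.

    # 1) Spindles needed to service a random-read target, ignoring cache
    #    hits and RAID overhead. Per-drive figures are rules of thumb.
    TARGET_IOPS = 20_000
    IOPS_PER_DRIVE = {"15k SAS": 175, "10k SAS": 140, "7.2k NL-SAS": 75, "SSD": 5000}

    for drive, iops in IOPS_PER_DRIVE.items():
        spindles = -(-TARGET_IOPS // iops)          # ceiling division
        print(f"{drive:11s}: ~{spindles:4d} drives for {TARGET_IOPS} random read IOPS")

    # 2) How long that massive 60 GB sequential write (e.g. a Storage vMotion)
    #    takes if the cache-mirroring interconnect is the limiting factor.
    WRITE_GB = 60
    for link_gbit in (10, 20, 40):
        print(f"{link_gbit:3d} Gbit/s mirror link: ~{WRITE_GB * 8 / link_gbit:3.0f} s "
              f"to destage a {WRITE_GB} GB write")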

My big VMware deployments are NFS and iSCSI over 10GbE. That means dual-port 10GbE HBAs in the servers as well as the storage head. I'm a fan of ZFS-based storage for this. In my case it's wrapped around commercial NexentaStor, but some choose to roll their own.

The key features of ZFS-based storage in this context are the ARC/L2ARC caching functionality, which lets you tier storage: the most active data finds its way into RAM (ARC), with SSD (L2ARC) as a second tier. Running the main storage pool off 10k or 15k SAS drives is also beneficial.
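
To see how much of the read workload those tiers are actually absorbing, here's a minimal sketch that computes ARC and L2ARC hit rates. It assumes ZFS on Linux, which exposes cumulative counters in /proc/spl/kstat/zfs/arcstats; on an illumos-based appliance like NexentaStor you'd pull the equivalent kstats with "kstat -m zfs" instead.

    # ARC / L2ARC hit rates from the ZFS-on-Linux kstat file. Counters are
    # cumulative since pool import, so this is a lifetime average, not a rate.
    def read_arcstats(path="/proc/spl/kstat/zfs/arcstats"):
        stats = {}
        with open(path) as f:
            for line in f.readlines()[2:]:      # skip the two kstat header lines
                name, _kind, value = line.split()
                stats[name] = int(value)
        return stats

    s = read_arcstats()
    arc_lookups = s["hits"] + s["misses"]
    l2_lookups = s["l2_hits"] + s["l2_misses"]

    if arc_lookups:
        print(f"ARC hit rate:   {s['hits'] / arc_lookups:6.1%} ({arc_lookups} lookups)")
    if l2_lookups:
        print(f"L2ARC hit rate: {s['l2_hits'] / l2_lookups:6.1%} (of ARC misses)")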

This is another case of profiling and understanding your workload. Work with someone who can analyze your storage patterns and help you plan. On the ZFS/NexentaStor side, I like PogoStorage. Without that type of insight, the transport method (FC, FCoE, iSCSI, NFS) may not matter. Do you have any monitoring of your existing infrastructure? What does I/O activity look like now?
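
If the answer is "not yet", even a crude sample beats guessing. Below is a minimal sketch (my choice of psutil is an assumption; any per-disk counter source works) that samples per-disk IOPS and throughput over a ten-second window on a Linux guest or storage host.

    import time

    import psutil   # third-party: pip install psutil

    # Sample per-disk I/O counters over a short window to get a feel for the
    # current IOPS and throughput mix. One sample proves nothing; take many,
    # at different times of day, on the busiest guests you have.
    INTERVAL = 10   # seconds

    before = psutil.disk_io_counters(perdisk=True)
    time.sleep(INTERVAL)
    after = psutil.disk_io_counters(perdisk=True)

    for disk, now in after.items():
        then = before[disk]
        r_iops = (now.read_count - then.read_count) / INTERVAL
        w_iops = (now.write_count - then.write_count) / INTERVAL
        r_mbps = (now.read_bytes - then.read_bytes) / INTERVAL / 2**20
        w_mbps = (now.write_bytes - then.write_bytes) / INTERVAL / 2**20
        print(f"{disk:8s} {r_iops:8.0f} r/s {w_iops:8.0f} w/s "
              f"{r_mbps:7.1f} MB/s read {w_mbps:7.1f} MB/s write")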


The key question is: "where's the bottleneck?" You mention IOPS, but does that mean that you've positively identified the disks themselves as the bottleneck, or merely that the SAN ports aren't running at capacity, or that the VMs are in far more iowait than you'd like?
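
As a minimal sketch of the last of those checks: on a Linux guest you can sample the aggregate "cpu" line in /proc/stat, where the fifth numeric field is iowait, over a short window.

    import time

    # Share of CPU time spent in iowait over a short window, from the
    # aggregate "cpu" line of /proc/stat (Linux guests only).
    def cpu_jiffies():
        with open("/proc/stat") as f:
            values = [int(v) for v in f.readline().split()[1:]]
        return sum(values), values[4]           # total jiffies, iowait jiffies

    total1, wait1 = cpu_jiffies()
    time.sleep(5)
    total2, wait2 = cpu_jiffies()

    print(f"iowait over the last 5 s: {(wait2 - wait1) / (total2 - total1):.1%} of CPU time")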

If you've definitely identified that the disks are the limiting factor, then switching to NFS or InfiniBand or whatever isn't going to do squat for your performance -- you need SSDs (or at least tiered storage with SSDs in the mix) or a whole bundle more spindles (a solution which has itself gotten a whole lot more expensive recently, since the world's stepper-motor production got washed into the ocean).

If you're not 100% sure where the bottleneck actually is, though, you need to find that first -- swapping out parts of your storage infrastructure more-or-less at random based on other people's guesses here isn't going to be very effective (especially given how expensive any changes are going to be to implement).


If you want iSCSI or NFS, then at a minimum you'll want a few 10/40 Gb ports or InfiniBand, which is the cheapest option by far, although native storage solutions for InfiniBand seem to be limited. The issue will be the blade center's module and what its options are: usually 8 Gb FC or 10/1 GbE, and maybe InfiniBand. Note that InfiniBand can be used with NFS, and nothing comes close to it in terms of performance/price. If the blade center supports QDR InfiniBand, I'd do that with a Linux host of some kind with a QDR InfiniBand TCA, serving NFS. Here's a good link describing this: http://www.zfsbuild.com/2010/04/15/why-we-chose-infiniband-instead-of-10gige

But if the blade center can support QDR InfiniBand and you can afford a native InfiniBand storage solution, that's the option you should pick.

Currently you can get 40 GbE switches far cheaper (a strange thought) than 10 GbE switches, but I doubt your blade center will support that.
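
For a rough sense of how much headroom each fabric buys, here's a small sketch. The usable data rates are approximate figures after encoding overhead, and the per-VM throughput is purely an assumed number for illustration, so measure your own averages before trusting the absolute VM counts.

    # How many "average" web-server VMs each fabric can carry before the
    # uplink saturates. The per-VM figure is an assumption for illustration.
    AVG_MB_PER_SEC_PER_VM = 2.0                 # assumed steady-state throughput per VM

    usable_gbit = {                             # approx. usable rate after encoding overhead
        "1 GbE": 0.95,
        "8 Gb FC": 6.8,
        "10 GbE": 9.7,
        "QDR InfiniBand": 32.0,
    }

    for fabric, gbit in usable_gbit.items():
        vms = gbit * 1000 / 8 / AVG_MB_PER_SEC_PER_VM
        print(f"{fabric:15s}: roughly {vms:5.0f} VMs per link at "
              f"{AVG_MB_PER_SEC_PER_VM} MB/s each")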