Greetings, I'm a long-time lurker, first-time poster. For brevity, let me ask my question up front and then give the longer backstory, so you can choose how much you want to read.

Short version...

We're in need of a SAN upgrade: definitely additional capacity, and very possibly better performance. Our main workloads will be around 20 assorted VMs and a couple of SQL databases. In addition to a few domain controllers, Exchange, and assorted standard corporate services, the main workload is hosting an application over Citrix that will be accessed by approximately 300 users throughout each day. Currently we have an HP MSA1500cs SAN controller with 2 MSA30 enclosures, nearly fully populated, with about 3.7 TB of raw drive space. Pure capacity isn't really the big concern so much as performance and reliability; I don't see our capacity needs outgrowing 10 TB in the foreseeable future. The question is: should we just add an enclosure or two to our existing controller? Should we upgrade to the current generation of MSA products, the MSA2000 series? Should we move up to the EVA4400 family? Should we be looking at the recently acquired LeftHand SAN solutions? If we end up doing something besides just adding space to our current controller, should we stick with Fibre Channel or be looking at iSCSI? The budget is not really set, as this is part of a larger project with a big budget umbrella, but I'd say we want to be under $50k, and cheaper is always better.

Long version...

We are about to expand the services our server farm provides. If you want the details, we're a nursing home corporation with about 45 nursing homes. We will be implementing fully electronic charting, meaning our usage of the database and Citrix in particular will increase significantly. Right now there are probably about 2 nurses at each facility who actually interact with our medical records software every day. That will change to around 6 nurses at each facility using the software far more often than they do today.

In addition to the medical records system, we provide Active Directory and Exchange for around 600 users, plus a payroll system and the usual assortment of miscellaneous services. The current database (MS SQL 2005) for the medical records system is about 30 GB. It will grow some with the new usage model, but mostly the frequency of access will increase rather than the raw size.

Before I get any more specific with hardware, let me say that we're an HP house and my boss, the Director of IS, is pretty set against going with any other vendor. You and I may not agree with the HP choice, but it's pretty much set in stone.

We're upgrading our server farm from 16 HP BL20p blades with older Xeon CPUs (dual-socket machines, but most only have a single CPU in them currently) and not much memory by today's standards (8 GB max supported, and most have much less than that) to what will probably be modern stand-alone servers, such as HP DL580 G5s with 4 sockets, each populated with a modern 4- or 6-core Xeon or Opteron, and 128 GB of memory per server. We're currently using VMware ESX 3 and plan to upgrade to a current version of VMware. I would appreciate comments on the servers too, but my main question is about our SAN, so keep reading.

I have been tasked with researching an upgrade to our current SAN solution. We currently have an HP MSA1500cs controller and 2 fully populated drive enclosures, with about 3.7 TB of raw drive space. These use SCSI Ultra320 drives, the enclosures talk to the controller over U320 connections, and the controller connects to the server farm through a 4 Gb 32-port Fibre Channel SAN switch. We will need to add a little space to implement the change, but mostly I am concerned with performance and reliability. I don't see our total storage ever outgrowing the ballpark of 10 TB.

I'm pretty new to the world of SANs, and it's a bit overwhelming at this point. As I see it, we have 3 main options. We can add more enclosures to our current system; it supports up to 8 enclosures and we only have 2 right now, so this would be a very simple upgrade path. We could also upgrade to the current generation of HP's MSA family, the MSA2000 series. Our third option, as I see it, would be to step up to the next class of SAN, the HP EVA series. HP positions the MSA family as an entry-level SAN, which is what makes me think we might need something more substantial, but I realize that's the marketing department speaking.

If we just add some enclosures to our current controller, we have enough Fibre Channel ports to connect the new servers, especially since we'll be retiring several old servers. If we do upgrade to a new SAN system, that brings up the question of whether to continue using Fibre Channel or to go with a newer (and generally cheaper) technology such as iSCSI or FCoE.

I appreciate any comments or answers, and if anything needs clarifying just ask and I'll try to give you as much information as possible. Thanks in advance!


Solution 1:

The HP MSA1500CS is a pretty wimpy device. I have one, and I hate it. I'm somewhat surprised it has kept up with your stated workloads. It probably comes as no surprise that I recommend upgrading to the MSA2000. It has a much better storage architecture than the 1500CS, and can scale better.

Without more data I can't recommend going to an EVA4400 (HP's 'entry level enterprise array') versus the MSA2000. The 4400 will take you a lot farther than the MSA2000 will in terms of scale out, but I don't know what kind of growth you expect.

RE: LeftHand vs. MSA2000

So long as you have the Ethernet network for it, the LeftHand unit should out-scale the MSA2000 by a long shot. The distributed storage controllers it uses make that kind of scaling easy. You'll pay more per storage shelf, but you can scale to silly amounts with it. Once you start hitting the I/O ceilings on an MSA2000 (which will depend on the drive technology you use as well as any active/active configs you can use), you're pretty much done. For the LeftHand products that ceiling is a lot more mushy.

Where the LeftHand approach really saves you is with parity RAID. Rebuilding after a failure is the most CPU-intensive thing an array does, and it's where my MSA1500cs falls flat on its ass. On my 1500cs, rebuilding a RAID6 array across 6.5 TB of disk took about a week, during which time it was deeply intolerant of large-scale writes to anything on the array. Since LeftHand has a controller in each cabinet, restriping a LUN on one shelf will not affect the performance of LUNs on other shelves. This is very nice!

All in all, if you have the budget for it the LeftHand devices should serve you a lot longer than the MSA2000.

Solution 2:

Assuming you need to stick with HP (and there's nothing horribly wrong with that), take a look at the recently acquired HP LeftHand SAN solutions; they're about the most modern thing HP or anyone else sells when it comes to SAN storage. They sell a bundled version called the "HP LeftHand P4300 4.8 TB Starter SAN Solution", which would be a pretty reasonable fit for the general details you've given above.

However, if you're happy with your existing MSA1500cs performance and features, then there's not really a burning reason to rip and replace it unless it's come to the end of its warranty and an extended support contract is going to cost too much.

Solution 3:

I have no particular experience with HP's storage products, so this is just general advice. You need to step back and carry out a thorough analysis of the performance requirements of the systems that need/want to use your SAN storage. Capacity is one metric, but more importantly you need to get a handle on the IO patterns - average and peak IOPS loads per server/service over time. Break these down per volume, and look at what can/should be split out, to see what sort of performance and capacity profile each volume requires. Then look at what sort of disk combinations (capacity, IO per disk, RAID types) can meet your needs.
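To make that concrete, here's a minimal sketch of the kind of number-crunching involved, assuming you've exported per-volume IOPS samples from your monitoring tool into a CSV (the file name and column names here are hypothetical, not from any particular product):

```python
# Rough per-volume IOPS summary from collected samples.
# Assumes a hypothetical CSV with columns: volume, timestamp, iops
import csv
import statistics
from collections import defaultdict

samples = defaultdict(list)
with open("volume_iops_samples.csv", newline="") as f:
    for row in csv.DictReader(f):
        samples[row["volume"]].append(float(row["iops"]))

for volume, iops in sorted(samples.items()):
    avg = statistics.mean(iops)
    # 95th percentile as a stand-in for "peak" that ignores one-off spikes
    p95 = statistics.quantiles(iops, n=20)[18] if len(iops) > 1 else iops[0]
    print(f"{volume:20s} avg {avg:8.0f} IOPS   95th pct {p95:8.0f} IOPS")
```

However you gather the samples, the point is to end up with an average and a realistic peak per volume, not just a single aggregate number for the whole array.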

As a (very rough) rule of thumb for your initial capacity estimates with SAS drives: sustained random IOPS is about 70-90 for 7200 rpm drives, 100-120 for 10k drives, and 150-170 for 15k drives. Some drives will be better and some workloads will allow higher IOPS, but use those as a worst-case sustained baseline.

Random read IOPS will be close to N * (IOPS per drive) for an N-drive array of any normal RAID type (10, 5, 6, 50). Random write performance incurs a penalty (50% for RAID 10, 75% for RAID 5/50, >75% for RAID 6/60). Sequential read performance is RAID 10 > 50 > 60 > 5, but the differences aren't huge. Sequential writes are similar, but the differences are more significant. RAID 6 and 60 have great resilience, but rebuild times are very long and performance degrades across the whole pack while they happen.
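If it helps, here's a back-of-the-envelope sketch putting those two rules of thumb together to size a spindle count for a random workload. It maps the write penalties above onto back-end IOs per host write (2 for RAID 10, 4 for RAID 5, 6 for RAID 6); the workload figures in the example are made up, not yours:

```python
# Back-of-the-envelope spindle count for a random-IO workload,
# using the rule-of-thumb numbers above.
DRIVE_IOPS = {"7200rpm": 80, "10k": 110, "15k": 160}   # mid-points of the ranges above
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}  # back-end IOs per host write

def spindles_needed(read_iops, write_iops, drive="15k", raid="RAID10"):
    """Return the number of drives needed to sustain the given host IOPS."""
    backend_iops = read_iops + write_iops * WRITE_PENALTY[raid]
    return int(-(-backend_iops // DRIVE_IOPS[drive]))  # ceiling division

# Example: 1500 random reads/s and 500 random writes/s (hypothetical numbers)
for raid in ("RAID10", "RAID5", "RAID6"):
    print(raid, spindles_needed(1500, 500, drive="15k", raid=raid), "x 15k drives")
```

The same workload needs noticeably more spindles as the write penalty climbs, which is why the read/write mix per volume matters as much as the total IOPS.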

Don't ignore bulk sequential performance - you will need a lot of sequential performance for backups.
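For example, a quick way to sanity-check the backup requirement (the data-set size and window below are made-up figures, plug in your own):

```python
# Required sustained sequential throughput to finish a backup in its window.
data_set_tb = 4.0      # TB of data to back up (hypothetical)
window_hours = 8.0     # backup window (hypothetical)

required_mb_s = data_set_tb * 1024 * 1024 / (window_hours * 3600)
print(f"Need roughly {required_mb_s:.0f} MB/s sustained to finish in the window")
# ~146 MB/s in this example, i.e. already more than a single GigE link (~125 MB/s).
```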

If you want to use hardware snapshots/replication, then factor in both the capacity and the performance overhead associated with them. Good snapshot technology adds little IO overhead itself, but if you start mounting snapshots on multiple systems, the additional IO load will add up. From a capacity perspective, snapshots can use anything from as little as 10% of the base volume size to more than 100%.
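A rough way to budget for that reserve, using the 10%-100% range above (the volume names and sizes are made-up examples; your change rate and snapshot retention determine where in the range you land):

```python
# Snapshot capacity reserve estimate using the 10%-100% range above.
volumes_gb = {"sql_data": 60, "exchange": 200, "vm_datastore": 800}  # hypothetical

for name, size in volumes_gb.items():
    low, high = size * 0.10, size * 1.00
    print(f"{name:15s} reserve {low:6.0f} - {high:6.0f} GB on top of {size} GB")
```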

Bear in mind that the baseline IOPS numbers I mentioned above are worst-case, extended scenarios where the storage array controller's write cache is overstressed. Ideally you'd like to avoid that, but you need to know whether the underlying disks can handle the load.

Remember that you need to think about how the various loads associated with your storage volumes will be carved up across the disk combinations. You will want to ensure that heavy IO loads are kept apart from each other and are placed on groups of disks with enough spindles and a RAID type like 10 or 50 that delivers (relatively) high performance. Bulk, low-performance storage should be put on cheaper (slower) disks and configured for RAID 5.

Once you have a good idea of what the combinations look like, see whether the storage solutions available to you can deliver those characteristics and what the configs look like. Then go looking for prices.

Note that I've totally ignored the network part of this - you do need to look at that too, but for the most part 2G/4G FC or a couple of iSCSI GigE ports will let you saturate random IO on any entry-level SAN. A single GigE interface should be able to handle >10,000 small (<4K) random IOPS, for example. If your loads are heavily geared towards sequential IO, then you want many fat pipes - as few as two 10k disks can stream data faster than a single GigE network link (>125 MB/sec), and an 8-drive RAID 10 array (>500 MB/sec) would give 4G FC a tough time.
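To put those link numbers side by side, here's a tiny sketch comparing the rough sequential figures above against usable link bandwidth (the 10 GigE figure is my own approximate addition, not a measurement):

```python
# Rough comparison of link bandwidth vs. the sequential figures quoted above.
LINK_MB_S = {"1 GigE": 125, "4 Gb FC": 400, "10 GigE": 1250}  # approximate usable MB/s

workloads = {
    "2 x 10k disks streaming": 160,    # a bit over a single GigE link
    "8-drive RAID 10 streaming": 500,  # the >500 MB/sec figure above
}

for wl, mb_s in workloads.items():
    fits = [link for link, bw in LINK_MB_S.items() if bw >= mb_s]
    print(f"{wl:28s} ~{mb_s} MB/s -> needs at least: {fits[0] if fits else 'multiple links'}")
```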