RAID 10 or RAID 5 for multiple VMs - what is the best choice?

I have just ordered a new rig for my business. We do a lot of software development for Microsoft SharePoint and need the rig to run several virtual machines for development and test purposes. We will be using the free VMware ESXi for virtualization. For a start, we plan to build and start the following VMs - all with Windows Server 2008 R2 x64:

  • Active Directory server
  • MS SQL Server 2008 R2
  • Automated Build Server
  • SharePoint 2010 Server for hosting our public Web site and our internal Intranet for a few people. The load on this server is going to be quite insignificant.
  • 2xSharePoint 2007 development server
  • 2xSharePoint 2010 development server

Beyond that we will need to build several SharePoint farms for testing purposes. These VMs will only be started when needed. The specs of the new rig are:

  • Dell R610 rack server
  • 2xIntel XEON E5620
  • 48GB RAM
  • 6x146GB SAS 10k drives
  • Dell H700 RAID controller

We believe the new server is going to make our VMs perform a lot better than our existing setup (2xIntel XEON, 16GB RAM, 2x500 GB SATA in RAID 1). But we are not sure about the RAID level for the new rig.

Should we go for the 6x146GB SAS drives in a RAID 10 configuration or a RAID 5 configuration? RAID 10 seems to offer better write performance and a lower risk of a RAID failure, but it comes at the cost of less drive space. Do we need RAID 10, or would RAID 5 also be a good choice for us?
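For reference, here is the rough arithmetic we have been looking at. The ~140 IOPS per 10k SAS drive and the usual write penalties (2 for RAID 10, 4 for RAID 5) are assumptions for illustration, not measurements from this hardware:

```python
# Back-of-the-envelope comparison of 6 x 146GB drives in RAID 10 vs RAID 5.
# The ~140 IOPS per 10k SAS drive is an assumed ballpark value, not a measurement.
drives, size_gb, drive_iops = 6, 146, 140

# Usable capacity
raid10_capacity = drives // 2 * size_gb   # mirroring halves capacity -> 438 GB
raid5_capacity = (drives - 1) * size_gb   # one drive's worth of parity -> 730 GB

# Random-write throughput with the usual write penalties (RAID 10 = 2, RAID 5 = 4)
raid10_writes = drives * drive_iops / 2   # ~420 IOPS
raid5_writes = drives * drive_iops / 4    # ~210 IOPS

print(f"RAID 10: {raid10_capacity} GB usable, ~{raid10_writes:.0f} random write IOPS")
print(f"RAID 5 : {raid5_capacity} GB usable, ~{raid5_writes:.0f} random write IOPS")
```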


Solution 1:

There are lots of similar questions/arguments on this site regarding R10 vs. R5/R6, but they boil down to "exposure during rebuild". The argument for R10 over R5 is strongest when dealing with the larger, slower disks some buy because their GB/$£€ ratio is better (i.e. 2/3TB 7.2k SATAs), as arrays of these disks can literally take days to rebuild following a disk replacement or addition - meaning the entire array would be lost if a second disk failed during this rebuild window.
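To put that rebuild window in numbers, here is an optimistic lower-bound estimate - the rebuild rates are assumed ballpark figures, and a rebuild under live VM load will usually be far slower:

```python
# Optimistic lower bound on rebuild time: drive capacity / sustained rebuild rate.
# Both rebuild rates below are assumed ballpark figures; controller overhead and
# live VM I/O typically stretch the real window out much further.
def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    return capacity_gb * 1024 / rebuild_mb_per_s / 3600

print(f"146GB 10k SAS at ~80 MB/s : {rebuild_hours(146, 80):.1f} hours")
print(f"3TB 7.2k SATA at ~50 MB/s : {rebuild_hours(3000, 50):.1f} hours")
```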

For many on this site, myself included, this risk is too high. R6 changes this a little but usually brings much slower write performance with it. Also, doing any of this in software further reduces performance during a rebuild, as all data goes over the same bus, including 'in life' traffic.

You've done a good job of picking your components already and you'll certainly see a huge improvement in performance. If I were you I wouldn't 'fall at the final hurdle' - I'd use R10 knowing you'd done the right thing. If you're concerned about space you can use thin-provisioned disks and/or buy the 600GB 10k disks instead of the 146GB 15k disks; the performance drop-off won't be too bad but you'll have a lot more space - you could always buy 4 x 600 today and add 2 more later if you needed the extra spindles?
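As a rough sketch of that 600GB suggestion (same RAID 10 arithmetic as above, with an assumed ballpark per-drive IOPS figure - not something measured on your H700):

```python
# Usable RAID 10 capacity and spindle count for the two purchase options.
# Per-drive IOPS is an assumed 10k-drive ballpark, purely for comparison.
options = {"6 x 146GB": (6, 146), "4 x 600GB": (4, 600)}
for name, (drives, size_gb) in options.items():
    print(f"{name}: {drives // 2 * size_gb} GB usable in RAID 10, "
          f"{drives} spindles, ~{drives * 140 // 2} random write IOPS")
```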

Solution 2:

If this is a mission critical system, then you need to make sure that you've got some spare drives locally should one fail (unless you have some support contract on the hardware that says you can get replacements same-day, but even then a local spare is worth having).

Ignoring that (or assuming the six drives don't count spares you might have easy access to), I would suggest RAID10 (three RAID1s nested in a RAID0) over RAID5 for the performance reasons you mention. Or, if space is not at all an issue and redundancy and rebuild-on-drive-failure time are big concerns, you might even consider two three-drive RAID1s nested in RAID0 (overkill for most purposes, but it would allow two drives on each R1 leg to fail at the same time while keeping the array alive).
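A quick sketch of what those two nested layouts give you with six 146GB drives, in terms of usable space and how many failures each mirror leg can absorb:

```python
# Two ways to nest six 146GB drives: three 2-way mirror legs striped together
# (classic RAID 10) vs two 3-way mirror legs striped together (triple mirror).
size_gb = 146
layouts = {
    "3 legs of 2-way mirrors": (3, 2),
    "2 legs of 3-way mirrors": (2, 3),
}
for name, (legs, copies) in layouts.items():
    usable = legs * size_gb        # each leg contributes one copy's worth of space
    tolerance = copies - 1         # drive failures each leg can survive
    print(f"{name}: {usable} GB usable, survives {tolerance} failure(s) per leg")
```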

There is another option though: three separate RAID1 arrays (or possibly two RAID10 arrays, if your controller supports 3-drive RAID10 - RAID1E, as some controllers call it). This way you can spread the VMs over different spindles so they compete with each other far less for IO bandwidth. Two VMs on different RAID1 arrays can be merrily thrashing their virtual disks without much affecting the responsiveness of each other or of a VM on the third array. Of course this can end up being wasteful space-wise: you may end up with a lot of free space on one array that you don't want to use because there are already I/O-intensive VMs in constant use on it (though with a single large array, a VM placed in that space would be competing for IO like that anyway), or you may end up with 25GB free on each array but need 50GB for a new VM.
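As a hypothetical planning sketch (the VM names come from your list, but every IOPS figure below is an invented placeholder, not a measurement), you could rough out the spread like this:

```python
# Rough planning sketch: assign VMs to three separate RAID 1 datastores and sum
# the guessed I/O demand per array to check the spread looks balanced.
# Every IOPS number here is an invented placeholder, not a measured value.
assignment = {
    "array1": {"Active Directory": 20, "SQL Server 2008 R2": 150},
    "array2": {"Build server": 80, "SharePoint 2010 www/intranet": 40},
    "array3": {"2x SP2007 dev": 100, "2x SP2010 dev": 100},
}
for array, vms in assignment.items():
    print(f"{array}: ~{sum(vms.values())} IOPS guessed, hosting {', '.join(vms)}")
```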

This technique can make a lot of difference with spinning-disk-and-arm based drives if you balance your VMs between the arrays right. It still makes a difference on SSDs too, but less so, as you do not have the head movement and waiting-for-the-right-sector-to-pass-by issues causing extra performance-killing latency. Though as I said above, it can be more work to manage. In the use case you describe, you might put that lightly loaded SharePoint server and the build server on one array and the development VMs on another (possibly one array each, if you have three arrays and no other active VMs). As needs change you can always move the VMs around the arrays to rebalance the load with little down-time (no down-time at all if your chosen virtualisation solution supports live migrations between local data stores).

Solution 3:

As has been answered here before quite a few times - just don't use RAID 5! BAARF has some strong views on the subject!

You will get worse performance than RAID 10, degraded performance during a rebuild following a drive failure, and should another drive fail during that period, you will be restoring from backups!