How to plan RAID for ESX

Solution 1:

What's wrong with just R10'ing all 8 drives into a single logical disk (LD) of ~1.2TB? It'll be faster and a lot more resilient than your R0+R5 suggestion, and it gives you the same available space as Tom's R1+R6 suggestion (sorry Tom, you know I love you, right?). There's no need to have the OS on a different disk than the VMs at all, especially if you're using v4, as it handles locking much better.

EDIT FOLLOWING QUESTION EDIT

Basically you can't have it all. Let's look at your newly added criteria:

  1. You're happy to live with loss of drives if more than 1 disk fails - well fair enough, in that case anything but R0 will work for you.
  2. You want as much space as possible - given we've ruled out R0 then R5 will clearly give you the most space, followed by R6 then R10.
  3. You want good write performance - well R6 is significantly slower than R5 which in turn is slower than R10.
  4. If you have a single logical disk for both the OS and VMs then losing the LD will kill your host, but so would losing a VM-only logical disk: all your VMs would stop if either LD failed. The difference is that you wouldn't have to reinstall ESX/i if you only lost a VM-only LD, but then again reinstalling the OS doesn't take long at all, plus it can be backed up. Given you'll lose 600GB if you R1 two of the drives for the OS, I'd stick with either R10 or R5 myself.
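To put rough numbers on the space and write-speed trade-off above, here is a minimal sketch. The function and table names are made up for illustration, capacities are nominal (no formatting or controller overhead), and the write-penalty figures are the usual rule-of-thumb back-end I/Os per random write, not measurements from any particular controller.

```python
# Rule-of-thumb back-end I/Os generated by one random write (assumed, not measured).
WRITE_PENALTY = {"R0": 1, "R10": 2, "R5": 4, "R6": 6}


def usable_gb(level, drives, size_gb):
    """Nominal usable capacity of one array built from identical drives."""
    if level == "R0":
        data_drives = drives          # no redundancy at all
    elif level == "R10":
        if drives % 2:
            raise ValueError("R10 needs an even number of drives")
        data_drives = drives // 2     # every drive is mirrored
    elif level == "R5":
        data_drives = drives - 1      # one drive's worth of parity
    elif level == "R6":
        data_drives = drives - 2      # two drives' worth of parity
    else:
        raise ValueError(f"unknown RAID level: {level}")
    return data_drives * size_gb


def compare(drives=8, size_gb=300):
    """Map each level to (usable GB, write penalty) for a single array."""
    return {
        level: (usable_gb(level, drives, size_gb), penalty)
        for level, penalty in WRITE_PENALTY.items()
    }


if __name__ == "__main__":
    for level, (gb, penalty) in compare().items():
        print(f"{level:>3}: {gb} GB usable, ~{penalty} back-end I/Os per write")
```

For eight 300GB drives this gives 2100GB for R5, 1800GB for R6 and 1200GB for R10, which is where the ~1.2TB figure comes from. Six drives in R6 (after setting two aside for an R1 OS mirror) also come out at 1200GB.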

Solution 2:

Again, I can't recommend RAID 0 ever, except in the most unimportant cases.

I second Tom's idea of RAID 1 to mirror the system drive where you install ESX(i), but I don't see the need to RAID-6 your remaining drives and lose a second drive to parity information.

RAID-5 is falling out of favor because of performance (due to the way the information is written across drives) and failure rate (the likelihood of hitting an unrecoverable read error goes up as drive capacity goes up). RAID-6 doesn't improve write performance over RAID-5, and your eight 300GB drives aren't anywhere near the capacity where a failure during a rebuild becomes statistically significant.
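The capacity argument can be sketched with a back-of-envelope calculation. This is only an illustration: the function name is made up, the default rate of one unrecoverable read error per 10^15 bits is an assumed spec-sheet figure (check your own drives' datasheet), and it treats every bit read as an independent trial, which real drives don't strictly obey.

```python
import math


def p_ure_during_rebuild(surviving_drives, size_gb, ure_per_bit=1e-15):
    """Chance of at least one unrecoverable read error while reading every
    surviving drive end to end, as a RAID-5 rebuild has to do.

    ure_per_bit is an ASSUMED error rate, not a value from the thread.
    """
    bits_read = surviving_drives * size_gb * 1e9 * 8
    # 1 - (1 - rate) ** bits_read, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # One of 8 drives has failed, so 7 survivors must be read in full.
    for size_gb in (300, 2000):
        p = p_ure_during_rebuild(7, size_gb)
        print(f"7 x {size_gb} GB survivors: {p:.2%} chance of a URE")
```

Under those assumptions the risk scales roughly with the amount of data that has to be read back, so the same array built from much larger drives is several times more exposed. With a worse assumed error rate (say 1e-14) the numbers look less comfortable, so plug in the figure for your actual disks.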

Solution 3:

There are quite a lot of articles out there saying that RAID5 is getting to the point of being EOL, because RAID6 is a better alternative and allows 2 drive failures before your array collapses and dies.

RAID 10 might also be worth a look (depending on your plan), as it has higher performance than RAID5/6. You should probably put the ESX server itself on a pair of disks in RAID1, so that the server itself is resilient to a single disk failure. That's where all the clever stuff is, so that's what you should protect the most.

I'd go for something like

RAID1: 2x 300GB => ESX Server

RAID6: 1198GB => VM Storage

Solution 4:

I always like to have at least two arrays on a server. It's a must for Exchange and SQL for DR purposes.

I would recommend a RAID1/RAID10 setup.