In RAID10 any one of your drives can fail and the array will survive, the same as RAID1. While RAID10 can survive four of the six "two drives failed at once" combinations, the main reason to use RAID10 with four drives instead of RAID1 with two is performance rather than extra reliability, and the SSDs will give you a greater performance jump.
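If you want to sanity-check that "four of the six" figure, a quick enumeration makes it obvious. This is just an illustrative sketch assuming the usual four-drive layout of two mirrored pairs striped together; the device names are made up:

    # Enumerate every two-drive failure in a four-drive RAID10 laid out, as is
    # typical, as two mirrored pairs that are striped together.
    from itertools import combinations

    drives = ["sda", "sdb", "sdc", "sdd"]        # hypothetical device names
    mirrors = [{"sda", "sdb"}, {"sdc", "sdd"}]   # each set is one mirrored pair

    survivable = 0
    for failed in combinations(drives, 2):
        # The array is lost only if some mirror loses *both* of its members.
        dead = any(mirror <= set(failed) for mirror in mirrors)
        print(f"fail {failed}: {'array lost' if dead else 'array survives'}")
        survivable += not dead

    print(f"{survivable} of 6 two-drive failures are survivable")

The two fatal combinations are exactly the ones that take out both halves of the same mirror.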

Early SSDs had reliability issues, but most properly run tests I've seen suggest those days are long gone: they tend to be no more likely to fail than spinning-metal drives, overall reliability has increased, and wear-levelling tricks are getting very intelligent.

When running virtual machines, is software RAID 1 even ideal?

I'm assuming you are running the RAID array on the host, in which case, unless you have a specific load pattern in your VMs (one that would be a problem on direct physical hardware too), the difference between soft RAID and hard RAID is not going to depend on the use of VMs. If you are running RAID inside the VMs then you are likely doing something wrong (unless the VMs are for learning or testing RAID management, of course).

The key advantages of hardware RAID are:

  • Potential speed boost due to multiplexed writes: software RAID1 will likely write to each drive in turn, whereas with hardware RAID1 the OS writes just once and the hardware writes to both drives in parallel. In theory this can double your peak bulk transfer rate (though in reality the difference will likely be far smaller than that), but it will make little or no difference to random writes (where with spinning metal the main bottleneck is head movement, and with SSDs the main bottleneck is needing to write larger blocks even for small writes, plus the block-clearing time if there are no blocks ready).
  • Safety through a battery-backed (or solid-state) cache (though this is only on high-spec controllers), allowing caching to be done safely on the controller: in the event of sudden power loss the controller can hold written blocks that haven't hit the drives yet and write them when power returns.
  • Hot-swap is more likely to be supported (though your DC's kit may support hot-swap more generally, so it may be available for software RAID too).

The key advantage of good software RAID (i.e. Linux's mdadm-managed arrays) is:

  • Your array is never locked to a given controller (or worse, to specific versions of a given controller), meaning your arrays can be moved to new kit if all the other hardware fails but the drives survive. I've used this to save a file server that had its motherboard die: we just transplanted the drives into a new box and everything came back up with no manual intervention (we did verify the drives against a recent backup and replace them ASAP, in case the death was a power problem that had affected but not immediately killed the drives, but the easy transplant meant greatly reduced downtime outside maintenance windows). This is less of an issue if your DC is well stocked with spare parts immediately to hand, of course.
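If an array doesn't auto-assemble on the new box, the manual version is only a couple of commands. A rough sketch of the kind of thing involved, assuming the transplanted members show up as ordinary block devices and you run it as root; /dev/md0 is a placeholder name:

    # Reassemble transplanted md members on a new host.
    import subprocess

    # Scan all block devices for md superblocks and assemble any arrays found.
    subprocess.run(["mdadm", "--assemble", "--scan"], check=True)

    # Confirm what came up and whether it is clean or resyncing.
    subprocess.run(["mdadm", "--detail", "/dev/md0"], check=True)
    print(open("/proc/mdstat").read())

(In practice you'd just type the two mdadm commands; wrapping them in a script only matters if you want this as part of an automated recovery procedure.)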

On SSD Reliability & Performance:

SSDs over-provision space for two reasons: it leaves plenty of blocks free to be remapped if a block goes bad (traditional drives do this too), and it avoids the write-performance hole (except under huge write-heavy loads) even where TRIM is not used, as the extra blocks can cycle through the wear-levelling pool along with all the others (and the controller can pre-wipe them ready for next use at its leisure). Consumer-grade drives only really over-provision enough for the remapping use and a small amount of performance protection, so it is useful to manually under-allocate (partitioning only 200GiB of a 240GB drive, for instance), which has a similar effect. See reports like this one for details (that report is released by a controller manufacturer but reads as a general description of the matter rather than a sales pitch; you'll no doubt find manufacturer-neutral reports on the same subject if you look for them). Enterprise-grade drives tend to over-provision by much larger amounts (for both of the above reasons: reliability and performance).
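To put numbers on the 200GiB-of-240GB example: marketed capacities are decimal GB while the partition here is in binary GiB, so the gap is bigger than it looks. A back-of-envelope sketch (the hidden over-provision the manufacturer already reserves varies by model and isn't counted here):

    # Rough numbers for "partition only 200 GiB of a 240 GB drive".
    GB, GiB = 10**9, 2**30

    drive_bytes = 240 * GB          # what the label says
    partitioned_bytes = 200 * GiB   # what you actually format

    spare = drive_bytes - partitioned_bytes
    print(f"left unpartitioned: {spare / GiB:.1f} GiB "
          f"({100 * spare / drive_bytes:.1f}% of the drive)")
    # -> roughly 23.5 GiB, about 10.5% of the drive, on top of whatever
    #    hidden over-provisioning the manufacturer already reserves.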


It depends on the drives, the disk controller, the type of SSD, the RAID implementation, the Operating System(s) involved, the server, monitoring ability, whether you have out-of-band access to the server, etc.

Edit: you'll be on Linux + KVM.

  • Envision a drive failure in a hardware RAID solution that takes out one disk. You receive an alert and have the drive hot-swapped. Easy.

  • Imagine a software RAID SSD failure that goes undetected (no explicit monitoring; see the sketch after this list) and requires downtime, or is a more involved process to remediate.

  • Nothing precludes you from using SSDs with hardware RAID, correct?
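On the monitoring point in the second bullet: the "explicit monitoring" doesn't have to be fancy. Here is a minimal sketch that just looks for a degraded member in /proc/mdstat; in practice you'd let mdadm --monitor mail you, or feed a check like this into whatever alerting stack you already run:

    # Look for degraded md arrays by scanning /proc/mdstat for a "_" in the
    # member-status block (e.g. [U_] instead of [UU]).
    import re

    def degraded_arrays(mdstat_path="/proc/mdstat"):
        with open(mdstat_path) as f:
            text = f.read()
        bad = []
        # Each array stanza starts with e.g. "md0 : active raid1 sdb1[1] sda1[0]"
        for name, body in re.findall(r"^(md\d+) : (.*?)(?=^md\d+ : |\Z)",
                                     text, flags=re.M | re.S):
            status = re.search(r"\[([U_]+)\]", body)
            if status and "_" in status.group(1):
                bad.append(name)
        return bad

    if __name__ == "__main__":
        for md in degraded_arrays():
            print(f"ALERT: {md} is degraded - check mdadm --detail /dev/{md}")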

But it all depends...

I would push for SSD with hardware RAID if you need SSD performance. I wouldn't necessarily want to boot off software RAID, but that's your choice. For virtualization, you'll probably have a mix of random read/write activity, where hardware RAID's caching can be helpful. If this is a datacenter, you may not have to worry about sudden power loss, though.


Speed vs Reliability imo

Most RAID controllers do NOT fully support SSDs, or they only support a specific brand of SSD (see the Dell PERC 6xx series). Also, friends don't let friends use software RAID... unless it's their home gaming system.

(HW RAID + SSDs in RAID 1) vs (HW RAID + physical disks in RAID 10)

The speed difference between SSDs (when fully supported by the RAID controller) and HDs is like comparing formatting floppy disks vs formatting USB sticks: one takes 3 minutes, the other takes 3 seconds. So if you need that kind of speed, go with the SSDs... and make sure you have a good backup. If not, use physical disks, and have a good backup. ;-)


Which solution did you go with? Yes, SSDs are fast, and they give you a real boost in performance if you use them for a specific purpose, e.g. hosting a database server. I support a number of servers running with SSDs in Linux software RAID1. They all work OK except one. On that one server, RAID repeatedly reports a disk failure for one of the SSDs (randomly; not always the same disk (disk1 / disk2)). So far I have been unable to identify why. Also, consider how the host OS will see these two SSDs, because there could be an issue with replacing a disk (you may not be able to hot-swap it). Can you hot-swap a disk in software RAID if that disk is also used for the OS?
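On the hot-swap question: the md-level side of a replacement is routine; the awkward parts are the physical swap (which needs controller/backplane support) and the bootloader if the failed disk also carries the OS. A rough sketch of the usual sequence, with placeholder device names and assuming GRUB; adapt the partitioning and boot steps to your own layout:

    # Usual replacement sequence for a failed member of an md RAID1.
    # /dev/md0, /dev/sdb1 and /dev/sdb are placeholders; run as root.
    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    ARRAY, MEMBER = "/dev/md0", "/dev/sdb1"

    # 1. Mark the dying member failed and pull it from the array.
    run("mdadm", ARRAY, "--fail", MEMBER)
    run("mdadm", ARRAY, "--remove", MEMBER)

    # 2. Physically swap the disk (hot-swap if your kit allows it, otherwise
    #    this is your downtime), partition it to match the survivor, then:
    run("mdadm", ARRAY, "--add", MEMBER)

    # 3. If the array also carries the OS, put a bootloader on the new disk so
    #    the machine can still boot if the *other* disk dies next.
    run("grub-install", "/dev/sdb")

    # Watch the rebuild progress.
    print(open("/proc/mdstat").read())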

On the other hand, old-school network storage with an enclosure, a good RAID controller and a large number of disks (in RAID10) gives you peace of mind. Hot-swap of a failed disk is a must for production servers.

Whatever you do, remember to keep regular backups on separate hardware. It has been said many times before: "RAID is not a replacement for backup".