Should I be concerned about our server setup?

Solution 1:

Sounds like you're already using decent hardware. What's wrong with it? It's not too old, right? Keep your equipment in warranty, or close to it, if you don't want to feel too concerned (not everyone shares my opinion on this).

If you have things set up to be redundant and have good backups, you're doing pretty well. One server is a single point of failure no matter how good it is, and that would make me uncomfortable. There's a lot you can do on a budget by making smart decisions about how things are implemented in terms of software, hardware, infrastructure and support.

If you don't have precautions in place, maybe you should be worried. If one system dies, are its services gone? How will that affect the business? How fast can you recover?
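To put rough numbers on those questions, here is a minimal Python sketch. The failure rate, recovery time and hourly cost are made-up placeholders, not figures from your environment, so plug in your own estimates.

```python
def annual_downtime_hours(failures_per_year, hours_to_recover):
    """Expected hours of outage per year for a service with no failover."""
    return failures_per_year * hours_to_recover


def annual_downtime_cost(failures_per_year, hours_to_recover, cost_per_hour):
    """Expected yearly cost of those outages, in whatever currency you use."""
    return annual_downtime_hours(failures_per_year, hours_to_recover) * cost_per_hour


if __name__ == "__main__":
    # Placeholder assumptions: one failure every two years, a full working
    # day to restore from backup, and 1000 per hour of lost business.
    cost = annual_downtime_cost(failures_per_year=0.5,
                                hours_to_recover=8,
                                cost_per_hour=1000)
    print(f"Expected downtime cost per year: {cost:.0f}")
```

If that number is small compared to what redundancy would cost you, a single box may be an acceptable risk. If it isn't, that's your budget argument.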

Pitfalls? Depends. You didn't provide much info. Cheap drives can fail or be slow. Cheap cases can overheat. Cheap fans can fail. Cheap SATA/SAS/RAID controllers can screw up or not perform as expected. Cheap power supplies can die or, if not redundant, leave you without power. Motherboards can do wonky things. Systems without remote consoles (iLO, etc.) can be a pain to manage. Cheap network cards can have cheap drivers or screw up. Lots of little unforeseen issues can occur. On the other hand, you can get cheap-as-hell entry-level stuff that performs amazingly. And more expensive stuff can be wonky sometimes too.

I've seen it all, in decent server-grade, lower-end server, workstation and consumer-grade equipment. Higher-end stuff seems to do better in the long run (way past warranty). But what if you can't afford it? Or you can only afford one server and can't implement proper redundancy?

There's nothing inherently wrong with dual servers running Xeons, ECC memory and RAID. Unless you have a problem with it.

Solution 2:

Assuming your VMs are redundant (and this has been tested as working with one node off), you are probably relatively immune to a hardware-related outage by virtue of having two mirrored nodes.
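One way to run that test, sketched in Python: power off one node, then check that every service still answers on its TCP port. The host names and ports below are hypothetical placeholders, and a successful TCP connect only shows that the port is open, not that the application behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def failed_services(services, probe=port_is_open):
    """Return the names of the services whose probe does not succeed."""
    return [name for name, (host, port) in services.items()
            if not probe(host, port)]


# Hypothetical inventory: replace with your own VMs and ports.
SERVICES = {
    "file server (SMB)": ("fs01.example.internal", 445),
    "intranet (HTTPS)": ("web01.example.internal", 443),
}

# With one node powered off, run:
#     print(failed_services(SERVICES))
# An empty list means every listed service was still reachable.
```

Run it once with both nodes up to confirm the list itself is correct, then again with each node off in turn.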

Without knowing more, I wouldn't recommend dropping down to a single (newer) box unless an outage of the entire node isn't a major problem at your company.

That said, it would be helpful to know some extra details about your environment, such as how long you've had these machines and whether they are in a purpose-built environment (a clean, dry room with a rack, AC, etc.). As you are probably aware, well-looked-after equipment lasts longer!

Generally speaking, there's nothing necessarily wrong with using less 'professional' hardware; it just doesn't come with the same guarantees or reliability as more expensive kit, and these risks need to be weighed against your budget.

Solution 3:

Since your storage backend is all-flash, your hardware is totally OK for the mentioned workload. The only concern I have regarding your configuration is that your VMs are split, with each one running on a single server instead of being mirrored/synchronized between the servers, especially if they are identical. Thus I would strongly recommend using software-defined storage (a virtual SAN) that will let you join both servers into a single cluster and make your virtual machines immune to possible hardware failures.

Possible options are HP VSA http://www8.hp.com/us/en/products/storage-software/product-detail.html?oid=5306917 or EMC Unity VSA https://store.emc.com/us/Product-Family/EMC-Unity-Products/EMC-Unity-VSA/p/EMC-Unity-Virtual-Storage-Appliance, which is free but, as far as I know, not allowed for production. Since you are using Hyper-V, a perfect option for you would be StarWind Virtual SAN https://www.starwindsoftware.com/starwind-virtual-san, which runs natively on top of Windows and allows you to seamlessly create a fully functional Microsoft Hyper-V failover cluster using only directly attached storage.

I would also recommend using Veeam B&R https://www.veeam.com/vm-backup-recovery-replication-software.html, which has a free version, or Bacula http://blog.bacula.org/ to back up your VMs instead of the native Windows Server Backup in Windows Server 2012, since it is known for causing issues when trying to recover your VMs.