Dell VRTX - slow cluster shared storage
I have a brand new Dell VRTX box set up as a Failover Cluster running HA Hyper-V virtual machines. This is my first time setting up clustering, and my first time with one of these boxes, so I'm sure I've missed something.
The virtual machines are experiencing high disk latency and bad performance when accessing their VHD(x) files located on a Cluster Shared Volume.
The VRTX has 10 x 900 GB 10K SAS drives in RAID 6 configuration, and the VRTX has the redundant Shared PERC 8 controllers. Both blades have full access to the virtual disks. There are two M520 blades installed, each with 128 GB RAM. MPIO is configured for the PERC 8 controllers. Operating system on the blades is Server 2012 (NOT R2).
The RAID 6 array is split into a small (8 GB) volume for the cluster quorum witness and a large (6.5 TB) volume for a Cluster Shared Volume (mounted on the nodes as C:\ClusterStorage\Volume1).
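As a sanity check on those sizes, here is a small sketch of the RAID 6 capacity arithmetic. It assumes the 900 GB drives are labelled in decimal gigabytes and that the 6.5 TB figure is what Windows reports (binary units); it ignores any controller metadata overhead.

```python
# Usable capacity of a RAID 6 set: two drives' worth of space goes to parity.
DRIVES = 10          # from the post
DRIVE_GB = 900       # assumed to be decimal gigabytes, as drives are usually labelled
PARITY_DRIVES = 2    # RAID 6 reserves two drives' worth of parity


def raid6_usable_bytes(drives, drive_gb):
    """Return usable bytes for a RAID 6 set, ignoring metadata overhead."""
    return (drives - PARITY_DRIVES) * drive_gb * 10**9


usable = raid6_usable_bytes(DRIVES, DRIVE_GB)
usable_tb = usable / 10**12   # decimal terabytes
usable_tib = usable / 2**40   # binary units, which Windows labels "TB"

print(f"{usable_tb:.2f} TB decimal, {usable_tib:.2f} TiB binary")
# prints "7.20 TB decimal, 6.55 TiB binary"
```

So the 8 GB witness plus the 6.5 TB CSV account for roughly the whole array.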
An example of slow disk access: logging into a Server 2012 VM and having Server Manager come up automatically. Disk activity goes to 100%, with write speeds of 20 MB/s or so, read speeds of 500 KB/s or so, and an Average Response Time of over 1000 ms, sometimes spiking to 4000-5000 ms. It's the latency that really worries me.
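To put a number on the latency outside of Task Manager, something like the following rough probe could be run against a directory on the volume. It is only a sketch: the function names, block size and write count are my own choices, and a flushed-write timing is not the same metric as Task Manager's Average Response Time.

```python
import os
import statistics
import tempfile
import time


def probe_write_latency(directory, writes=50, block_size=64 * 1024):
    """Time `writes` flushed writes in `directory`; return latencies in ms."""
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, suffix=".probe")
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data down to storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the temporary probe file
    return latencies


def summarize(latencies):
    """Return mean, approximate 95th percentile and max latency in ms."""
    ordered = sorted(latencies)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "mean_ms": statistics.mean(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }
```

For example, `summarize(probe_write_latency(r"C:\ClusterStorage\Volume1"))` on a node, or the same call against a local folder inside a VM, gives figures that can be compared before and after a configuration change.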
Is there something specific I should look at in my configuration? It doesn't seem to matter whether I use VHD or VHDX, dynamic or static.
I have experienced the exact same performance issue on a VRTX with the dual SPERC 8. My workaround for the moment is to change the dual-controller configuration to a single-controller one. That lets me use write-back, which performs far better.
The exact steps:
- Remove the second SPERC 8 controller
- Remove the second expander
- Re-cable the internal SAS connections
- Downgrade chassis firmware to 1.25 (works the same as upgrading, no special steps needed)
- Delete all VDs (back up data/VMs first if needed)
- Power cycle entire VRTX (remove and reconnect power cables to be sure)
- Create VD(s) with write-back enabled
To see the performance difference, check my thread at: http://en.community.dell.com/support-forums/servers/f/906/t/19587459.aspx
Update:
Test results:
Dual PERC / RAID6 / Write Through: Read 2500 MB/s Write 200 MB/s
Dual PERC / RAID10 / Write Through: Read 2500 MB/s Write 400 MB/s
Single PERC / RAID6 / Write Back: Read 2500 MB/s Write 2700 MB/s
As long as Dual PERC is bound to the Write Through policy, I would stick with a Single PERC setup.
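To make the gap explicit, here is a small sketch that computes the write speedup of each configuration relative to the dual-controller RAID 6 baseline. The numbers are copied from the results above; the dictionary layout and function name are just for illustration.

```python
# Throughput figures from the test results above, in MB/s.
results = {
    "Dual PERC / RAID6 / Write Through": {"read": 2500, "write": 200},
    "Dual PERC / RAID10 / Write Through": {"read": 2500, "write": 400},
    "Single PERC / RAID6 / Write Back": {"read": 2500, "write": 2700},
}

BASELINE = "Dual PERC / RAID6 / Write Through"


def write_speedup(config, baseline=BASELINE):
    """Return the write throughput of `config` as a multiple of the baseline."""
    return results[config]["write"] / results[baseline]["write"]


for name in results:
    print(f"{name}: {write_speedup(name):.2f}x write vs baseline")
# prints 1.00x, 2.00x and 13.50x for the three configurations
```

Reads are identical across all three, so the whole difference is on the write side.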
This:
Fault Tolerant Shared PERC 8 Card Configuration — [...] The default cache policy for virtual disks created in this configuration is write-through. In this mode, write completion information is returned to the host after the data is written to the disk.
is the ultimate performance killer. Change the cache policy to write-back if it is supported for your application and does not risk inconsistencies in the written data. Note that I do not know whether, or by which mechanism, the PERC 8 cache is mirrored to the other controller. Mirroring would be necessary for consistency, since the cached data needs to be accessible from both controllers.
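A back-of-the-envelope sketch of why write-through hurts so much on parity RAID: with no cache acknowledging writes early, each small random write has to wait for the parity update on disk. The estimate below assumes the textbook write penalties (6 disk I/Os per write for RAID 6, 2 for RAID 10) and a rule-of-thumb 140 IOPS per 10K drive. Neither number was measured on a VRTX, so treat the output as an order-of-magnitude illustration only.

```python
# Rough random-write IOPS estimate for an uncached (write-through) array.
DRIVES = 10                  # from the question
IOPS_PER_DRIVE = 140         # assumption: rule of thumb for a 10K SAS drive
WRITE_PENALTY = {            # textbook disk I/Os needed per host write
    "RAID 6": 6,             # read data, P, Q; write data, P, Q
    "RAID 10": 2,            # write to both mirror members
}


def random_write_iops(level, drives=DRIVES, per_drive=IOPS_PER_DRIVE):
    """Estimate host-visible random-write IOPS without any write cache."""
    return drives * per_drive / WRITE_PENALTY[level]


for level in WRITE_PENALTY:
    print(f"{level}: ~{random_write_iops(level):.0f} random write IOPS")
# prints "RAID 6: ~233 random write IOPS" and "RAID 10: ~700 random write IOPS"
```

With write-back, the controller can acknowledge from cache and flush to disk later, which is why the host sees such a different picture.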
It is no longer necessary to remove the second SPERC controller in order to use write-back instead of write-through, as described in Erik's post. You can now disable the second PERC8 controller from the CMC. In the current firmware (1.35), the second controller can be set to disabled; it then requires manual intervention to activate it if the primary controller fails. The usefulness of this is not really detailed in the patch notes, but this "fix" is meant to let people enable Write-Back and get rid of the abysmal performance you get when using both controllers in Write-Through.
Automatic failover (cold failover, causing a disconnect) is a feature that will be released later. A lot later will come the actual "active/active" firmware update, which would allow "live" failover without downtime.
Steps:
- Download VRTX CMC firmware version 1.35 or higher.
- Shut down all your blades.
- In the CMC interface, press "Chassis Overview > Update".
- Select both checkboxes for the CMC controllers in the "CMC firmware" heading and press "Apply CMC Update".
- Enter the location of the CMC firmware file and Apply it.
- The CMC will show you its progress. Uploading takes about 8 minutes, applying the update a few minutes more. The CMCs will reboot after applying the update and you will get kicked out of the web interface.
- Once the CMC has rebooted, browse to the "Storage > Controllers > Troubleshooting" section.
- For the SPERC of your choice, select the "Disable RAID controller" option and apply. This will reboot your storage component.
- Once rebooted, go to "Storage > Virtual Disks > Manage" and select "Edit: Write Policy" and pick "Write Back" instead of "Write Through" for all your virtual disks (unless you have a reason not to).
- The change above will be carried out immediately, but it is still suggested to reset the CMC once more under "Chassis Overview > Power > Control" with "Reset CMC (warm boot)".
- Boot your shut down blades.
This allows you to keep a second PERC8 installed in your VRTX in case the other one fails, but you will have to intervene manually to fail over. I suppose this is primarily meant for hard-to-reach locations (remote offices without IT staff or easy access for Dell support technicians). This is also what we use it for.
Hopefully by end of the year we will have the automatic failover feature and then in the course of next year the true active/active config with write-back enabled (synchronized caches). I'm not going to hold my breath for the synchronized cache firmware fix... I suspect that won't be an easy one for Dell.