RAID 10 Performance Issues
I am in the process of setting up a mirrored storage system for our business.
We don't have the budget for prebuilt systems, so I am trying to get the best bang for our buck. Here is our hardware breakdown:
San1 and San2 Windows Server 2019
SUPERMICRO MBD-H11SSL-I Amd Epyc 7251 8 core CPU
64GB RAM 8GB x8
SSD for OS 500GB
LSI 9380-8i8e
Intel 10G NIC, 4 port - iSCSI network
Intel 25G NIC, 2 port - Sync between servers - Jumbo frames 9014.
1 internal 1G NIC (data), 1 IPMI in use on MB
IW-RJ224-03 24-bay SSD enclosure, populated with 24 2TB Samsung 860 Pros in a RAID 10 configuration. Connected via 2 SAS cables to the 9380 card.
We will be using Starwind to sync the 2 servers.
While setting up Starwind, I have been testing our sync performance using image sizes from 500GB to 5TB.
When a sync starts, the system receiving the sync data is barely usable. The system stutters, Performance Monitor hangs, and everything runs horribly unless I turn off all caching options. If I enable write-back or the disk cache, core 0 on NUMA 0 pegs at 100% and everything goes south. The other cores show little or no usage, apart from a couple.
I have tried every combination of drive setup to get past this, but I am getting nowhere. I must be missing something. I have configured the array in 2x8, 6x4, and 4x6 layouts (standard 64k strip), thinking some drive limitation was holding me back. There was one instance where nothing went wrong: the array wrote a 5TB sync in about an hour with perfect system response. It was running at over 1.6GB/s at the time, with both caches enabled on a 4x6 array. I did notice that core 0 on NUMA 0 was near idle that time, and core 2 on NUMA 0 was doing the heavy lifting. I then took everything down to rebuild and replicate the result, and I have been stuck ever since. Now every transfer maxes out at about 600MB/s writes with cache off; with cache on, it reaches about 1GB/s before the system is noticeably struggling.
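To put the three write rates above in perspective, here is a rough Python sketch of how long a 5TB sync would take at each of them. It assumes decimal units (1 TB = 1,000,000 MB) and a constant sustained rate, which a real sync will not hold; the function name is just for illustration.

```python
# Rough sync-time estimate: duration = image size / sustained write rate.
# Assumes decimal units (1 TB = 1,000,000 MB) and a perfectly steady rate.

def sync_hours(image_tb: float, rate_mb_s: float) -> float:
    """Hours needed to write image_tb terabytes at rate_mb_s megabytes per second."""
    total_mb = image_tb * 1000 * 1000
    return total_mb / rate_mb_s / 3600

# Rates quoted in the post: cache off, cache on, and the one good run.
for label, rate in [("cache off", 600), ("cache on", 1000), ("good run", 1600)]:
    print(f"{label:>9}: {sync_hours(5, rate):.2f} h for a 5 TB image")
```

At 1.6GB/s the estimate comes out just under an hour, which matches the one good run; at 600MB/s the same image would need well over two hours.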
Any ideas to help point me in the right direction are appreciated! Firmware is up to date on the 9380, and drivers for the RAID cards, NICs, and MB components are all up to date.
Here are some thoughts that may help solve the issue:
- If you are using any kind of NIC teaming, it may affect iSCSI and replication performance in unpredictable ways. Most SAN/vSAN vendors don't support teaming and recommend MPIO instead. Disable NIC teaming.
- You mentioned an Intel 25G NIC. The XXV710 model may have issues with jumbo frames enabled. Disable jumbo frames and run additional tests.
- A jumbo frame value of 9126 is not typical for Windows and is used mostly on switches. The Windows default value is 9014.
- The LSI 9380 doesn't have the Samsung 980 Pro in its list of supported drives. Moreover, the 980 Pro is an NVMe drive (not SATA). Are you sure that you have the 980 Pro?
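If you re-enable jumbo frames later, it helps to confirm that full-size frames actually pass end to end on the sync link. The Python sketch below only does the arithmetic. It assumes the 9014 driver setting counts the 14-byte Ethernet header (so the IP MTU is 9000), which is how Intel adapters are commonly described but is worth checking against your driver documentation. `<peer-ip>` is a placeholder for the other server's sync address.

```python
# Header sizes in bytes.
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType
IPV4_HEADER = 20      # IPv4 header without options
ICMP_HEADER = 8       # ICMP echo header

def ip_mtu(jumbo_packet_setting: int) -> int:
    """IP MTU, assuming the driver setting includes the Ethernet header."""
    return jumbo_packet_setting - ETHERNET_HEADER

def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

mtu = ip_mtu(9014)
payload = max_ping_payload(mtu)
print(f"IP MTU: {mtu}, largest unfragmented ping payload: {payload}")

# Windows ping: -f sets Don't Fragment, -l sets the payload size.
print(f"ping -f -l {payload} <peer-ip>")
```

If the ping at that size fails while a smaller one succeeds, something on the path (NIC, cable adapter, or switch) is not passing jumbo frames.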
I’d also recommend contacting Starwind’s support, as BaronSamedi1958 mentioned.
You need to fine-tune the synchronization priority for the whole thing to function properly.
https://www.starwindsoftware.com/help/ChangingSynchronizationPriority.html
As you are dealing with a paid solution, I’d suggest contacting their support.