Root-causing vastly different performance on iozone O_SYNC benchmark for two HDD manufacturers

Regarding the measured 33x difference between your results: following up on our discussion in the comments, it turned out that MegaCli64 -LDGetProp -DskCache -Lall -aAll showed that setup B had the disk drive cache enabled by default, while it was disabled on setup A.

Using MegaCli64 -LDSetProp -DisDskCache -Immediate -Lall -aAll to disable it resulted in both systems showing similar performance.

Is it safe to run the RAID with disk drive cache enabled?

Running a RAID with the disk drive cache enabled is effectively similar to running a RAID controller with a volatile, non-BBU-backed cache in forced write-back mode. It enhances performance, but at the same time increases the risk of data loss and data inconsistency in the event of a power failure.

If you want to avoid this risk while still getting decent I/O performance, it is advisable to use a controller with a BBU-backed cache and to configure your volume for write-back mode with the disk cache disabled.
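
For a MegaRAID-based controller like yours, that configuration could look roughly like the following sketch (the -Lall -aAll selectors target all logical drives on all adapters; adjust them to your setup):

    # Sketch: configure BBU-protected write-back with the disk cache off.
    # Check that the BBU is present and healthy before relying on write-back:
    MegaCli64 -AdpBbuCmd -GetBbuStatus -aAll

    # Enable write-back, but fall back to write-through if the BBU fails:
    MegaCli64 -LDSetProp WB -Lall -aAll
    MegaCli64 -LDSetProp NoCachedBadBBU -Lall -aAll

    # Keep the unprotected on-disk write cache disabled:
    MegaCli64 -LDSetProp -DisDskCache -Lall -aAll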

The difference between your two RAID controllers

I don't know if you were already aware of it, but there is more to the topic than just the software vs. hardware RAID distinction (this is an interesting article regarding this).

In the end, the MegaRAID SAS 2008 is more or less an HBA or I/O controller with added RAID capability, while the MegaRAID SAS 3108 is a real RAID Controller™ (also called a ROC or RAID-on-Chip), which has a dedicated processor handling the RAID calculations.

The SAS 2008 is especially known for horrible write performance with some OEM firmware builds (like the DELL one in the PERC H310 which I mentioned in the comments).
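
If you want to double-check which controller and firmware you are actually running, something along these lines should work (a sketch; the grep pattern assumes the usual field names in the MegaCli output and may need adjusting):

    # print the controller model and firmware version for all adapters
    MegaCli64 -AdpAllInfo -aAll | grep -iE 'Product Name|FW Version'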

The synchronous mode in particular, in combination with your chosen record length and file size, seems to produce really poor results with software/fake RAID.
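
For context, an iozone run matching those parameters would look roughly like this (my assumption about the exact invocation; -o opens the test file with O_SYNC, while -s and -r set the file size and record length):

    iozone -o -s 10m -r 4k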

For reference, this is what I get on my workstation using 10k WD VelociRaptors in software RAID1:

                                                    random  random    bkwd   record   stride                                   
      KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
   10240       4     182     181  1804774  2127084 2110984     167 1673159      153  1760968   954589  1203989 2022512  2062824

If you are running in synchronous mode (O_SYNC), your Result A therefore seems reasonable in terms of what soft/fake RAID can deliver.

Does write-through cache mode cause performance degradation of the array over time?

I don't think so. With the write cache enabled, the controller is able to optimize the pending write operations in several ways.

For example, this description of the cache operation is taken from a whitepaper on HP Smart Array controllers:

The write cache will typically fill up and remain full most of the time in high-workload environments. The controller uses this opportunity to analyze the pending write commands to improve their efficiency. The controller can use write coalescing that combines small writes to adjacent logical blocks into a single larger write for quicker execution. The controller can also perform command reordering, rearranging the execution order of the writes in the cache to reduce the overall disk latency.

As you can read, the cache is used to further enhance the write performance of the array, but this does not seem to have any impact on the performance of subsequent write or read operations.

Regarding disk fragmentation: this is a file-system/OS-level problem. The RAID controller, operating on the block level, is not able to optimize file-system fragmentation at all, so it makes no difference whether it runs in write-through or write-back mode.
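
If fragmentation ever becomes a concern, it has to be measured and addressed with file-system level tools instead; for example (assuming ext4; the device and file names are placeholders):

    # show the extent layout / fragmentation of a single file
    filefrag -v /path/to/file

    # read-only check of the fragmentation score of an ext4 file system
    e4defrag -c /dev/sdb1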