Why does active-active configuration degrade performance compared to failover?

We are setting up the new storage for an HPC compute cluster that we are managing for applied statistics, bioinformatics, and genomics.

Configuration

We have a main Dell EMC ME4084 enclosure (84 x 12 TB 7200 rpm drives) and a Dell EMC ME484 expansion enclosure (28 x 12 TB drives). The ME4084 provides ADAPT distributed RAID (similar to RAID6) and dual hardware controllers.

The file server runs CentOS 7. The storage is connected to the file server with two SAS cables. Each LUN corresponds to a 14-disk ADAPT disk group and is visible over both SAS connections, appearing as the devices sdb and sdj. The examples below are for LUN ID 0.

We configured multipath as follows for the active-active configuration:

$ cat /etc/multipath.conf
defaults {
    path_grouping_policy multibus
    path_selector "service-time 0"
}
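
Configuration changes were applied without rebooting, roughly as follows (the exact reconfigure syntax depends on the multipath-tools version, so treat this as a sketch):

$ multipathd -k"reconfigure"   # re-read /etc/multipath.conf (newer versions also accept: multipathd reconfigure)
$ multipath -r                 # force a reload of the device maps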

$ multipath -ll
mpatha (3600c0ff000519d6edd54e25e01000000) dm-6 DellEMC ,ME4
size=103T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 1:0:0:0  sdb 8:16  active ready running
  `- 1:0:1:0  sdj 8:144 active ready running

The failover configuration:

$ cat /etc/multipath.conf
defaults {
    path_grouping_policy failover
    path_selector "service-time 0"
}

$ multipath -ll
mpatha (3600c0ff000519d6edd54e25e01000000) dm-6 DellEMC ,ME4
size=103T features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 1:0:0:0  sdb 8:16  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 1:0:1:0  sdj 8:144 active ready running

We verified that writes to mpatha are spread across both sdb and sdj in the active-active configuration and go only to sdb in the failover (active/enabled) configuration. We then striped mpatha and a second multipath device, mpathb, into a single logical volume and formatted it with XFS.
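
For reference, the path check and the volume layout were roughly as follows (the volume group/LV names and the stripe size shown here are illustrative, not necessarily the exact values we used):

# watch both member paths while a write test runs against mpatha
$ iostat -xm 1 sdb sdj

# stripe the two multipath devices into a single XFS volume
$ pvcreate /dev/mapper/mpatha /dev/mapper/mpathb
$ vgcreate vg_hpc /dev/mapper/mpatha /dev/mapper/mpathb
$ lvcreate -n lv_hpc -l 100%FREE -i 2 -I 1M vg_hpc
$ mkfs.xfs /dev/vg_hpc/lv_hpc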

Test Setup

We benchmarked I/O performance using fio under the following workloads (a representative job specification is sketched after the list):

  • Single 1MiB random read/write process
  • Single 4KiB random read/write process
  • 16 parallel 32KiB sequential read/write processes
  • 16 parallel 64KiB random read/write processes
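
For example, the 16-process 64KiB random read/write case was run roughly like this (the directory, file size, queue depth, and runtime shown here are illustrative, not necessarily the exact values we used):

$ fio --name=16-64kb-randrw --directory=/mnt/stripe --size=4G \
      --rw=randrw --rwmixread=50 --bs=64k --numjobs=16 \
      --ioengine=libaio --direct=1 --iodepth=1 \
      --runtime=60 --time_based --group_reporting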

Test Results

                       Failover           Active-Active
                 -------------------   -------------------
   Workload        Read       Write      Read       Write
--------------   --------   --------   --------   --------
1-1mb-randrw     52.3MB/s   52.3MB/s   51.2MB/s   50.0MB/s
1-4kb-randrw     335kB/s    333kB/s    331kB/s    330kB/s
16-32kb-seqrw    3181MB/s   3181MB/s   2613MB/s   2612MB/s
16-64kb-randrw   98.7MB/s   98.7MB/s   95.1MB/s   95.2MB/s

I am reporting only one set of tests, but the results are consistent across replicates (n=3) and robust to the choice of path_selector.

Is there any reason active-active cannot at least match the performance of failover (active/enabled)? I don't know whether the issue lies with the workloads or with the multipath configuration. The difference was even larger (about 20%) when we used a linear logical volume instead of a striped one. I'm curious whether I have overlooked something obvious.

Many thanks,

Nicolas


As you are using HDDs, a single controller is already plenty fast for your backend disks. Using the second controller in active/active mode gives you no additional IOPS (the HDDs are the limit) but adds overhead at the multipath level, hence the reduced performance.

In other words: you will saturate the HDDs long before the CPU of the first controller, so leave the paths in active/passive (failover) mode. Moreover, I would try a single 28-disk group and benchmark it to see whether it performs better or worse than the current 2x 14-disk setup.
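
If you do rebuild it as a single 28-disk group, a quick non-destructive sanity check before layering LVM/XFS back on top is a raw sequential read straight from the multipath device, then re-run your fio workloads for the full picture (the block size and count below are arbitrary):

$ dd if=/dev/mapper/mpatha of=/dev/null bs=1M count=16384 iflag=direct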