Dell PowerVault MD3200i dm-multipath configuration and performance snags in Debian 6.0 (squeeze)

I'll be using this iSCSI target for a couple of Debian-based KVM virtual machine hosts. Each of the target's redundant controllers has 4 ethernet ports; likewise for the initiators. I use two switches (ZyXEL GS-2200-24) with a trunk between them, and VLANs isolate each path. I've also enabled jumbo frames and flow control.
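For reference, each of the initiator ports is set up roughly like this in /etc/network/interfaces (a sketch: the interface name and addressing are illustrative, not my actual values):

# one iSCSI path: jumbo frames via mtu, flow control via ethtool
auto eth1
iface eth1 inet static
    address 192.168.130.101
    netmask 255.255.255.0
    mtu 9000
    post-up ethtool -A eth1 autoneg off rx on tx on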

The MPIO system in this Debian release is fabulous: as long as dm-multipath is loaded before logging in to the iSCSI target, everything Just Works™ without any configuration file, provided I load scsi_dh_rdac beforehand.
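In practice the ordering looks roughly like this (a sketch; the open-iscsi node records are assumed to already exist):

modprobe scsi_dh_rdac      # RDAC device handler first
modprobe dm_multipath      # then the multipath target
iscsiadm -m node --login   # only then log in to the target portals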

That's the first snag: I can change some of the defaults if I provide an /etc/multipath.conf file. I tested with user_friendly_names yes, which successfully creates an mpath0 link in /dev/mapper/ instead of the unfriendly-looking WWID. But if I try to change rr_min_io from the default 1000 down to 8, it gets ignored, so I do this pretty dance:

dmsetup suspend mpath0
dmsetup table mpath0 | sed 's, 1000, 8,g' | dmsetup reload mpath0
dmsetup resume mpath0

That changes the number of requests sent down one of the quad links before round-robin kicks in and sends I/O down the next, from the default 1000 down to 8. This actually changes the multipath table (as per multipath -v3 | grep params). How does one configure this default in the new multipath code? I'm assuming this worked before multipath went all dynamic and self-configuring... at least, all the vendor docs I've read, and other discussions on the web, seem to assume it does.
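For the record, this is the kind of /etc/multipath.conf I would have expected to do the same thing declaratively (a sketch; whether a defaults entry like this is actually honoured is exactly what I'm asking):

defaults {
    user_friendly_names yes
    rr_min_io 8
}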

A simple sequential write using dd bs=100M count=50 if=/dev/zero of=/dev/mapper/mpath0-part1 && sync goes from ~135MB/s up to ~260MB/s with this change. And that's the second snag: that's about 2Gbps, instead of the 4Gbps I actually have between the initiator and the target. Running iostat -kd 1 (one-second updates) shows only 2 of the 4 paths being filled up.
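To tell which sdX devices in the iostat output hang off which portal, the session-to-device mapping from open-iscsi helps (a sketch):

iscsiadm -m session -P 3   # shows each session's portal and its attached sdX devices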

This LUN is short-stroked: its 16GB reside right at the beginning of a 12-spindle RAID10 array of 600GB 6Gbps SAS disks spinning at 15,000rpm. I was expecting this to be enough to saturate the 4Gbps I have; am I correct?
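My back-of-envelope reasoning, assuming roughly 150MB/s of sustained sequential throughput per 15k SAS spindle (an assumption, not a measured figure):

12 spindles in RAID10 = 6 mirrored pairs striped together
6 pairs x ~150MB/s   ~= 900MB/s sequential
4Gbps of iSCSI       ~= 500MB/s before protocol overhead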


Solution 1:

Online Reconfiguration

The technique you used to change rr_min_io is what multipathd does for you under the covers. The user-friendly way to adjust values in a running map is echo reconfigure | multipathd -k

For example, here's a NetApp LUN whose rr_min_io is currently 128:

# dmsetup table
360a98000534b504d6834654d53793373: 0 33484800 multipath 0 1 alua 2 1 round-robin 0 2 1 8:16 128 8:32 128 round-robin 0 2 1 8:64 128 8:48 128 
360a98000534b504d6834654d53793373-part1: 0 33484736 linear 251:0 64

/etc/multipath.conf was changed so that rr_min_io is now 1000. Then:

# echo reconfigure | multipathd -k
multipathd> reconfigure
ok

To verify the change:

# dmsetup table
360a98000534b504d6834654d53793373: 0 33484800 multipath 0 1 alua 2 1 round-robin 0 2 1 8:16 1000 8:32 1000 round-robin 0 2 1 8:48 1000 8:64 1000 
360a98000534b504d6834654d53793373-part1: 0 33484736 linear 251:0 64

I agree multipathd could do a better job of advertising and reporting the additional variables it uses. Whatever multipathd doesn't report, dmsetup does, but that doesn't mean driving dmsetup directly is the best way to reconfigure those settings. reconfigure works for just about everything.
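One thing that does help: the interactive shell can dump the full configuration it is actually using, built-in device defaults included (assuming your multipath-tools is recent enough to support show config):

# echo 'show config' | multipathd -k

That makes it easier to confirm what rr_min_io a given device stanza will actually apply before reconfiguring.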

Active-Active load balancing

The deployment guide says your SAN is active-active, but this term gets misused in the industry. In practice it can mean "dual active": a LUN can only be accessed by a single storage processor at any one time, but both controllers can be active and drive distinct LUNs; they just can't load balance to the same LUN.

From the deployment guide, p79, under the load balancing section:

Two sessions with one TCP connection are configured from the host to each controller (one session per port), for a total of four sessions. The multi-path failover driver balances I/O access across the sessions to the ports on the same controller. In a duplex configuration, with virtual disks on each controller, creating sessions using each of the iSCSI data ports of both controllers increases bandwidth and provides load balancing.

Note the plural "virtual disks" in the context of a duplex configuration; it doesn't call out the same disk. This appears to be a dual-active deployment. True active-active SANs are usually reserved for Fibre Channel deployments. Maybe iSCSI SANs exist that accomplish this, but I haven't come across one, though I don't deploy iSCSI extensively either.
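You can usually see this from the initiator side, too (a sketch; exact output will vary with your setup):

multipath -ll

With the rdac handler, the path group behind the owning controller typically shows up as status active while the other controller's group sits at enabled (its paths in the ghost state); only the active group carries I/O, which lines up with you seeing just 2 of the 4 paths busy in iostat.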