Setting up a redundant iSCSI network with two switches, a SAN and ESX

Solution 1:

Nicely structured approach and you're asking all the right questions. Your suggested redesign is excellent.

ESX 3.5 doesn't really do iSCSI Software Initiator multipathing, but it will happily fail over to another active or standby uplink on the vSwitch if a link fails for any reason. The VI 3.5 iSCSI SAN Configuration Guide has some information on this; not as much as I'd like, but it is clear enough. You shouldn't have to do anything on the ESX side when you change over, but you will no longer get any link aggregation effect (because your uplinks will go to two separate, non-stacked switches), only failover. Given the weakness of multipathing in the ESX 3.5 iSCSI stack this probably won't have any material effect, but it might since you have multiple iSCSI targets, so bear it in mind. I'm sure you know this already, but Jumbo frames are not supported with the Software Initiator on ESX 3.5, so that's not going to do anything for you until you move to ESX 4.
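For reference, a minimal sketch of that failover-only setup from the ESX 3.5 service console; the NIC names (vmnic2/vmnic3), the port group name and the addressing are placeholders for your own values:

    esxcfg-vswitch -a vSwitch1                  # vSwitch dedicated to iSCSI
    esxcfg-vswitch -L vmnic2 vSwitch1           # uplink to switch A
    esxcfg-vswitch -L vmnic3 vSwitch1           # uplink to switch B
    esxcfg-vswitch -A iSCSI vSwitch1            # port group for the VMkernel port
    esxcfg-vmknic -a -i 192.168.10.11 -n 255.255.255.0 iSCSI
    # Active/standby ordering for the two uplinks is set in the VI Client under
    # vSwitch Properties > NIC Teaming; you get failover only, no aggregation.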

When setting up the ESX vSwitch and VMkernel ports for iSCSI with ESX 4, the recommendation is to create multiple VMkernel ports with a 1:1 mapping to physical uplink NICs. You can create multiple vSwitches for this, or you can use the NIC teaming options at the port group level so that each VMkernel port has a single NIC designated as active, with one or more as standby. Once you have the ports/vSwitch configured, you then need to bind the ports to the iSCSI multipath stack, which will then handle both multipathing and failover much more efficiently. Given the way this works there is no need to worry about teaming across the switches; the multipath driver does the work at the IP layer. This is just a quick outline of how it works; it is described in very good detail in the VI 4 iSCSI SAN Configuration Guide, which will explain everything you need to do, including how to set up Jumbo frame support properly.
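A rough sketch of the single-vSwitch variant on ESX 4, with Jumbo frames enabled; again, the names here (vmnic2/vmnic3, iSCSI1/iSCSI2, vmk1/vmk2, vmhba33) are assumptions, so substitute whatever your host actually shows:

    esxcfg-vswitch -a vSwitch1
    esxcfg-vswitch -m 9000 vSwitch1             # Jumbo frames on the vSwitch
    esxcfg-vswitch -L vmnic2 vSwitch1           # uplink to switch A
    esxcfg-vswitch -L vmnic3 vSwitch1           # uplink to switch B
    esxcfg-vswitch -A iSCSI1 vSwitch1           # one port group per VMkernel port
    esxcfg-vswitch -A iSCSI2 vSwitch1
    esxcfg-vmknic -a -i 192.168.10.11 -n 255.255.255.0 -m 9000 iSCSI1
    esxcfg-vmknic -a -i 192.168.10.12 -n 255.255.255.0 -m 9000 iSCSI2
    # In the vSphere Client, override the failover order on each port group so
    # each one maps to exactly one active uplink (the guide has the details).
    # Then bind each VMkernel port to the software iSCSI adapter:
    esxcli swiscsi nic add -n vmk1 -d vmhba33
    esxcli swiscsi nic add -n vmk2 -d vmhba33
    esxcli swiscsi nic list -d vmhba33          # verify the bindings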

As far as stacking is concerned, I don't think you need or want it for this config; in fact, Dell's recommended design for MD3000i iSCSI environments is not to stack the switches, as far as I can recall, for precisely the reason you mention. Other iSCSI solutions (EqualLogic, for example) require high-bandwidth links between arrays, so Dell recommends stacking for those, but I've never had a satisfactory explanation of what happens when the stack master fails. I'm pretty sure the outage during the new master election will be shorter than the iSCSI timeouts, so VMs shouldn't fail, but it's not something I'm comfortable with, and things will definitely stall for an uncomfortable period of time.