Tuning iSCSI storage
This is a Canonical Question about iSCSI we can use as a reference.
iSCSI is a protocol that puts SCSI commands as payload into TCP network packets. As such, it is subject to a different set of problems than, say, Fibre Channel. For example, if a link gets congested and the switch's buffers are full, Ethernet will, by default, drop frames instead of telling the host to slow down. This leads to retransmissions, which in turn cause high latency for a small portion of the storage traffic.
There are solutions for this problem, depending on the client operating system, including modifying network settings. For the following list of OSs, what would an optimal iSCSI client configuration look like? Would it involve changing settings on the switches? What about the storage?
- VMWare 4 and 5
- Windows Hyper-V 2008 & 2008r2
- Windows 2003 and 2008 on bare metal
- Linux on bare metal
- AIX VIO
- Any other OS you happen to think would be relevant
I'm not familiar with VMWare, but I do use Xenserver and I have used Hyper-V (R2).
With my current Xenserver configuration I have:
- 8 Dell PowerEdge 29xx servers
- 2 Dell PowerConnect 6248 switches
- 2 Dell MD3000i SANs (iSCSI)
I have set up my switches in a multipath configuration and optimized them for iSCSI by:
- Separating my switches into 3 VLANs (2 for iSCSI traffic and 1 for management)
- Using jumbo frames
- Applying the "iSCSI" optimizations that the PowerConnect has
Each server has multiple network cards to provide a connection to each switch, in turn providing redundancy via multipathing between the servers and the iSCSI SAN. The iSCSI VLANs contain no other traffic than iSCSI.
I'm pleased to report that with this configuration the Xenserver "cluster" works brilliantly.
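For anyone checking a similar Linux-based setup (XenServer's dom0 or a plain Linux initiator), the commands below are the standard open-iscsi and device-mapper-multipath utilities; this is only a sketch of the kind of check I run, not a full recipe:

```
# List the active iSCSI sessions; one session per portal/path is expected here
iscsiadm -m session

# Show the multipath topology; each LUN should appear once,
# with an active path through each iSCSI VLAN/switch
multipath -ll
```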
On a side note, I do have a Windows 2008 server connected directly by iSCSI to an HP SAN (an old file server). It used to run Windows 2003 and would regularly drop the connection (even after a reinstall of 2003); however, since I upgraded to Windows 2008 it has stayed connected.
I'll be happy to answer any questions about my setup.
This is not an answer... yet. This is the framework for the Generic Answer. If you have time, please fill in anything you know about. For configuring specific hardware, please post a separate answer for each vendor so we can keep that information organized and separate.
On the switch ports: apply a QoS profile, turn off storm control, set the MTU to 9000, turn on flow control, and put the ports into portfast.
Throughput and Latency
Updated firmware, drivers, and other systems
MPIO
Jumbo Frames/MTU
As the speed of network links increases, the number of packets potentially generated also increases. This means more and more CPU/interrupt time is spent generating packets, which both unduly burdens the transmitting system and takes up an excessive amount of link bandwidth with framing overhead.
So-called "jumbo" frames are Ethernet frames that exceed the canonical 1518-byte limit. While the numbers may vary based on switch vendors, operating systems and NICs, the most typical jumbo frame sizes are 9000 bytes (the usual host MTU) and 9216 bytes (a common switch maximum). Given that roughly 6x the data can be put into a 9000-byte frame, the number of actual packets (and interrupts) on the host is reduced by a similar factor. These gains are especially pronounced on higher-speed (i.e. 10GE) links that send large volumes of data (i.e. iSCSI).
Enabling jumbo frames requires configuration of both the host and the Ethernet switch, and considerable care should be taken before implementation. Several guidelines should be followed (a minimal Linux example follows the list):
1.) Within a given Ethernet segment (VLAN) all hosts and routers should have the same MTU configured. A device without proper configuration will see larger frames as link errors (specifically "giants") and drop them.
2.) Within the IP protocol, two hosts with differing frame sizes need some mechanism to negotiate an appropriate common frame size. For TCP this is path MTU (PMTU) discovery, which relies upon the transmission of ICMP unreachable packets. Make sure that PMTU discovery is enabled on all systems and that any ACLs or firewall rules permit these packets.
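As a minimal sketch (assuming a Linux initiator whose dedicated iSCSI interface is eth1 and a target portal at 192.168.100.10; both the interface name and the address are hypothetical), the MTU can be raised and then verified end to end with a non-fragmenting ping:

```
# Raise the MTU on the dedicated iSCSI interface
ip link set dev eth1 mtu 9000

# Verify the whole path carries jumbo frames without fragmentation:
# 8972 bytes of payload + 8 bytes ICMP header + 20 bytes IP header = 9000
ping -M do -s 8972 -c 4 192.168.100.10
```

To survive a reboot the MTU would normally be set in the distribution's persistent network configuration rather than ad hoc as above.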
Ethernet Flow Control (802.3x)
Despite being recommended by some iSCSI vendors, simple 802.3x Ethernet flow control should not be enabled in most environments unless all switch ports, NICs, and links are dedicated entirely to iSCSI traffic and nothing else. If there is any other traffic on the links (such as SMB or NFS file sharing, heartbeats for clustered storage or VMware, NIC teaming control/monitoring traffic, etc.), simple 802.3x flow control should not be used, because it blocks entire ports and that other non-iSCSI traffic will be blocked as well. The performance gains of Ethernet flow control are often minimal or non-existent; realistic benchmarking should be performed on the entire OS/NIC/switch/storage combination being considered to determine whether there is any actual benefit.
The real question from a server's perspective is: do I stop network traffic when my NIC or the network is overrun, or do I start dropping and retransmitting packets? Turning flow control on lets the NIC's buffers be emptied on the receiving side, but pushes the buffering load back onto the sender (normally a network device will do the buffering there).
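If flow control is used at all, it has to be agreed on at both ends of every link. As a hedged example on a Linux host (the interface name eth1 is an assumption), the current pause-frame settings can be inspected and changed with ethtool; the attached switch port must be configured to match:

```
# Show the current 802.3x pause-frame settings for the NIC
ethtool -a eth1

# Enable (or disable with "off") RX/TX pause frames on the host side
ethtool -A eth1 rx on tx on
```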
TCP Congestion Control (RFC 5681)
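(To be filled in. As a starting point, and strictly as a sketch on a Linux initiator, the congestion control algorithm in use can be inspected and switched via sysctl; whether an alternative algorithm helps iSCSI traffic at all should be established by benchmarking, not assumed.)

```
# Show the algorithm currently in use and the ones available on this kernel
sysctl net.ipv4.tcp_congestion_control
sysctl net.ipv4.tcp_available_congestion_control

# Switch algorithms (affects new connections)
sysctl -w net.ipv4.tcp_congestion_control=cubic
```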
TOE (TCP/IP Offload Engines)
iSOE (iSCSI Offload Engines)
LSO (TCP Segmentation/Large Send Offload)
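(To be filled in. For the offload items above, a Linux NIC's current offload state can be listed and individual features toggled with ethtool, as sketched below with a hypothetical interface name; full TOE and iSCSI offload engines are NIC/driver specific and are usually configured through vendor tools instead.)

```
# List every offload feature the NIC/driver exposes
ethtool -k eth1

# Toggle TCP segmentation offload (TSO/LSO) for testing
ethtool -K eth1 tso off
ethtool -K eth1 tso on
```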
Network Isolation
A common best practice for iSCSI is to isolate both initiators and targets from other non-storage network traffic. This offers benefits in terms of security, manageability and, in many cases, dedication of resources to storage traffic. This isolation may take several forms:
1.) Physical isolation - all initiators have one or more NICs dedicated solely to iSCSI traffic. This may, or may not, imply dedicated network hardware depending on the capabilities of the hardware in question and the specific security and operational requirements within a given organization.
2.) Logical isolation - Mostly found in faster (i.e. 10GE) networks, initiators have VLAN tagging (see 802.1q) configured to separate storage and non-storage traffic.
In many organizations additional mechanisms are employed to also assure that iSCSI initiators are unable to reach one another over these dedicated networks and that, further, these dedicated networks are not reachable from standard data networks. Measures used to accomplish this include standard access control lists, private VLAN's and firewalls.
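As an illustrative sketch of the logical-isolation case (2.) above, assuming a Linux initiator, a physical interface eth1 and a hypothetical storage VLAN 100 and subnet, an 802.1q tagged sub-interface dedicated to iSCSI might look like this:

```
# Create an 802.1q tagged sub-interface for the iSCSI VLAN
ip link add link eth1 name eth1.100 type vlan id 100

# Address it from the dedicated storage subnet and bring it up
ip addr add 192.168.100.21/24 dev eth1.100
ip link set dev eth1.100 up
```

The corresponding switch port has to carry VLAN 100 tagged for this to work.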
Something about backplane and switching fabric here too.
QoS (802.1p)
vLAN (802.1q)
STP (RSTP, MSTP, etc)
Traffic Suppression (Storm Control, Multi/Broad-cast Control)
Security
Authentication and Security
CHAP
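(To be filled in. For Linux open-iscsi initiators, one-way CHAP is typically enabled in /etc/iscsi/iscsid.conf as sketched below; the username and secret are placeholders, and the same pair must also be configured on the target/array.)

```
# /etc/iscsi/iscsid.conf -- enable one-way CHAP for normal sessions
node.session.auth.authmethod = CHAP
node.session.auth.username = initiator-user
node.session.auth.password = initiator-secret
```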
IPSec
LUN Mapping (Best Practices)
Some consideration and research should be given, for your own environment, to the following:
1) Multipathing - your SAN solution and your OS, be it a hypervisor or a bare-metal OS, may need vendor-specific software for this to function properly (see the sketch after this list).
2) Initiators - you need to vet whether the software initiator offers sufficient performance for the demands placed on it. Many NICs have iSCSI offloading capabilities which can significantly improve throughput, but certain older hypervisors have been known to get quite pissy with them support-wise. The more mature offerings (ESXi 4.1+) seem to play nice.
3) Security/Permissions - be sure to fully vet which initiators require access to which LUNs... you'll be in for a bad day if an admin on one of your Windows machines does an "initialize disk" on a disk that is really in use by another server as a VMware datastore.
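As a sketch tying the points above together on a Linux software initiator (the portal addresses are placeholders, and the exact multipath configuration is array-specific): the initiator's IQN, which is the identity the array's LUN mapping/masking is keyed on, lives in /etc/iscsi/initiatorname.iscsi; the software initiator logs in to a portal on each storage VLAN, and dm-multipath then coalesces the paths:

```
# The initiator IQN that the SAN's LUN mapping should reference
cat /etc/iscsi/initiatorname.iscsi

# Discover and log in to the target through both storage VLANs
iscsiadm -m discovery -t sendtargets -p 192.168.100.10
iscsiadm -m discovery -t sendtargets -p 192.168.101.10
iscsiadm -m node --login

# Confirm each LUN shows up once, with one path per VLAN
multipath -ll
```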