We have a couple of servers hosting Citrix VMs. We are planning to add SAN storage to our network so that we can do the quick migration and high availability thing. The question comes up of deciding which NICs to buy for the servers. We have scads of available PCIe slots on the servers, so density is not a factor. We're not going to do 10GbE.

The concerns are gigabit speed, jumbo frame support (which apparently any old NIC is capable of), and making sure none of the processing load is put on the server itself (no softmodem-type deal).

I see some NICs advertise a "copper connection." Is there any benefit to that? What other options should I not compromise on?

Wouldn't single-port NICs be preferable to dual-port ones for redundancy purposes? That's kinda what I gathered from this SF thread.

The SAN setup we're looking at, if anyone's interested:

Clariion: http://www.emc.com/collateral/hardware/data-sheet/h4097-clariion-ax4-ds.pdf (data sheet) http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=6076470 $9679.99

OR

Dell: http://www.dell.com/us/business/p/powervault-md3200i/pd?refid=powervault-md3200i&baynote_bnrank=0&baynote_irrank=1&~ck=dellSearch $10,749 (not configured as printed -- baseline 1 TB) $12,905.00 (8 TB configured)

Managed Switch: http://www.newegg.com/Product/Product.aspx?Item=N82E16833122074 x1 $599 OR http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=3334993&sku=C50-2162 x1 $350

Edit: to clarify the intent of the SAN setup, the illustration below shows the isolated network we are adding for the storage for the 2 Xen servers. We don't (I believe) need a fancy switch, just one with jumbo frame support, standard management features, and VLAN capability:

[diagram: the two Xen servers, each with two links into the dedicated storage switch, with the SAN attached to the same switch]


Solution 1:

You're asking about doing a small iSCSI SAN, and you're on the right track. We do something very similar with Dell servers and MD3000i arrays.

In the diagram provided, you show two links from each server to the switch. I think you can get away with one link, unless you're bonding them for greater throughput. As shown, the arrangement protects against failure of the server NIC, the cable, and the port on the switch. A better (high-dollar) approach would be to add a second switch, connect each server to both switches, and cross-connect the switches. That protects against loss of an entire switch, but adds the complexity of Spanning Tree, the Layer 2 protocol for preventing the loop that otherwise appears in the network when two switches are introduced. From there, the two switches are commonly attached to two SAN heads, which themselves are cross-connected... but that's larger scale than you've asked about. Go single-path the whole way, and accept the marginally increased risk in trade for ease of care & feeding.

Regarding ease of care & feeding: think long and hard about the relative likelihood of hardware failure versus wetware failure. I feel like I see a 5:1 ratio of human goofs to actual hardware failures, so if you're not going to do the mega-buck fully-redundant-everything, keep it simple.

If you enable Jumbo Frames, you've got to do Jumbo Frames everywhere on that network. You've sketched out a dedicated storage network, so you can do it - I'm not so fortunate.
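If you want a quick sanity check that jumbo frames really are working end to end, something like the sketch below helps. It assumes a Linux host with the usual iputils ping flags, and the target address is just a placeholder for something on your storage network. A 9000-byte MTU should carry an 8972-byte ICMP payload without fragmenting; if this fails, something in the path (NIC, switch port, or SAN interface) isn't set up for jumbo frames yet.

```python
# Quick end-to-end jumbo frame check (assumes Linux iputils ping).
# A 9000-byte MTU leaves 8972 bytes of ICMP payload after the
# 20-byte IP header and 8-byte ICMP header.
import subprocess

TARGET = "192.168.100.10"   # placeholder address on the storage network
PAYLOAD = 9000 - 20 - 8     # 8972 bytes

result = subprocess.run(
    ["ping", "-c", "3", "-M", "do", "-s", str(PAYLOAD), TARGET],
    capture_output=True, text=True,
)
if result.returncode == 0:
    print("Jumbo frames pass end to end.")
else:
    print("Jumbo frames are NOT making it through:")
    print(result.stdout or result.stderr)
```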

If you ARE bonding those server NICs for throughput, consider adding more bonded interfaces from the switch to the SAN head. If each of N servers is doing X traffic, the SAN needs to keep up with N*X traffic, minus some small oversubscription fudge factor.
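To make that concrete, here's a back-of-the-envelope sketch; the server count, per-server rate, and fudge factor below are placeholders you'd swap for your own numbers, not measurements.

```python
# Rough sizing of the switch-to-SAN bond: N servers each pushing X Gbit/s
# need roughly N*X Gbit/s at the SAN head, less an oversubscription
# fudge factor (not everyone peaks at once). All numbers are placeholders.
import math

servers = 2                # N: the two Xen hosts
per_server_gbps = 2.0      # X: e.g. a bonded pair of GigE NICs per server
oversubscription = 0.8     # fudge factor

needed_gbps = servers * per_server_gbps * oversubscription
gige_links = math.ceil(needed_gbps)   # number of 1 Gbit/s links in the bond

print(f"SAN-facing bond should carry ~{needed_gbps:.1f} Gbit/s "
      f"-> bond {gige_links} x GigE links to the SAN head")
```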

I believe the "copper connection" you've asked about is simply copper Cat6 twisted-pair Ethernet, as used for iSCSI. As you move into the higher-end SAN world, you see more optical fiber connections, and HBA cards with various modular physical connectors (SFP, GBIC, etc.).

Tangentially, how are you dividing up user sessions between the Citrix servers? Is there any sort of active load balancing in place (NetScaler)? If you have HA Xen servers, what does the failover process look like from the user's perspective? Make sure you've identified the mechanism by which this SAN actually improves things for the users.

Edited to add: you might also price out a more traditional shared direct-attach storage cluster. I do these with Windows, so I don't have the details around Xen/Linux, but it's a SAS array shared between the two nodes; an example is the Dell MD3000 (not the "i" model). (With Dell you need the proper HBA too, the SAS 5/E, IIRC.) If you're never going to add more compute nodes, the SAS cluster might be easier and cheaper to build. Whatever you end up doing: validate & test, test, test. In my experience folks build a cluster to add "high availability" without defining what that means in real terms, and then never validate that it protects them against the failures they were expecting (hoping, really) it would handle.
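On the "test, test, test" point: when you run the pull-a-cable drills, it helps to actually measure how long storage disappears rather than eyeballing it. A throwaway script along these lines (assuming a Linux host with iputils ping; the portal address is a placeholder for your iSCSI target on the storage VLAN) just logs each outage window so you can put a number on failover time.

```python
# Crude failover timer: ping the iSCSI portal once a second and log
# any outage window while you pull cables / power off a switch.
import subprocess
import time

PORTAL = "192.168.100.20"   # placeholder iSCSI portal on the storage VLAN
outage_start = None

while True:
    ok = subprocess.run(
        ["ping", "-c", "1", "-W", "1", PORTAL],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ).returncode == 0
    now = time.time()
    if not ok and outage_start is None:
        outage_start = now
        print(f"{time.ctime(now)}: portal unreachable")
    elif ok and outage_start is not None:
        print(f"{time.ctime(now)}: back after {now - outage_start:.1f}s")
        outage_start = None
    time.sleep(1)
```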

Solution 2:

Wait - you're asking for server-grade NICs but want to buy a $350 switch?! I don't get that...

Usually "server grade" 48 port GigE switches go for somewhere around 3000-5000 USD list price. Maybe you want to look out for switch side things like stacking for cross-stack LACP.

Regarding the NIC, look for things like the following (a quick Linux-side check for a few of these is sketched after the list):

  • proper DMA interface and good drivers that support MSI-X interrupts (check reviews on performance for that)
  • matching PCI-E interface speed
  • multiple ports if needed
  • TCP offload engine
  • L2 features like 802.1Q and 802.1X
  • iSCSI offload engine
  • GBIC support if you need to mix and match WAN and LAN on the same NIC
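Once a card is in the box, you can verify some of these claims rather than trusting the spec sheet. The sketch below is Linux-only and assumes ethtool is installed; "eth2" is a placeholder interface name. It dumps the offload settings and counts the interrupt vectors the driver registered (multiple per-queue vectors generally mean MSI-X is in use).

```python
# Check a NIC's offload features and MSI-X usage on Linux.
# Assumes ethtool is installed; "eth2" is a placeholder interface name.
import subprocess

IFACE = "eth2"

# Offload settings (TSO, GSO, GRO, checksum offload, etc.)
offloads = subprocess.run(
    ["ethtool", "-k", IFACE], capture_output=True, text=True
).stdout
print(offloads)

# MSI-X drivers usually register one vector per queue, named after the
# interface (e.g. "eth2-TxRx-0") in /proc/interrupts.
with open("/proc/interrupts") as f:
    vectors = [line for line in f if IFACE in line]
print(f"{len(vectors)} interrupt vector(s) registered for {IFACE}")
```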

Solution 3:

So what really makes a server-grade NIC?

Primarily quality.

From a manufacturing point of view, the differences between a server-grade and a "consumer"-grade NIC are somewhat similar to the differences between a proper server motherboard and the one in an ordinary PC. First, the components are generally higher grade, selected with tighter tolerances and higher reserve margins. There will also normally be better protection circuitry to guard against damage caused by out-of-tolerance voltages, such as spikes induced in a network cable, that might fry a regular NIC.

The chipsets used will generally have some level of redundancy, where parts of the circuit can be switched in and out as required to handle fault conditions; not unlike having redundant power supplies in the server.

The firmware is also likely to be better in a server-grade NIC, which usually allows it to handle network conditions that might bring a regular NIC to its proverbial knees.