Mellanox and Infiniband latency
I have two hosts with Voltaire HCA500Ex-D (MT25408 ConnectX Mellanox) 10Gbit cards connected to a Cisco SFS7000d IB switch via CX4 3m cables.
I'm really concerned about the latency, which is higher than on a 1Gbit Ethernet connection between the same hosts.
    [root@localhost ~]# ibping -G 0x0008f104039a5589
    Pong from host-a.(none) (Lid 3): time 0.238 ms
    Pong from host-a.(none) (Lid 3): time 0.291 ms
    Pong from host-a.(none) (Lid 3): time 0.320 ms
    Pong from host-a.(none) (Lid 3): time 0.290 ms
    Pong from host-a.(none) (Lid 3): time 0.335 ms
    Pong from host-a.(none) (Lid 3): time 0.281 ms
Most people are getting 0.040ms - 0.050ms using the same IB cards.
The Cisco IB switch and the cards are running the latest firmware.
I've also tried connecting the two hosts back-to-back, eliminating the switch, but latency is still 0.200 ms+.
Any ideas?
Downloading and installing a recent version of the OpenFabrics Enterprise Distribution (OFED) will give you access to a variety of tools, including ibdiagnet as well as several other IB performance testing and tuning tools. In addition, it gives you the option to install and configure openibd and OpenSM, an open-source subnet manager.
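As a rough first pass with those tools, something like the following can be useful (a sketch only; exact output and options vary between OFED releases, and host-a/host-b are placeholders for your two hosts):

    # Check local HCA port state, link rate and LID on each host
    ibstat

    # Run a fabric-wide diagnostic sweep (links, error counters, SM presence)
    ibdiagnet

    # Low-level RDMA latency test from the perftest package shipped with OFED
    # On host-b (server side):
    ib_send_lat
    # On host-a (client side):
    ib_send_lat host-b

The ib_send_lat numbers give you the raw verbs-level latency of the link, which is a better baseline than ibping for comparing against the 0.040-0.050 ms figures you mentioned.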
Based on the documentation, the Cisco SFS7000d IB switch you mentioned is running its own onboard IB subnet manager. It would be beneficial to either:
Disable the subnet manager on the switch and run OpenSM on one of the servers instead (a rough sketch of this follows below), or
Verify that the subnet manager configuration on the switch is appropriate for your network.
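If you go the OpenSM route, a minimal sketch looks like this (assuming the OFED-provided opensmd init script; the exact service name and config location can differ by distribution):

    # On one server only: start OpenSM as the master subnet manager
    /etc/init.d/opensmd start

    # From any host, verify which subnet manager is currently master on the fabric
    sminfo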
In addition to taking these steps, there are a number of other tests that can effectively measure latency and bandwidth over InfiniBand, for example a ping-pong benchmark run with a Message Passing Interface (MPI) implementation (OpenMPI or a proprietary version). Setting up and configuring an MPI ping-pong test with OpenMPI over IB is a good way to measure end-to-end latency at the application level.
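As a rough sketch of such a run (assuming OpenMPI built with openib support and the OSU micro-benchmarks installed on both hosts; the osu_latency binary and host names are assumptions for illustration):

    # Two-process ping-pong latency test over the IB verbs (openib) transport
    mpirun -np 2 -host host-a,host-b --mca btl openib,self ./osu_latency

Comparing this result against the ib_send_lat and ibping numbers helps separate fabric-level latency from software/stack overhead.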