InfiniBand drivers: OFED or distro-included?

I'm setting up a Linux cluster with an InfiniBand network, and I'm quite a newbie in the InfiniBand world, so any advice is more than welcome!

We are currently using the Mellanox OFED (MOFED) drivers, but our InfiniBand cards are old and not recognized by the latest MOFED release. So I'm wondering: why not use the drivers shipped with the distribution (we're running CentOS 7)?

What difference does it make to use one or the other? Should I expect any performance decrease?

thx


Solution 1:

By not using the vendor OFED distribution (in this case Mellanox OFED), you should expect not only a performance penalty but also missing features and a lot of stability issues.

InfiniBand is not as rock solid as Ethernet. Its main goal is to provide a low-latency fabric, not just the high-throughput network most people take it for.

The inbox driver (that's what Mellanox calls the OFED stack shipped with the distribution) is unreliable at best. If you're running cards older than ConnectX-4, you'll have a bad time with IPoIB if you need it: just keeping it enabled will eventually lead to kernel panics. Performance is poor and the network is unreliable.
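If you stay on the inbox driver with pre-ConnectX-4 cards, it helps to know whether IPoIB is actually enabled on a node. Below is a minimal sketch, assuming a Linux host with sysfs mounted at /sys: it lists the network interfaces whose link type is 32 (ARPHRD_INFINIBAND in <linux/if_arp.h>), which is how the kernel reports IPoIB interfaces. The function name `ipoib_interfaces` is hypothetical and not part of any driver tooling.

```python
import os

# ARPHRD_INFINIBAND from <linux/if_arp.h>: the link type the kernel reports
# in /sys/class/net/<iface>/type for IPoIB interfaces.
ARPHRD_INFINIBAND = 32


def ipoib_interfaces(sysfs_net="/sys/class/net"):
    """Return the sorted names of interfaces whose link type is InfiniBand (IPoIB)."""
    found = []
    try:
        names = os.listdir(sysfs_net)
    except FileNotFoundError:
        return found  # no sysfs here (e.g. not Linux): nothing to report
    for name in names:
        type_path = os.path.join(sysfs_net, name, "type")
        try:
            with open(type_path) as f:
                link_type = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
        if link_type == ARPHRD_INFINIBAND:
            found.append(name)
    return sorted(found)


if __name__ == "__main__":
    ifaces = ipoib_interfaces()
    if ifaces:
        print("IPoIB enabled on: " + ", ".join(ifaces))
    else:
        print("No IPoIB interfaces found")
```

The `sysfs_net` parameter exists only so the function can be pointed at a test directory; on a real node you would call it without arguments.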

There are some alternatives. First of all, there's MLNX_OFED 4.9, an LTS release that supports older cards like the ConnectX-3. I would stick with it, since it's supported and will remain supported for a long time.

What sets 4.9 LTS apart from the newer releases is that it still supports the following hardware and technologies:

  • ConnectX-3 Pro
  • ConnectX-3
  • Connect-IB
  • RDMA experimental verbs library (mlnx_lib)

Download it from here: https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed
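To check whether a node's cards fall into that list, one option is to parse the output of `ibv_devinfo` (from libibverbs), which prints a `vendor_part_id` line for each HCA. The sketch below assumes that part IDs 4099, 4103 and 4113 correspond to ConnectX-3, ConnectX-3 Pro and Connect-IB (the decimal form of their PCI device IDs); confirm this against your own `ibv_devinfo` output before relying on it. The helper name `cards_needing_lts` is illustrative only.

```python
import re
import subprocess

# Assumed vendor_part_id -> card family mapping, limited to the cards
# listed above as needing MLNX_OFED 4.9 LTS. Verify on your own hardware.
LTS_ONLY_PARTS = {
    4099: "ConnectX-3",
    4103: "ConnectX-3 Pro",
    4113: "Connect-IB",
}


def cards_needing_lts(devinfo_text):
    """Map each hca_id in `ibv_devinfo` output to its family if it is an LTS-only card."""
    result = {}
    hca = None
    for line in devinfo_text.splitlines():
        m = re.match(r"\s*hca_id:\s*(\S+)", line)
        if m:
            hca = m.group(1)  # start of a new device section
            continue
        m = re.match(r"\s*vendor_part_id:\s*(\d+)", line)
        if m and hca is not None:
            family = LTS_ONLY_PARTS.get(int(m.group(1)))
            if family:
                result[hca] = family
    return result


if __name__ == "__main__":
    # stdout=PIPE / universal_newlines keeps this working on CentOS 7's Python 3.6.
    out = subprocess.run(["ibv_devinfo"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    for hca, family in sorted(cards_needing_lts(out).items()):
        print("%s: %s (needs MLNX_OFED 4.9 LTS)" % (hca, family))
```

An empty result means none of the HCAs matched the assumed IDs, not necessarily that the latest MOFED supports them.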

If the LTS version of Mellanox OFED does not fit your needs, another option is to move to Oracle Linux, adopt UEK (Unbreakable Enterprise Kernel) and use its RDMA stack. At least Oracle tests this OFED release, since their Exadata product uses it. Documentation is available here: https://docs.oracle.com/en/operating-systems/uek/6/relnotes6.2/ol_instav.html#uek6_install_rdma