Latency of memory accesses via interconnects
Your single-node numbers of 90 ns local vs. 370 ns to the other socket seem reasonable. However, I think the 600 ns figure for InfiniBand is meant to be end to end, through a switch to a different frame.
600 ns for a remote datagram is very fast. Local memory access is usually on the order of 100 ns, and same-node, different-socket access might be 200 ns more.
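If you want to reproduce those local vs. remote-socket numbers yourself, a rough pointer-chasing sketch with libnuma is enough; this assumes a two-node Linux box with libnuma installed, and that node 0 / node 1 correspond to your two sockets:

```c
/* chase.c - rough local vs. remote NUMA latency via pointer chasing.
 * Assumes a 2-node Linux system; build: gcc -O2 chase.c -o chase -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ENTRIES (64 * 1024 * 1024 / sizeof(size_t))  /* 64 MiB, well past LLC */
#define STEPS   (10 * 1000 * 1000L)

/* Chase a random cycle through buf; returns average ns per dependent load. */
static double chase(size_t *buf)
{
    struct timespec t0, t1;
    size_t i = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long s = 0; s < STEPS; s++)
        i = buf[i];                      /* each load depends on the previous */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (i == (size_t)-1) puts("");       /* keep the loop from being optimized away */
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / STEPS;
}

static double measure(int mem_node)
{
    size_t *buf = numa_alloc_onnode(ENTRIES * sizeof(size_t), mem_node);
    if (!buf) { perror("numa_alloc_onnode"); exit(1); }

    /* Build one big random cycle (Sattolo's algorithm) so the prefetcher can't help. */
    for (size_t k = 0; k < ENTRIES; k++)
        buf[k] = k;
    for (size_t k = ENTRIES - 1; k > 0; k--) {
        size_t j = rand() % k;
        size_t tmp = buf[k]; buf[k] = buf[j]; buf[j] = tmp;
    }

    double ns = chase(buf);
    numa_free(buf, ENTRIES * sizeof(size_t));
    return ns;
}

int main(void)
{
    if (numa_available() < 0) { fprintf(stderr, "no NUMA support\n"); return 1; }

    numa_run_on_node(0);                 /* pin the thread to node 0 */
    printf("local  (node 0): %.0f ns/load\n", measure(0));
    printf("remote (node 1): %.0f ns/load\n", measure(1));
    return 0;
}
```

Numbers will vary with CPU generation and memory configuration, but the local/remote gap should be in the same ballpark as yours.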
Single-image, multiple-node computers get remote memory access either through RDMA in software or through hardware interconnects in a NUMA system.
InfiniBand is one transport for RDMA. Circa 2014, Mellanox claimed 500 ns end to end for InfiniBand EDR. Guessing here, but their marketing could be mixing numbers: roughly 600 ns typical end to end quoted for the NICs, plus about 150 ns per extra switch on the path.
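To make that guess concrete, a back-of-envelope sketch of how those two figures would combine (both numbers are my guesses from above, not vendor specs):

```c
#include <stdio.h>

/* Rough end-to-end latency guess: ~600 ns through the first switch,
 * plus ~150 ns for each additional switch on the path.
 * These are the guessed figures from the text, not published specs. */
static int estimate_ns(int switches_on_path)
{
    return 600 + 150 * (switches_on_path - 1);
}

int main(void)
{
    for (int sw = 1; sw <= 3; sw++)
        printf("%d switch hop(s): ~%d ns\n", sw, estimate_ns(sw));
    return 0;
}
```

Under that reading, one switch gets you the quoted ~600 ns, and a three-switch path lands around 900 ns, which is still close to the 500 ns marketing number in order of magnitude.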
Or, yes, NUMA interconnects for multiple-node systems are a specialized thing, but they do exist. For x86, there was the SGI UV family, whose NUMAlink 7 interconnect claimed 500 ns remote node access. On the POWER platform, IBM can wire up nodes with NVLink, although I don't know the latency of that.
Regarding your selection of a commodity transport, Ethernet or InfiniBand, that likely limits you to RDMA-aware applications. NUMA hardware that supports transparent single-image systems tends to be custom.
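"RDMA-aware" in practice means the application (or its MPI/middleware layer) has to go through the verbs API explicitly rather than getting transparent load/store access to remote memory. A minimal sketch that just enumerates RDMA-capable devices with libibverbs, assuming the rdma-core headers are installed:

```c
/* rdma_list.c - list RDMA-capable devices via libibverbs.
 * Build: gcc rdma_list.c -o rdma_list -libverbs
 */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }

    printf("%d RDMA device(s) found\n", num);
    for (int i = 0; i < num; i++)
        printf("  %s\n", ibv_get_device_name(devs[i]));

    ibv_free_device_list(devs);
    return 0;
}
```

Anything beyond this (registering memory regions, setting up queue pairs, posting RDMA reads/writes) is work the application or library has to do itself, which is the gap that custom NUMA hardware papers over in a true single-image system.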