Network latency decreases under CPU load
I'm running an experiment to see how CPU load affects network latency. My results make no sense, and I'm hoping someone can explain them. The two machines are connected through two network links.
First link:
[ Host A ] <-- InfiniBand (200 Gbps) --> [ Host B ]
Second link:
[ Host A ] <-- Ethernet --> [ Switch ] <-- Ethernet --> [ Host B ]
With Host A idle, Host B pinged Host A over both Ethernet and InfiniBand. The median Ethernet latency is 0.550 ms and the median InfiniBand latency is 0.330 ms.
With Host A stressed, Host B again pinged Host A over both Ethernet and InfiniBand. The median Ethernet latency is now 0.360 ms and the median InfiniBand latency is 0.115 ms.
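To put the two measurements side by side, here is a small Python sketch that computes how much each median dropped. It uses only the four medians quoted above; the names MEDIANS_MS and relative_reduction are just illustrative.

```python
# Median round-trip times reported above, in milliseconds.
MEDIANS_MS = {
    "Ethernet": {"idle": 0.550, "stressed": 0.360},
    "InfiniBand": {"idle": 0.330, "stressed": 0.115},
}


def relative_reduction(idle_ms: float, stressed_ms: float) -> float:
    """Fraction by which the median RTT dropped once Host A was stressed."""
    return (idle_ms - stressed_ms) / idle_ms


if __name__ == "__main__":
    for link, m in MEDIANS_MS.items():
        saved_ms = m["idle"] - m["stressed"]
        drop = relative_reduction(m["idle"], m["stressed"])
        print(f"{link}: {saved_ms:.3f} ms lower ({drop:.1%})")
```

Running it prints "Ethernet: 0.190 ms lower (34.5%)" and "InfiniBand: 0.215 ms lower (65.2%)". The absolute drop is similar on both links (about 0.2 ms) even though the relative drop is very different.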
Both hosts run Ubuntu 20.04 and Linux 5.8.
Why did my network latency decrease when I stressed all cores of Host A?
This is completely normal. Let me explain in a bit more detail. When the kernel is idle (no process has work to do), it puts the CPU into a power-saving idle state and may also lower the core's clock frequency.
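If you want to check this on Host A, Linux exposes both pieces in sysfs. The sketch below assumes the standard cpuidle and cpufreq layout (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/name`, `.../latency` for the exit latency in microseconds, and `.../cpufreq/scaling_cur_freq` in kHz); some VMs and containers do not expose these files, in which case it reports nothing. The helper names are hypothetical. Deeper idle states generally list a larger exit latency, which is the wake-up cost an idle core pays before it can handle a packet.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def idle_states(cpu_dir: Path) -> List[Tuple[str, int]]:
    """Return (name, exit latency in microseconds) for each cpuidle state of one CPU."""
    cpuidle = cpu_dir / "cpuidle"
    if not cpuidle.is_dir():
        return []  # no cpuidle interface exposed (common in VMs/containers)
    # Sort numerically so state10 comes after state9, not after state1.
    dirs = sorted(cpuidle.glob("state*"), key=lambda p: int(p.name[len("state"):]))
    states = []
    for state in dirs:
        name = (state / "name").read_text().strip()
        exit_latency_us = int((state / "latency").read_text())
        states.append((name, exit_latency_us))
    return states


def current_freq_khz(cpu_dir: Path) -> Optional[int]:
    """Return the current clock reported by cpufreq in kHz, or None if unavailable."""
    freq_file = cpu_dir / "cpufreq" / "scaling_cur_freq"
    return int(freq_file.read_text()) if freq_file.is_file() else None


if __name__ == "__main__":
    cpu0 = Path("/sys/devices/system/cpu/cpu0")
    for name, exit_latency_us in idle_states(cpu0):
        print(f"{name:<12} exit latency {exit_latency_us} us")
    print("cpu0 current frequency (kHz):", current_freq_khz(cpu0))
```

Running it once while Host A is idle and once under stress should show whether the reported frequency changes between the two cases.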
Whenever a packet arrives at the network card, the card raises an interrupt that signals the kernel: "a network packet has arrived". The kernel then has to handle that interrupt and pass the packet up through its network stack before it decides to send the ping reply. When it comes to response time, the default Linux kernel scheduler is not optimized for this, because it usually doesn't matter whether the reply arrives in 0.3 ms or in 0.1 ms.
Now consider this: a busy CPU runs at (or near) its maximum clock frequency and does not need to be woken up from a power-saving state, so it responds to a ping request faster than an idle CPU does (unless you boot a real-time kernel, which is optimized for response time).
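The reasoning can be written as a toy model: the time for Host A to answer is the idle-state exit latency (zero if the core is already busy) plus the interrupt and network-stack work divided by the current clock rate. This is only an illustration; CoreState, reply_time_us and every number in it (clock rates, exit latency, cycle count) are hypothetical, not measurements, and the model ignores the cost of preempting the stress workload as well as everything on the wire and on Host B.

```python
from dataclasses import dataclass


@dataclass
class CoreState:
    """Hypothetical state of the core that handles the ping interrupt."""
    freq_ghz: float          # clock while processing the packet
    wake_latency_us: float   # idle-state exit latency; 0 if the core is already busy


def reply_time_us(core: CoreState, work_cycles: float) -> float:
    """Toy model: wake-up cost plus interrupt/network-stack work at the current clock."""
    cycles_per_us = core.freq_ghz * 1_000  # 1 GHz = 1,000 cycles per microsecond
    return core.wake_latency_us + work_cycles / cycles_per_us


if __name__ == "__main__":
    WORK_CYCLES = 100_000  # hypothetical cost of handling one echo request
    idle = CoreState(freq_ghz=1.2, wake_latency_us=100.0)  # hypothetical: deep idle state, low clock
    busy = CoreState(freq_ghz=3.5, wake_latency_us=0.0)    # hypothetical: already awake at full clock
    print(f"idle core: {reply_time_us(idle, WORK_CYCLES):.1f} us")
    print(f"busy core: {reply_time_us(busy, WORK_CYCLES):.1f} us")
```

With these made-up parameters the idle core needs about 183 us and the busy core about 29 us: both terms (wake-up and slower processing) push the idle case up, which is the direction seen in the measurements.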