How to troubleshoot latency between 2 Linux hosts

The latency between two Linux hosts is about 0.23 ms. They are connected by a single switch. Ping and Wireshark confirm the latency number, but I don't have any visibility into what is causing it. How can I tell whether the latency is due to the NIC on host A or B, the switch, or the cables?

UPDATE: The 0.23 ms latency is bad for my existing application, which sends messages at a very high frequency, and I am trying to see if it can be brought down to 0.1 ms.


Solution 1:

Generally, you can use some of the advanced options of the iperf utility to get a view of the network performance between systems, specifically latency and jitter...
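
Even before iperf, a longer ping run gives a decent first read on jitter; the rtt min/avg/max/mdev summary line at the end is effectively the latency plus its standard deviation. Something like this (the count and interval are arbitrary choices, and server2 is the same placeholder hostname used below):

# 100 probes, 0.2 s apart, summary only -- look at the rtt min/avg/max/mdev line
ping -c 100 -i 0.2 -q server2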

Is this a UDP or TCP-based message stream?

I commented above on needing more information about your setup. If this is a low-latency messaging application, there's a whole world of tuning and optimization techniques that span hardware, driver, and OS tweaking. But really, we need more information.

Edit:

Okay, so this is TCP messaging. Have you modified any /etc/sysctl.conf parameters? What do your send/receive buffers look like? Using a realtime kernel alone won't do much, but moving to the point where you're binding interrupts to CPUs, changing the realtime priority of the messaging app (chrt), and possibly modifying the system's tuned-adm profile may help...
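
As a rough sketch of the interrupt-binding and chrt steps (eth0, the IRQ number, the CPU mask, and the PID below are all placeholders; substitute your own values, and note that irqbalance will rewrite affinities unless you stop it):

# Find the IRQ(s) used by the NIC (eth0 is a placeholder interface name)
grep eth0 /proc/interrupts

# Stop irqbalance, then pin IRQ 30 to CPU1 (bitmask 2 = CPU1)
service irqbalance stop
echo 2 > /proc/irq/30/smp_affinity

# Run the messaging app at realtime FIFO priority 80 (1234 is a placeholder PID)
chrt -f -p 80 1234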

This sounds like a generic EL6 system, so an easy way to set a performance-tuning baseline is to change the system's profile to another one available within the tuned framework, then build from there.

In your case:

yum install tuned tuned-utils
tuned-adm profile latency-performance

The tuned documentation includes a quick matrix showing the differences between the profiles.
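
If you just want to see what is installed and confirm that the profile change took effect, tuned-adm itself will tell you (the exact list of profiles depends on your tuned version):

# List the available profiles and show the currently active one
tuned-adm list
tuned-adm active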

Can you tell us about the hardware? Types of CPU, NIC, memory?

So, it may be interesting to test your link... Try this iperf test...

On one system, start an iperf UDP listener. On the other, open a connection to the first... A quick line-quality test.

# Server2
[root@server2 ~]# iperf -su   

# Server1
[root@server1 ~]# iperf -t 60 -u -c server2

In my case, low jitter and low ping time:

------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  3] local 192.168.15.3 port 5001 connected with 172.16.2.152 port 36312
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0-20.0 sec  2.50 MBytes  1.05 Mbits/sec   0.012 ms    0/ 1785 (0%)

PING server1 (172.16.2.152) 56(84) bytes of data.
64 bytes from server1 (172.16.2.152): icmp_seq=1 ttl=63 time=0.158 ms
64 bytes from server1 (172.16.2.152): icmp_seq=2 ttl=63 time=0.144 ms

I'd check the hardware and interfaces for errors. If you can, eliminate the switch between the systems and see what a direct connection looks like. You don't want high jitter (variance), so check for that.
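
A few quick places to look for interface-level problems (eth0 is a placeholder for your interface name):

# Link statistics -- look for errors, drops and overruns
ip -s link show eth0

# NIC/driver counters; the counter names vary by driver
ethtool -S eth0 | grep -iE 'err|drop'

# Negotiated speed/duplex; a duplex mismatch shows up as latency and errors
ethtool eth0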

But honestly, the ping times you're getting on your current setup should not be enough on their own to kill your application. I'd go down the path of tuning your send/receive buffers. See net.core.rmem_max, net.core.wmem_max and their defaults...
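
To see what you're currently running with before changing anything (these are the kernel defaults unless something has already overridden them):

# Current socket buffer defaults and maximums
sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max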

Something like the following in /etc/sysctl.conf (please tune to taste):

net.core.rmem_default = 10000000
net.core.wmem_default = 10000000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
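
Then load the new values and double-check them (sysctl -p rereads /etc/sysctl.conf; since this is TCP, net.ipv4.tcp_rmem and net.ipv4.tcp_wmem may also be worth a look):

# Reload /etc/sysctl.conf and spot-check a value
sysctl -p
sysctl net.core.rmem_max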