How do I determine the current state of an NTP server on Linux?

I have a rather simple setup where I have two computers:

Computer A. has a normal NTP setup and uses the standard Internet sources (As per Ubuntu) to determine time. It also allows for query on IP 10.0.2.0/24:

restrict 10.0.2.0 mask 255.255.255.0 nomodify notrap

Computer B. has a normal NTP setup, except for all the sources are changed to use 10.0.2.1 (which is Computer A).

Once in a while, Computer A gets a Kiss-of-Death signal from one of its source. As a result, it totally kills Computer B's NTP (i.e. it looks like the KoD is transmitted directly).

Is there a way to know the state of an NTP server in terms of whether it will just be sending KoD message or not? (also, how do I get out of that situation? When I looked at it, all the IP addresses shown in the log were not used by the server?! so I don't understand why it insists on sending KoD to its client).


I've found two things so far:

  1. ntpq

    I can run ntpq like so:

     ntpq -pn
    

    When the NTP server works, I can see an asterisk in front of the IP address the computer is happy about. In my case, all of the status flags (first column +, -, *, #, etc.) all disappear. As far as I know, that means the NTP service is not happy and no synchronization is being performed.

    Here is an example when it still works (i.e. there are flags in the very first column):

          remote           refid      st t when poll reach   delay   offset  jitter
     ==============================================================================
      10.0.2.255      .BCST.          16 B    -   64    0    0.000    0.000   0.000
     #51.77.203.211   134.59.1.5       3 u    4   64    1  171.248  -743.64 691.917
     +72.5.72.15      216.218.254.202  2 u    2   64    1   19.223  -778.34 686.200
     +159.69.25.180   192.53.103.103   2 u    3   64    1  237.733  -775.41 701.376
     +173.0.48.220    43.77.130.254    2 u    2   64    1   35.489  -778.85 669.187
      38.229.56.9     172.16.21.35     2 u   31   64    1  153.976  -268.90 122.557
     +137.190.2.4     .PPS.            1 u   31   64    1   93.797  -253.69 116.289
     +150.136.0.232   185.125.206.71   3 u   35   64    1   95.667  -178.19 114.912
      94.154.96.7     194.29.130.252   2 u   31   64    1  237.560  -231.88 107.230
     +162.159.200.123 10.4.1.175       3 u   34   64    1   16.246  -199.68 115.561
     *216.218.254.202 .CDMA.           1 u   35   64    1   52.906  -193.84 131.148
      91.189.91.157   132.163.96.1     2 u   45   64    1   87.772   -5.716   0.000
     +204.2.134.163   44.24.199.34     3 u   34   64    1   16.711  -199.12 116.777
     +74.6.168.73     208.71.46.33     2 u   35   64    1   69.772  -189.21 128.119
      91.189.89.199   17.253.34.123    2 u   45   64    1  165.471   -3.708   0.000
     +216.229.0.49    216.218.192.202  2 u   35   64    1   71.437  -178.94  97.505
      91.189.89.198   17.253.34.123    2 u   44   64    1  172.852  -17.899   0.000
    
  2. ntpdate -q <ip>

    The ntpdate command will actually tell me whether the NTP is accepting packets. This is because it gives an error message if not:

     $ sudo ntpdate -q 10.0.2.1
     server 10.0.2.1, stratum 4, offset 5.194725, delay 0.02652
     21 Jul 15:22:48 ntpdate[13086]: no server suitable for synchronization found
    

    This happens after a little while when my main server loses the * status on the one server it was first happy to synchronize with...

Now... I still need to understand what I have to do to fix this issue...


This may be helpful, here are the logs about a restart from a full reboot:

Jul 21 18:29:13 vm-ve-ctl kernel: [  434.275481] audit: type=1400 audit(1626917353.636:43): apparmor="DENIED" operation="open" profile="/usr/sbin/ntp
d" name="/snap/bin/" pid=3896 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Jul 21 18:29:13 vm-ve-ctl ntpd[3896]: ntpd [email protected] (1): Starting
Jul 21 18:29:13 vm-ve-ctl ntpd[3896]: Command line: /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 126:129
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: proto: precision = 0.190 usec (-22)
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Cannot open logfile /var/log/ntp.log: Permission denied
Jul 21 18:29:13 vm-ve-ctl kernel: [  434.291490] audit: type=1400 audit(1626917353.652:44): apparmor="DENIED" operation="capable" profile="/usr/sbin/
ntpd" pid=3901 comm="ntpd" capability=1  capname="dac_override"
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): good hash signature
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): loaded, expire=2021-12-28T00:00:00Z last=2017-01-01T
00:00:00Z ofs=37
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen and drop on 0 v6wildcard [::]:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 2 lo 127.0.0.1:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 3 enp0s3 192.168.2.120:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 4 enp0s8 10.0.2.1:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 5 lo [::1]:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 6 enp0s3 [fe80::a00:27ff:fe25:38ff%2]:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 7 enp0s8 [fe80::a00:27ff:fe35:c30b%3]:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listening on routing socket on fd #24 for interface updates
Jul 21 18:29:14 vm-ve-ctl ntpd[3901]: Soliciting pool server 51.77.203.211
Jul 21 18:29:15 vm-ve-ctl ntpd[3901]: Soliciting pool server 159.69.25.180
Jul 21 18:29:15 vm-ve-ctl ntpd[3901]: Soliciting pool server 72.5.72.15
Jul 21 18:29:16 vm-ve-ctl ntpd[3901]: Soliciting pool server 198.251.86.68
Jul 21 18:29:16 vm-ve-ctl ntpd[3901]: Soliciting pool server 173.0.48.220
Jul 21 18:29:16 vm-ve-ctl ntpd[3901]: Soliciting pool server 38.229.56.9
Jul 21 18:29:17 vm-ve-ctl ntpd[3901]: Soliciting pool server 150.136.0.232
Jul 21 18:29:17 vm-ve-ctl ntpd[3901]: Soliciting pool server 94.154.96.7
Jul 21 18:29:17 vm-ve-ctl ntpd[3901]: Soliciting pool server 137.190.2.4
Jul 21 18:29:18 vm-ve-ctl ntpd[3901]: Soliciting pool server 162.159.200.123
Jul 21 18:29:18 vm-ve-ctl ntpd[3901]: Soliciting pool server 216.218.254.202
Jul 21 18:29:18 vm-ve-ctl ntpd[3901]: Soliciting pool server 91.189.91.157
Jul 21 18:29:19 vm-ve-ctl ntpd[3901]: Soliciting pool server 91.189.89.199
Jul 21 18:29:19 vm-ve-ctl ntpd[3901]: Soliciting pool server 74.6.168.73
Jul 21 18:29:19 vm-ve-ctl ntpd[3901]: Soliciting pool server 204.2.134.163
Jul 21 18:29:20 vm-ve-ctl ntpd[3901]: Soliciting pool server 91.189.89.198
Jul 21 18:29:20 vm-ve-ctl ntpd[3901]: Soliciting pool server 216.229.0.49
Jul 21 18:29:20 vm-ve-ctl ntpd[3901]: Soliciting pool server 2604:ed40:1000:1711:d862:f5ff:fe4e:41c4
Jul 21 18:29:21 vm-ve-ctl ntpd[3901]: receive: Unexpected origin timestamp 0xe4a34871.ac57f05d does not match aorg 0000000000.00000000 from [email protected] xmt 0xe4a34871.65648c54

I do not know exactly when it starts going bad. I've also seen the following which I thought could have something to do with it (i.e. when that happens, the corresponding IP is removed from the list!), but it is already bad now and no such error occurred in my last run.

Jul 21 18:08:57 vm-ve-ctl ntpd[9764]: 92.243.6.5 local addr 192.168.2.120 -> <null>

Note: the 192.168.2.120 is the IP of the failing computer. It is a VirtualBox. It has been working for months... though, maybe something changed which makes it unhappy.

I found this post about an issue with the ... -> <null> message. However, I think we have a newer version on Ubuntu 18.04:

SUSE minimum recommended version: ntp-4.2.8p7-11.1
Ubuntu 18.04 version: 1:4.2.8p10+dfsg-5ubuntu7.3


Just in case, I tried to connect the VM to the host and I still get a huge offset and jitter. What could have changed?!

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.0.2.10       .POOL.          16 p    -   64    0    0.000    0.000   0.000
 10.0.2.10       132.163.97.6     2 u   54   64    3    0.457  -5254.2 3917.68

As asked by Paul Gear, here is the output of ntpq with additional details:

$ ntpq -ncrv
associd=0 status=0028 leap_none, sync_unspec, 2 events, no_sys_peer,
version="ntpd [email protected] (1)", processor="x86_64",
system="Linux/4.15.0-151-generic", leap=00, stratum=4, precision=-23,
rootdelay=17.930, rootdisp=5019.260, refid=173.255.215.209,
reftime=e4a44f7a.1c2ec778  Thu, Jul 22 2021 13:11:38.110,
clock=e4a45030.c8a4b259  Thu, Jul 22 2021 13:14:40.783, peer=0, tc=6,
mintc=3, offset=-109.527915, frequency=-1.707, sys_jitter=0.000000,
clk_jitter=38.724, clk_wander=0.000, tai=37, leapsec=201701010000,
expire=202112280000

Here is the list of available clocks and the one currently used:

$ grep . /sys/devices/system/clocksource/clocksource*/[ac]*clocksource
/sys/devices/system/clocksource/clocksource0/available_clocksource:kvm-clock tsc acpi_pm 
/sys/devices/system/clocksource/clocksource0/current_clocksource:kvm-clock

And finally, the dmesg output about the clocksource selection process:

$ dmesg | grep clocksource
[    0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.283117] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    1.161844] clocksource: Switched to clocksource kvm-clock
[    1.208316] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    2.329228] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1db81a3240f, max_idle_ns: 440795250379 ns

Solution 1:

It looks like I found a solution. I'm not too sure why it would have been working before, though.

After many searches, I found this VirtualBox ticket:

https://www.virtualbox.org/ticket/15179

which says that you shouldn't use ntpd because the VM already does keep time using the Host time to adjust the VM's virtual clock.

However, even withouth ntpd running, my VM's clock was off and would bounce between ±30 seconds in a short period of time.

Reading that post further, they offered to adjust the paravirtualization settings to "None". That didn't seem to work for me. I restarted the VM and it got stuck. So I instead tried with "Minimal" and now the clock is working. The ntpq -np output looks a lot better:

Thu Jul 22 12:57:57 PDT 2021
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 1.ubuntu.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.008
 2.ubuntu.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.008
 3.ubuntu.pool.n .POOL.          16 p    -   64    0    0.000    0.000   0.008
 ntp.ubuntu.com  .POOL.          16 p    -   64    0    0.000    0.000   0.008
+23.157.160.168  129.6.15.28      2 u   20  128  377   83.831    0.783   1.745
-104.131.139.195 163.237.218.19   2 u   28  128  377   20.162    3.622   1.451
+144.76.64.40    36.224.68.195    2 u   18  128  377  179.329    2.557   6.671
-159.89.86.140   192.5.41.40      2 u  126  128  377   87.016    3.094   1.385
-23.175.208.10   239.9.71.195     2 u   35  128  377   82.350    2.311   0.794
+206.82.16.3     47.187.174.51    2 u  127  128  377   84.683    1.385   0.753
+92.243.6.5      85.199.214.98    2 u   25  128  377  163.041    4.275   4.025
*200.160.7.186   .ONBR.           1 u   47  128  377  191.078    1.126   1.865
+51.255.197.148  217.91.44.17     2 u   96  128  367  154.201    1.225   4.742

As we can see, the Offset (max. 4.3) and Jitter (max. 6.7) are minuscule compare to what I was getting before. Now my other computer also works and can synchronize itself to this clock. The delay is around 0.7, which is enough for me (in my case, enough is 12.0 or less).

IMPORTANT NOTE: I rebooted that VM 2 or 3 times using Minimal, the boot time is excruciatingly long. So I do not recommend using this mode except as a the very last resort if you absolutely need a working system clock. If you have a better solution that works, I'm all ears!


Just in case, I wanted to see the difference in regard to the clock sources in Minimal mode. We just get the acpi_pm clock.

$ grep . /sys/devices/system/clocksource/clocksource*/[ac]*clocksource
/sys/devices/system/clocksource/clocksource0/available_clocksource:acpi_pm 
/sys/devices/system/clocksource/clocksource0/current_clocksource:acpi_pm

Now I'm wondering whether this could have an effect on the host clock. Since I also have ntp on my host, I don't think this is an issue.