How to fix the time on an NTP server that many machines synchronize to

I have one NTP server whose clock is wrong: it is 7 hours in the future (the timezone was changed after the machine shipped, but the time was not). The server itself is not synchronized to anything and only uses its local clock. More than 10 clients synchronize their clocks to this server, which leaves a whole group of servers with the wrong time.

How can I change the time on the NTP server so that the correction is slewed and all clients get corrected, too? I first tested with just a fix via "date MMDDhhmm", which led the clients to disconnect from the server (the asterisk in front of the server name in ntpq disappeared).

I do not know how all the synchronized services will behave if I change the time on all servers manually by setting the clock back 7 hours, which would leave the systems with files dated in the future. There may be crashes, and the systems provide services for fab production.


Solution 1:

When you talk about slewing the time, you are usually talking about small amounts of time. The fix is performed with a call to adjtime(), or on Linux possibly adjtimex().

From the ntpd man page:

   -x     Normally, the time is slewed if the offset is less than the step
          threshold,  which is 128 ms by default, and stepped if above the
          threshold.  This option sets the threshold to 600  s,  which  is
          well  within  the  accuracy  window  to  set the clock manually.
          Note: Since the slew rate of typical Unix kernels is limited  to
          0.5  ms/s,  each  second  of adjustment requires an amortization
          interval of 2000 s.  Thus, an adjustment as much as 600  s  will
          take  almost  14 days to complete.  This option can be used with
          the -g and -q options.  Note: The kernel time discipline is dis‐
          abled with this option.

I doubt, then, that you are going to want to wait for a 7 hour correction to happen at this speed. At 2000 s of slewing per second of offset, 7 hours (25,200 s) would take about 583 days, so well over a year. On Linux, adjtime() on a 32 bit system is effectively constrained to a delta of about 2000 seconds. 64 bit systems probably make that a non-issue, but the speed at which the change would take effect is still a concern.
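The arithmetic behind that estimate, as a rough sketch. The 0.5 ms/s slew rate is taken from the man page excerpt above; the function name is made up for illustration.

```python
# 0.5 ms of correction per second of wall time, per the ntpd man page
SLEW_RATE = 0.0005

def slew_duration_days(offset_seconds, slew_rate=SLEW_RATE):
    """Return how many days it takes to slew away offset_seconds."""
    return offset_seconds / slew_rate / 86400

print(round(slew_duration_days(600), 1))    # the 600 s case from the man page
print(round(slew_duration_days(7 * 3600)))  # the 7 hour case from the question
```

The first figure matches the "almost 14 days" in the man page, and the second is the 7 hour case.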

So there's a threshold in the Linux implementation, and presumably others, below which you get a 'slew', which is very slow. Above it, the system clocks on master and clients will be stepped, which can proceed much faster.

There will also be another threshold where if the time difference between master and client is too large, the client will assume an error and not update. From the ntpd man page:

   -g     Normally, ntpd exits with a message to the  system  log  if  the
          offset  exceeds the panic threshold, which is 1000 s by default.
          This option allows the time to  be  set  to  any  value  without
          restriction; however, this can happen only once.  If the thresh‐
          old is exceeded after that, ntpd will exit with a message to the
          system log.  This option can be used with the -q and -x options.

Note that the -g option is almost certainly not set for a daemon. It's usually used as ntpd -gq, run as a one-off at system start-up or manually, where it behaves much like ntpdate. In the reference ntpd the panic threshold can also be changed with the "tinker panic" directive in ntp.conf, but check the man page for your OS vendor(s).
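To make the two thresholds concrete, here is a small sketch that classifies an offset the way the man page excerpts describe the defaults. It is a simplification: it ignores the one-time -g allowance, the -x option, and any delay ntpd applies before it actually steps. The function name is hypothetical.

```python
STEP_THRESHOLD = 0.128    # seconds, ntpd default per the man page
PANIC_THRESHOLD = 1000.0  # seconds, ntpd default per the man page

def ntpd_reaction(offset_seconds, step=STEP_THRESHOLD, panic=PANIC_THRESHOLD):
    """Return 'slew', 'step' or 'panic' for a given clock offset."""
    magnitude = abs(offset_seconds)
    if magnitude > panic:
        return "panic"  # ntpd logs a message and exits
    if magnitude > step:
        return "step"   # clock is set in one jump
    return "slew"       # clock is adjusted gradually

for offset in (0.05, 0.5, -7 * 3600):
    print(offset, ntpd_reaction(offset))
```

With the defaults, a single 7 hour correction on the master lands in the panic range for every client, while a half second offset lands in the step range.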

It is pretty straightforward to write a program which will make a series of time adjustments using any frequency and size of adjustment you choose. You can do this on the NTP master, and it will serve the adjusted time to its clients, but you need to know what maximum size adjustment the client systems will accept, and what minimum threshold will cause them to perform a very slow slew. To be safe, you should survey the NTP implementations on the client systems.

If you are updating systems with characteristics similar to default ntpd on Linux without the -x option, then you could use a regime like making a half second adjustment every 5 seconds. That is 50,400 adjustments for 7 hours, so you'd get into sync over the course of about 3 days. Making sub-second adjustments that do not cross a second boundary might help to avoid things like triggering cron jobs twice, but expect that you'll probably find some sort of side effects.
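A minimal sketch of such a program, assuming the master's clock is ahead and has to be moved backwards. The function names and defaults are illustrative. time.clock_settime() is only available on Unix and needs root to set the realtime clock, so the example below only prints the plan and leaves the call to run() commented out.

```python
import math
import time

def plan_steps(total_offset, step_size=0.5, interval=5.0):
    """Return (number_of_steps, total_duration_seconds) for removing total_offset."""
    steps = math.ceil(abs(total_offset) / step_size)
    return steps, steps * interval

def step_clock_back(step_size=0.5):
    """Set the realtime clock back by step_size seconds (Unix only, needs root)."""
    time.clock_settime(time.CLOCK_REALTIME, time.time() - step_size)

def run(total_offset, step_size=0.5, interval=5.0,
        set_back=step_clock_back, sleep=time.sleep):
    """Apply the planned steps one by one, pausing between them.

    If total_offset is not a multiple of step_size, the last step overshoots
    slightly. set_back and sleep are injectable so the loop can be dry-run.
    """
    steps, _ = plan_steps(total_offset, step_size, interval)
    for _ in range(steps):
        set_back(step_size)
        sleep(interval)
    return steps

steps, duration = plan_steps(7 * 3600)
print(steps, "steps,", round(duration / 86400, 1), "days")
# run(7 * 3600)  # uncomment on the NTP master, as root, once the plan looks right
```

Passing a different set_back callable, for example one that only logs, gives a dry run of the whole schedule without touching the clock.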

If you wind up in a situation where your servers are no longer all in sync with each other, then it gets messier. If feasible, I'd want to monitor the time differences, and automatically stop doing the automated periodic updates if some servers are no longer following along, and raise an alert.
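The stop condition could look roughly like this. How the client times are collected (ntpq, ssh, a monitoring agent) is left open, and the function names and the 30 second tolerance are assumptions for illustration, not recommended values. Clients only poll periodically, so some lag behind the master is expected while the adjustments run.

```python
def stragglers(master_time, client_times, tolerance=30.0):
    """Return the names of clients whose clock is more than tolerance seconds from the master.

    client_times maps a host name to that host's current time in seconds.
    """
    return sorted(
        name for name, t in client_times.items()
        if abs(t - master_time) > tolerance
    )

def should_continue(master_time, client_times, tolerance=30.0):
    """True while every client is still following the master."""
    return not stragglers(master_time, client_times, tolerance)

# Hypothetical readings: hostC has stopped following the master.
readings = {"hostA": 1000.4, "hostB": 1006.0, "hostC": 1450.0}
print(stragglers(1000.0, readings))
print(should_continue(1000.0, readings))
```

Checking this between adjustments and raising an alert when it returns False keeps the master from running further away from clients that have stopped following.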