Linux kernel detecting wrong processor frequency
I convinced myself that the problem was a misidentified time stamp counter (TSC) frequency.
Apparently the kernel calibrates the TSC against the programmable interval timer (PIT). The detected CPU frequency is usually 2400.204 ± 0.134 MHz, a spread of about 56 ppm. After the problematic boot the CPU frequency was estimated as 2383.579 MHz, an error of about 6900 ppm, which ntpd was not able to compensate for. In fact, during the first 10h30m of uptime the system clock gained about 4m30s, which is about 7000 ppm.
Since the error in the TSC frequency matches the drift of the system clock, I conclude that the abnormal clock behaviour was caused by a wrong TSC calibration.
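For anyone who wants to check the arithmetic, here is a small sketch that recomputes the ppm figures from the numbers quoted above. The function and variable names are just illustrative; only the input values come from my observations.

```python
def ppm(deviation, reference):
    """Relative deviation expressed in parts per million."""
    return deviation / reference * 1e6

# Values quoted in the post
nominal_mhz = 2400.204   # frequency usually detected at boot
spread_mhz = 0.134       # usual boot-to-boot spread
bad_mhz = 2383.579       # frequency detected on the problematic boot

# Normal calibration spread, relative to the usual frequency
calibration_spread_ppm = ppm(spread_mhz, nominal_mhz)

# Calibration error on the problematic boot
calibration_error_ppm = ppm(nominal_mhz - bad_mhz, nominal_mhz)

# Observed clock drift: 4m30s gained over 10h30m of uptime
uptime_s = 10 * 3600 + 30 * 60
gained_s = 4 * 60 + 30
clock_drift_ppm = ppm(gained_s, uptime_s)

print(f"calibration spread: {calibration_spread_ppm:.0f} ppm")
print(f"calibration error:  {calibration_error_ppm:.0f} ppm")
print(f"clock drift:        {clock_drift_ppm:.0f} ppm")
```

The two large figures agree to within a few percent, which is as close as the rough 4m30s estimate allows. The sign is consistent too: an underestimated TSC frequency makes the kernel count each tick as too much time, so the clock runs fast.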
However, I have never seen an error this large, and I'm still wondering about the possible causes (hardware or software?) of this wrong calibration.
This type of behavior is atypical. A good check would be to monitor the value in the ntp.drift file to see whether it changed significantly while the problem was showing up. If it kept changing significantly, NTP was trying to compensate for a problem. In that case, it's a sign that the kernel misidentified the true clock frequency on startup, or that the clock itself was running slow during the critical part of boot. Unfortunately, this one event isn't a clear signal of a hardware problem.
If it happens again, watch that ntp.drift file.
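A rough sketch of how that check could be automated, assuming the drift file holds a single number (the frequency offset in ppm), which is what ntpd writes. The function names and the 50 ppm jump threshold are made up for illustration, and the file location varies by distribution (often somewhere under /var/lib/ntp/), so pass in whatever path your ntp.conf uses.

```python
from pathlib import Path

# ntpd cannot correct frequency errors larger than 500 ppm,
# so a reading pinned at that value means it has hit its limit.
NTPD_MAX_CORRECTION_PPM = 500.0

def read_drift_ppm(path):
    """Return the frequency offset (in ppm) stored in an ntpd drift file."""
    return float(Path(path).read_text().split()[0])

def classify_drift(samples_ppm, jump_ppm=50.0):
    """Label a series of drift readings collected over time.

    jump_ppm is an arbitrary illustrative threshold, not an ntpd setting.
    """
    if not samples_ppm:
        return "no data"
    if any(abs(s) >= NTPD_MAX_CORRECTION_PPM for s in samples_ppm):
        return "saturated"   # ntpd is at its correction limit
    if max(samples_ppm) - min(samples_ppm) > jump_ppm:
        return "unstable"    # value keeps moving: ntpd is chasing something
    return "stable"
```

Collect a reading with read_drift_ppm every hour or so (ntpd only rewrites the file about once an hour) and pass the list to classify_drift. A "saturated" or "unstable" result after a particular boot would point at a bad calibration for that boot rather than at a steadily failing oscillator.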