Why should I care about NTP Kernel statistics?
I've noticed that munin graphs a few bits of information about timing/kernel statistics that I've never quite understood. Most of my servers seem to stay close to 0, which I presume is good, but one of them is slowly but steadily getting more and more negative on one of the graphs.
Munin graphs the following statistics over time:
- NTP kernel PLL estimated error (secs)
- NTP kernel PLL frequency (ppm + 0)
- NTP kernel PLL offset (secs)
- NTP timing statistics for system peer
Here's an example from munin's docs: http://demo.munin-monitoring.org/time-year.html
Searching around the web for a concise, understandable definition of NTP turns up nothing (except for a bunch of Nagios and Munin graphs), and searching Server Fault turns up a ton of answers that presume the reader knows something about NTP already.
Stack Overflow defines it thusly:
NTP stands for Network Time Protocol, and it is an Internet protocol used to synchronize the clocks of computers to some time reference.
But that seems a little abstract. Does this affect, say, a web server, encryption, or database synchronization?
What is NTP, and why should I care? Are there any stats in particular I should make sure don't get out of control?
NTP is a protocol that synchronizes the system clock (usually there is a daemon running on *nix boxes). In short, it makes sure that the time on the server is correct. There are many reasons it is important to have accurate time:
- Some authentication schemes (such as Kerberos and AD auth) count on the system having the correct time
- When you troubleshoot things, having accurate time stamps in the logs can be vital
- Many applications that run on a server use the system time to generate information they show to the user. Depending on the application, time can be critical (for example, knowing when a financial transaction happened)
I'm sure there are others, but having accurate system time is a standard responsibility of a system administrator. NTP does a lot of sophisticated things to this end (accounting and correcting for clock drift, etc.), so those detailed statistics can help you troubleshoot any issues that arise in fulfilling this role.
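To give a feel for what a frequency figure in ppm means in practice, here is a rough Python sketch of the arithmetic. It assumes a clock left uncorrected with a constant frequency error, which is a simplification. The function names, the sample ppm values, and the 300-second tolerance (a commonly cited Kerberos default) are illustrative assumptions, not something munin or ntpd reports in this form.

```python
SECONDS_PER_DAY = 86_400


def drift_per_day(frequency_ppm: float) -> float:
    """Seconds gained (+) or lost (-) per day at a constant frequency error.

    1 ppm means one microsecond of error per second of elapsed time.
    """
    return frequency_ppm * 1e-6 * SECONDS_PER_DAY


def days_until_skew(frequency_ppm: float, max_skew_seconds: float) -> float:
    """Days an uncorrected clock would take to drift past max_skew_seconds."""
    if frequency_ppm == 0:
        return float("inf")  # a perfect clock never reaches the limit
    return max_skew_seconds / abs(drift_per_day(frequency_ppm))


if __name__ == "__main__":
    # Hypothetical frequency errors, chosen only for illustration.
    for ppm in (1.0, -25.0):
        per_day = drift_per_day(ppm)
        # 300 s is an assumed tolerance (commonly cited Kerberos default).
        days = days_until_skew(ppm, 300.0)
        print(f"{ppm:+.1f} ppm: {per_day:+.4f} s/day, "
              f"{days:.1f} days to reach 300 s of skew")
```

So a steady frequency value is not alarming by itself: it is roughly the correction the clock needs, and a few tens of ppm only becomes a problem if the clock goes uncorrected for a long time. A large or growing offset is probably the more direct sign that synchronization is not keeping up.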