Calculate system time using rdtsc

Suppose all the cores in my CPU have the same frequency. Technically, I could synchronize a (system time, time stamp counter) pair for each core every millisecond or so. Then, based on the core I'm currently running on, I could take the current rdtsc value and, dividing the tick delta by the core frequency, estimate the time passed since I last synchronized the (system time, time stamp counter) pair, and so deduce the current system time without the overhead of a system call from my current thread (assuming no locks are needed to retrieve the above data).

This works great in theory, but in practice I found that sometimes I get more ticks than I would expect. That is, if my core frequency is 1 GHz and I took a (system time, time stamp counter) pair 1 millisecond ago, I would expect a tick delta of around 10^6, but I found it can actually be anywhere between 10^6 and 10^7.

I'm not sure what is wrong. Can anyone share their thoughts on how to calculate system time using rdtsc? My main objective is to avoid performing a system call every time I want to know the system time, and to be able to perform a calculation in user space that gives me a good estimation of it (currently I define a good estimation as a result within a 10 microsecond interval of the real system time).


Solution 1:

The idea is not unsound but it is not suited for user-mode applications, for which, as @Basile suggested, there are better alternatives.

Intel itself suggests using the TSC as a wall clock:

The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states.
This is the architectural behaviour moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.

However, care must be taken.

The TSC is not always invariant

In older processors the TSC was incremented on every internal clock cycle; it was not a wall clock.
Quoting Intel:

For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]); and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle.

The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel® SpeedStep® technology transitions may also impact the processor clock.

If you only have a variant TSC, the measurements are unreliable for tracking time. There is hope with an invariant TSC, though.

The TSC is not incremented at the frequency advised on the brand string

Still quoting Intel:

the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the maximum resolved frequency at which the processor is booted. The maximum resolved frequency may differ from the processor base frequency.
On certain processors, the TSC frequency may not be the same as the frequency in the brand string.

You can't simply take the frequency written on the box of the processor.
See below.

rdtsc is not serialising

You need to serialise it from above and below.
See this.
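One common way to do that, as a sketch using the GCC/Clang intrinsics `_mm_lfence`, `__rdtsc` and `__rdtscp` (on Intel, LFENCE waits for preceding instructions to complete, and RDTSCP itself waits for preceding instructions before reading the counter):

```c
#include <stdint.h>
#include <x86intrin.h>  /* _mm_lfence, __rdtsc, __rdtscp */

/* Serialised read for the start of a measured region. */
static inline uint64_t tsc_start(void)
{
    _mm_lfence();          /* keep earlier instructions from leaking past the read */
    return __rdtsc();
}

/* Serialised read for the end of a measured region. */
static inline uint64_t tsc_end(void)
{
    unsigned aux;
    uint64_t t = __rdtscp(&aux);  /* waits for preceding instructions itself */
    _mm_lfence();                 /* keep later instructions from moving up */
    return t;
}
```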

The TSC is based on the ART (Always Running Timer) when invariant

The correct formula is

TSC_Value = (ART_Value * CPUID.15H:EBX[31:0]) / CPUID.15H:EAX[31:0] + K

See Section 17.15.4 of Volume 3 of the Intel manual.

Of course, you have to solve for ART_Value since you start from a TSC_Value. You can ignore K as you are interested in deltas only. From the ART_Value delta you can get the time elapsed once you know the frequency of the ART. This is given as k * B, where k is a constant in the MSR_PLATFORM_INFO MSR and B is 100 MHz or 133⅓ MHz depending on the processor.
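As a sketch of that conversion: the ratio can be read from CPUID leaf 0x15 with the GCC/Clang `__get_cpuid` helper. On newer processors, ECX of the same leaf enumerates the nominal crystal frequency directly; otherwise it has to come from a model table such as the turbostat.c one quoted below.

```c
#include <stdint.h>
#include <cpuid.h>   /* __get_cpuid (GCC/Clang) */

/* Read the TSC/ART ratio (and, if enumerated, the crystal frequency)
 * from CPUID leaf 0x15. Returns 0 on success, -1 if not enumerated. */
static int tsc_art_ratio(uint32_t *den, uint32_t *num, uint32_t *crystal_hz)
{
    uint32_t eax, ebx, ecx, edx;
    if (!__get_cpuid(0x15, &eax, &ebx, &ecx, &edx) || ebx == 0)
        return -1;
    *den        = eax;  /* CPUID.15H:EAX - denominator */
    *num        = ebx;  /* CPUID.15H:EBX - numerator   */
    *crystal_hz = ecx;  /* nominal crystal Hz, 0 if not enumerated */
    return 0;
}

/* Since TSC_delta = ART_delta * num / den, we have
 * ART_delta = TSC_delta * den / num, and the elapsed time
 * is ART_delta / crystal_hz. */
static uint64_t tsc_delta_to_ns(uint64_t tsc_delta,
                                uint32_t den, uint32_t num,
                                uint32_t crystal_hz)
{
    uint64_t art_delta = tsc_delta * den / num;  /* beware overflow for huge deltas */
    return art_delta * 1000000000ull / crystal_hz;
}
```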

As @BeeOnRope pointed out, from Skylake the ART crystal frequency is no longer the bus frequency.
The actual values, maintained by Intel, can be found in the turbostat.c file.

switch (model)
{
case INTEL_FAM6_SKYLAKE_MOBILE:     /* SKL */
case INTEL_FAM6_SKYLAKE_DESKTOP:    /* SKL */
case INTEL_FAM6_KABYLAKE_MOBILE:    /* KBL */
case INTEL_FAM6_KABYLAKE_DESKTOP:   /* KBL */
    crystal_hz = 24000000;          /* 24.0 MHz */
    break;
case INTEL_FAM6_SKYLAKE_X:          /* SKX */
case INTEL_FAM6_ATOM_DENVERTON:     /* DNV */
    crystal_hz = 25000000;          /* 25.0 MHz */
    break;
case INTEL_FAM6_ATOM_GOLDMONT:      /* BXT */
    crystal_hz = 19200000;          /* 19.2 MHz */
    break;
default:
    crystal_hz = 0;
}

The TSC is not incremented when the processor enters a deep sleep state

This should not be a problem on single-socket machines, but the Linux kernel has some comments about the TSC being reset even in non-deep sleep states.

Context switches will poison the measurements

There is nothing you can do about it.
This is what actually prevents you from keeping time with the TSC.

Solution 2:

Don't do that (i.e., don't use the RDTSC machine instruction directly yourself), because your OS scheduler could reschedule other threads or processes at arbitrary moments, or slow down the clock. Use a function provided by your library or OS.

My main objective is to avoid the need to perform system call every time I want to know the system time

On Linux, read time(7), then use clock_gettime(2), which is really quick (and does not involve any slow system call) thanks to vdso(7).

On a C++11-compliant implementation, simply use the standard <chrono> header. Standard C also has clock(3) (giving microsecond precision). On Linux, both use good enough time-measurement functions underneath (so, indirectly, the vDSO).

Last time I measured clock_gettime it often took less than 4 nanoseconds per call.