How R formats POSIXct with fractional seconds


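One underlying problem is that the POSIXct representation is less precise than the POSIXlt representation, and POSIXct values get converted to POSIXlt before formatting. POSIXct stores the whole timestamp as a single double (seconds since 1970), so the fractional part has to share the mantissa with about ten digits of whole seconds. POSIXlt stores the seconds field on its own. Below we see that if our string is converted directly to POSIXlt, it prints correctly.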

> as.POSIXct('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.2 CDT"
> as.POSIXlt('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.3"
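To make the storage difference concrete, here is a small sketch. It uses `tz = "UTC"` (an assumption, not what the examples above used) so the numbers don't depend on the local time zone:

```r
# Assumption: UTC, so the results are the same on every machine
ct <- as.POSIXct("2011-10-11 07:49:36.3", tz = "UTC")
lt <- as.POSIXlt("2011-10-11 07:49:36.3", tz = "UTC")

length(unclass(ct))    # 1: a single double, seconds since 1970-01-01 00:00:00 UTC
floor(as.numeric(ct))  # 1318319376 whole seconds share the mantissa with the .3
lt$hour; lt$min        # 7 and 49: POSIXlt keeps each component separately ...
lt$sec                 # 36.3: ... including the seconds, a much smaller double
```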

We can also see this by comparing the fractional part each class actually stores with the usual (double) representation of 0.3.

> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> as.numeric(t1 - round(unclass(t1))) - 0.3
[1] -4.768372e-08

> t2 <- as.POSIXlt('2011-10-11 07:49:36.3')
> as.numeric(t2$sec - round(unclass(t2$sec))) - 0.3
[1] -2.831069e-15

Interestingly, both representations are slightly less than the usual representation of 0.3, but the POSIXlt one is off by only about 3e-15 rather than about 5e-8, so it is either close enough or truncates in a way different than I'm imagining here. Given that, I'm not going to worry about floating point representation difficulties; they may still happen, but if we're careful about which representation we use, they will hopefully be minimized.
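The size of those two errors follows from the spacing between adjacent doubles (one "ulp", 2^(e-52) for a double with binary exponent e). Here is a sketch of that arithmetic; `ulp()` is a hypothetical helper, and `whole + 0.3` is assumed to match what `as.POSIXct` stores for this time in UTC (the CDT value above has the same exponent, so the same spacing applies):

```r
# Hypothetical helper: spacing of doubles near x (valid for normal doubles)
ulp <- function(x) 2^(floor(log2(abs(x))) - 52)

whole <- 1318319376    # 2011-10-11 07:49:36 UTC as whole seconds since the epoch
x <- whole + 0.3       # approximately what a POSIXct holds for 07:49:36.3 UTC

ulp(x) == 2^-22        # doubles near 1.3e9 are about 2.4e-7 apart
(x - whole) / ulp(x)   # 0.3 was rounded to a whole number of ulps (1258291)
(x - whole) - 0.3      # about -4.768372e-08, i.e. -0.2 * 2^-22

ulp(36.3)              # 2^-47, about 7.1e-15: why the POSIXlt error is only ~2.8e-15
```

In other words, 0.3 / 2^-22 = 1258291.2, and the nearest representable value drops the .2, which is exactly the -4.768372e-08 seen above.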

Robert's desire for rounded output is then simply an output problem, and could be addressed in any number of ways. My suggestion would be something like this:

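myformat.POSIXct <- function(x, digits=0) {
  # Round the POSIXct value (seconds since the epoch) first, so carries into
  # minutes/hours/days happen before the value is split into separate fields
  x2 <- round(unclass(x), digits)
  attributes(x2) <- attributes(x)
  # Convert to POSIXlt, where the seconds are stored on their own, and round again
  x <- as.POSIXlt(x2)
  x$sec <- round(x$sec, digits)
  # %OSn prints the seconds with n decimal places
  format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}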

This starts with a POSIXct input, and first rounds to the desired digits; it then converts to POSIXlt and rounds again. The first rounding makes sure that all of the units increase appropriately when we are on a minute/hour/day boundary; the second rounding rounds after converting to the more precise representation.
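Here is a quick sketch of why the first rounding has to happen on the POSIXct side. It again assumes UTC so the result is reproducible:

```r
# Assumption: UTC, so the day boundary is the same everywhere
lt <- as.POSIXlt("2011-10-11 23:59:59.999", tz = "UTC")
round(lt$sec)   # 60: rounding only the POSIXlt seconds field overflows the minute

ct <- as.POSIXct("2011-10-11 23:59:59.999", tz = "UTC")
ct_rounded <- as.POSIXct(round(as.numeric(ct)), origin = "1970-01-01", tz = "UTC")
format(ct_rounded, "%Y-%m-%d %H:%M:%S")   # the carry into the next day comes for free
```

Rounding a single count of seconds lets ordinary arithmetic do the carrying; the POSIXlt fields would need that carry done by hand.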

> options(digits.secs=1)
> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"

> t2 <- as.POSIXct('2011-10-11 23:59:59.999')
> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"

A final aside: Did you know the standard allows for up to two leap seconds?

> as.POSIXlt('2011-10-11 23:59:60.9')
[1] "2011-10-11 23:59:60.9"

OK, one more thing. The behavior actually changed in May due to a bug filed by the OP (Bug 14579); before that, R did round fractional seconds. Unfortunately that meant it could sometimes round up to a second that isn't possible: in the bug report, the seconds went up to 60 when the time should have rolled over to the next minute. One reason the decision was made to truncate instead of round is that printing is done from the POSIXlt representation, where each unit is stored separately, so rolling over to the next minute/hour/etc. is more involved than a straightforward rounding operation. To round easily, it's necessary to round in the POSIXct representation and then convert back, as I suggest above.

I've run into this problem, and so started looking for a solution. @Aaron's answer is good, but still breaks for large dates.

Here is code that rounds the seconds properly, according to the format string or options("digits.secs"):

form <- function(x, format = "", tz= "", ...) {
  # From format.POSIXct
  if (!inherits(x, "POSIXct")) 
    stop("wrong class")
  if (missing(tz) && !is.null(tzone <- attr(x, "tzone"))) 
    tz <- tzone

  # Find the number of digits required based on the format string
  if (length(format) > 1)
    stop("length(format) > 1 not supported")

  m <- gregexpr("%OS[[:digit:]]?", format)[[1]]
  l <- attr(m, "match.length")
  if (l[1] == 4) {
    # "%OSn": the digit n is the last character of the match
    d <- as.integer(substring(format, l[1]+m[1]-1, l[1]+m[1]-1))
  } else {
    # No explicit digit: fall back on options("digits.secs"), default 0
    d <- getOption("digits.secs", 0)
  }

  secs.since.origin <- as.numeric(x)         # Seconds since origin
  secs <- round(secs.since.origin %% 60, d)  # Seconds within the minute
  mins <- floor(secs.since.origin / 60)      # Minutes since origin
  # Fix up overflow on seconds (vectorized, so x may have length > 1)
  over <- secs >= 60
  secs[over] <- secs[over] - 60
  mins[over] <- mins[over] + 1

  # Represents the prior minute
  lt <- as.POSIXlt(60 * mins, tz=tz, origin=ISOdatetime(1970,1,1,0,0,0,tz="GMT"))
  lt$sec <- secs + 10^(-d-1)  # Add in the seconds, plus a fudge factor.
  format.POSIXlt(as.POSIXlt(lt), format, ...)
}

The fudge factor of 10^(-d-1) comes from Aaron's answer to "Accurately converting from character->POSIXct->character with sub millisecond datetimes".
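Why the fudge factor helps: R's documentation for strptime describes %OSn output as the seconds *truncated* to n decimal places, so a stored value a hair below 36.3 prints as 36.2. Here is a sketch of that effect; `trunc_digits()` is a hypothetical stand-in for the truncation, not R's actual formatting code:

```r
# Hypothetical stand-in for the truncation %OSn performs on output
trunc_digits <- function(s, d) trunc(s * 10^d) / 10^d

whole <- 1318319376                    # 2011-10-11 07:49:36 UTC, whole seconds since 1970
sec <- 36 + ((whole + 0.3) - whole)    # the .3 after a round trip through a POSIXct-sized double
d <- 1                                 # decimal places wanted

trunc_digits(sec, d)                   # 36.2: the tiny shortfall is truncated away
trunc_digits(sec + 10^(-d - 1), d)     # 36.3: the fudge factor lifts it past 36.3 first
```

In form() the seconds have already been rounded to d digits before the fudge is added, and 10^(-d-1) is too small to change the d-th digit, so truncating afterwards effectively prints the rounded value.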

Some examples:

f  <- "%Y-%m-%d %H:%M:%OS"
f3 <- "%Y-%m-%d %H:%M:%OS3"
f6 <- "%Y-%m-%d %H:%M:%OS6"

From a nearly identical question:

x <- as.POSIXct("2012-12-14 15:42:04.577895")

> format(x, f6)
[1] "2012-12-14 15:42:04.577894"
> form(x, f6)
[1] "2012-12-14 15:42:04.577895"
> myformat.POSIXct(x, 6)
[1] "2012-12-14 15:42:04.577895"

From above:

> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"
> form(t1)
[1] "2011-10-11 07:49:36.3"

> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"

> form(t2)
[1] "2011-10-12"
> form(t2, f)
[1] "2011-10-12 00:00:00.0"

The real fun comes in 2038 for some dates. I assume this is because we lose one more bit of precision in the mantissa. Note the value of the seconds field.

> t3 <- as.POSIXct('2038-12-14 15:42:04.577895')
> format(t3)
[1] "2038-12-14 15:42:05.5"
> myformat.POSIXct(t3, 1)
[1] "2038-12-14 15:42:05.6"
> form(t3)
[1] "2038-12-14 15:42:04.6"
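To check the "one more bit" guess: doubles are spaced 2^(e-52) apart, so once a timestamp passes 2^31 seconds after the epoch (early 2038), the spacing doubles. Here is a sketch; `ulp()` is a hypothetical helper and UTC is assumed. The lost bit alone doesn't explain a whole-second jump, which the update below traces to localtime0.

```r
# Hypothetical helper: spacing of doubles near x (valid for normal doubles)
ulp <- function(x) 2^(floor(log2(abs(x))) - 52)

t_2012 <- as.numeric(as.POSIXct("2012-12-14 15:42:04", tz = "UTC"))
t_2038 <- as.numeric(as.POSIXct("2038-12-14 15:42:04", tz = "UTC"))
ulp(t_2012)   # 2^-22, about 2.4e-07 seconds
ulp(t_2038)   # 2^-21, about 4.8e-07 seconds: one bit less for the fraction

# The step happens at 2^31 seconds after the epoch
format(as.POSIXct(2^31, origin = "1970-01-01", tz = "UTC"), "%Y-%m-%d %H:%M:%S")
```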

This code seems to work for the other edge cases I've tried. What format.POSIXct and myformat.POSIXct in Aaron's answer have in common is the conversion from POSIXct to POSIXlt with the seconds field intact.

This points to a bug in that conversion. I'm not using any data that isn't available to as.POSIXlt().

Update

The bug is in src/main/datetime.c:434 in the static function localtime0, but I am not sure yet of the correct fix:

Lines 433-434:

day = (int) floor(d/86400.0);
left = (int) (d - day * 86400.0 + 0.5);

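The extra 0.5, added to round the value, is the culprit. Note that the subsecond part of t3 above exceeds .5. localtime0 deals with whole seconds only, and the subseconds are added in after localtime0 returns. So when the fraction is .5 or more, the whole seconds have already been rounded up and the fraction is then added on top, which is how 04.577895 comes out as 05.5.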

localtime0 returns correct results if the double presented is an integer value.
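Here is a sketch of the double counting in R terms. The two functions are hypothetical transcriptions of the C lines quoted above, not part of R's API, and the truncating variant is only shown for comparison, not as the correct fix:

```r
# Hypothetical transcription of lines 433-434; as.integer() truncates like C's (int)
left_with_half <- function(d) {
  day <- floor(d / 86400)
  as.integer(d - day * 86400 + 0.5)   # whole seconds into the day, rounded
}
# Hypothetical comparison only: drop the fraction before taking whole seconds
left_truncated <- function(d) {
  day <- floor(d / 86400)
  as.integer(floor(d - day * 86400))
}

d <- 10 * 86400 + 4.577895   # made-up time: 4.577895 s into a day
frac <- d - floor(d)         # sub-second part, added back after the whole seconds

left_with_half(d) + frac     # about 5.577895: one second too late
left_truncated(d) + frac     # about 4.577895
```

This matches the observation that localtime0 behaves when handed an integer value: with no fraction, adding 0.5 and truncating gives back the same whole number of seconds.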
