How R formats POSIXct with fractional seconds
One underlying problem is that the POSIXct representation is less precise than the POSIXlt representation, and the POSIXct representation gets converted to the POSIXlt representation before formatting. Below we see that if our string is converted directly to POSIXlt representation, it outputs correctly.
> as.POSIXct('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.2 CDT"
> as.POSIXlt('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.3"
We can also see that by looking at the difference between the binary representation of the two formats and the usual representation of 0.3.
> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> as.numeric(t1 - round(unclass(t1))) - 0.3
[1] -4.768372e-08
> t2 <- as.POSIXlt('2011-10-11 07:49:36.3')
> as.numeric(t2$sec - round(unclass(t2$sec))) - 0.3
[1] -2.831069e-15
Interestingly, it looks like both representations are actually less than the usual representation of 0.3, but that the second one is either close enough, or truncates in a way different than I'm imagining here. Given that, I'm not going to worry about floating point representation difficulties; they may still happen, but if we're careful about which representation we use, they will hopefully be minimized.
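To put rough numbers on those two errors, here is a small sketch of the spacing between adjacent doubles at each magnitude. The helper name ulp is mine (it is not a base R function), and tz = "UTC" is only there so the result doesn't depend on the local time zone.

```r
# Hypothetical helper: spacing between adjacent doubles near v
# (assumes IEEE 754 doubles with a 53-bit significand).
ulp <- function(v) 2^(floor(log2(abs(v))) - 52)

t1 <- as.POSIXct("2011-10-11 07:49:36.3", tz = "UTC")
t2 <- as.POSIXlt("2011-10-11 07:49:36.3", tz = "UTC")

ulp_ct <- ulp(as.numeric(t1))  # seconds since 1970, about 1.3e9
ulp_lt <- ulp(t2$sec)          # seconds within the minute, about 36.3

ulp_ct  # 2^-22, about 2.4e-07
ulp_lt  # 2^-47, about 7.1e-15
```

Both errors printed above are within half of the corresponding spacing, which is what you'd expect if each value is simply the nearest double to its target.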
Robert's desire for rounded output is then simply an output problem, and could be addressed in any number of ways. My suggestion would be something like this:
myformat.POSIXct <- function(x, digits=0) {
    x2 <- round(unclass(x), digits)
    attributes(x2) <- attributes(x)
    x <- as.POSIXlt(x2)
    x$sec <- round(x$sec, digits)
    format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}
This starts with a POSIXct input, and first rounds to the desired digits; it then converts to POSIXlt and rounds again. The first rounding makes sure that all of the units increase appropriately when we are on a minute/hour/day boundary; the second rounding rounds after converting to the more precise representation.
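To see why the first rounding has to happen on the POSIXct side, compare it with rounding only the seconds field of the POSIXlt. This is a minimal sketch; tz = "UTC" is my addition to keep it reproducible.

```r
t2 <- as.POSIXct("2011-10-11 23:59:59.999", tz = "UTC")

# Rounding only the seconds field leaves the other fields untouched.
lt_only <- as.POSIXlt(t2)
lt_only$sec <- round(lt_only$sec, 0)
c(lt_only$min, lt_only$sec)   # minute stays 59, seconds become 60

# Rounding the seconds-since-epoch first lets every field roll over.
ct_first <- as.POSIXct(round(as.numeric(t2), 0), origin = "1970-01-01", tz = "UTC")
lt_first <- as.POSIXlt(ct_first)
c(lt_first$mday, lt_first$hour, lt_first$min, lt_first$sec)   # 12 0 0 0
```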
> options(digits.secs=1)
> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"
> t2 <- as.POSIXct('2011-10-11 23:59:59.999')
> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"
A final aside: Did you know the standard allows for up to two leap seconds?
> as.POSIXlt('2011-10-11 23:59:60.9')
[1] "2011-10-11 23:59:60.9"
OK, one more thing. The behavior actually changed in May due to a bug filed by the OP (Bug 14579); before that it did round fractional seconds. Unfortunately that meant that sometimes it could round up to a second that wasn't possible; in the bug report, it went up to 60 when it should have rolled over to the next minute. One reason the decision was made to truncate instead of round is that it's printing from the POSIXlt representation, where each unit is stored separately. Thus rolling over to the next minute/hour/etc is more difficult than just a straightforward rounding operation. To round easily, it's necessary to round in POSIXct representation and then convert back, as I suggest.
I've run into this problem, and so started looking for a solution. @Aaron's answer is good, but still breaks for large dates.
Here is code that rounds the seconds properly, according to the format string or options("digits.secs"):
form <- function(x, format = "", tz= "", ...) {
    # From format.POSIXct
    if (!inherits(x, "POSIXct"))
        stop("wrong class")
    if (missing(tz) && !is.null(tzone <- attr(x, "tzone")))
        tz <- tzone
    # Find the number of digits required based on the format string
    if (length(format) > 1)
        stop("length(format) > 1 not supported")
    m <- gregexpr("%OS[[:digit:]]?", format)[[1]]
    l <- attr(m, "match.length")
    if (l == 4) {
        d <- as.integer(substring(format, l+m-1, l+m-1))
    } else {
        d <- unlist(options("digits.secs"))
        if (is.null(d)) {
            d <- 0
        }
    }
    secs.since.origin <- unclass(x)            # Seconds since origin
    secs <- round(secs.since.origin %% 60, d)  # Seconds within the minute
    mins <- floor(secs.since.origin / 60)      # Minutes since origin
    # Fix up overflow on seconds
    if (secs >= 60) {
        secs <- secs - 60
        mins <- mins + 1
    }
    # Represents the prior minute
    lt <- as.POSIXlt(60 * mins, tz=tz, origin=ISOdatetime(1970,1,1,0,0,0,tz="GMT"))
    lt$sec <- secs + 10^(-d-1)  # Add in the seconds, plus a fudge factor.
    format.POSIXlt(as.POSIXlt(lt), format, ...)
}
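Note that the overflow check above uses a plain if, so form() expects a single date-time. The same seconds/minutes split can be written for vectors with logical indexing; this is only a sketch, and split_secs is a name I made up for illustration.

```r
# Hypothetical helper: split seconds-since-epoch into whole minutes and
# rounded seconds within the minute, carrying any overflow into the minutes.
split_secs <- function(x, d = 0) {
    s <- as.numeric(x)
    secs <- round(s %% 60, d)
    mins <- floor(s / 60)
    over <- secs >= 60          # rounding pushed these up to a full minute
    secs[over] <- secs[over] - 60
    mins[over] <- mins[over] + 1
    list(mins = mins, secs = secs)
}

res <- split_secs(c(59.999, 119.4), d = 0)
res$mins   # 1 1
res$secs   # 0 59
```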
The fudge factor of 10^(-d-1) comes from Aaron's answer to "Accurately converting from character->POSIXct->character with sub millisecond datetimes".
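For anyone wondering why a fudge factor is needed at all once the seconds are rounded: a rounded decimal is often stored as a double slightly below its nominal value, and a formatter that truncates can then drop the last digit. The sketch below imitates truncation with trunc(); that scale-and-truncate step is my assumption about the mechanism, not a copy of R's C code, and truncate_to is a made-up helper.

```r
# Hypothetical stand-in for a formatter that truncates to d decimals.
truncate_to <- function(s, d) trunc(s * 10^d) / 10^d

s <- 0.29   # already rounded to 2 digits, but 0.29 * 100 is just below 29
d <- 2

no_fudge   <- truncate_to(s, d)                 # last digit is lost
with_fudge <- truncate_to(s + 10^(-d - 1), d)   # nudged past the boundary

c(no_fudge, with_fudge)   # 0.28 0.29
```

The nudge is a tenth of the last displayed digit, so it is too small to change any digit that is actually shown.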
Some examples:
f <- "%Y-%m-%d %H:%M:%OS"
f3 <- "%Y-%m-%d %H:%M:%OS3"
f6 <- "%Y-%m-%d %H:%M:%OS6"
From a nearly identical question:
x <- as.POSIXct("2012-12-14 15:42:04.577895")
> format(x, f6)
[1] "2012-12-14 15:42:04.577894"
> form(x, f6)
[1] "2012-12-14 15:42:04.577895"
> myformat.POSIXct(x, 6)
[1] "2012-12-14 15:42:04.577895"
From above:
> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"
> form(t1)
[1] "2011-10-11 07:49:36.3"
> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"
> form(t2)
[1] "2011-10-12"
> form(t2, f)
[1] "2011-10-12 00:00:00.0"
The real fun comes in 2038 for some dates. I assume this is because the count of seconds passes 2^31 in January 2038, so the double loses one more bit of fractional precision in the mantissa. Note the value of the seconds field.
> t3 <- as.POSIXct('2038-12-14 15:42:04.577895')
> format(t3)
[1] "2038-12-14 15:42:05.5"
> myformat.POSIXct(t3, 1)
[1] "2038-12-14 15:42:05.6"
> form(t3)
[1] "2038-12-14 15:42:04.6"
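The lost bit itself is easy to check: the count of seconds since 1970 reaches 2^31 in January 2038, and from that point on adjacent doubles are twice as far apart. A small sketch (ulp is a made-up helper, and this only shows the available precision, not what the conversion does with it):

```r
# Hypothetical helper: spacing between adjacent doubles near v.
ulp <- function(v) 2^(floor(log2(abs(v))) - 52)

boundary <- as.POSIXct(2^31, origin = "1970-01-01", tz = "UTC")
format(boundary, "%Y-%m-%d %H:%M:%S", tz = "UTC")   # "2038-01-19 03:14:08"

ulp(2^31 - 1)   # 2^-22, about 2.4e-07
ulp(2^31)       # 2^-21, about 4.8e-07
```

Either spacing is far smaller than the whole second that goes missing in the t3 output, so the lost bit alone probably doesn't explain it.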
This code seems to work for the other edge cases that I've tried. What format.POSIXct and myformat.POSIXct in Aaron's answer have in common is the conversion from POSIXct to POSIXlt with the seconds field intact.
This points to a bug in that conversion. I'm not using any data that isn't available to as.POSIXlt().
Update
The bug is in src/main/datetime.c:434, in the static function localtime0, but I am not sure yet of the correct fix:
Lines 433-434:
day = (int) floor(d/86400.0);
left = (int) (d - day * 86400.0 + 0.5);
The extra 0.5 for rounding the value is the culprit. Note that the subsecond value of t3 above exceeds .5. localtime0 deals with seconds only, and the subseconds are added in after localtime0 returns.
localtime0 returns correct results if the double presented is an integer value.
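The effect of that 0.5 can be reproduced with plain arithmetic. This is an R transcription of the two C lines above, not a proposed patch; as.integer() stands in for the C (int) cast since both truncate toward zero, and the input value is made up.

```r
# Made-up input: 3 whole days plus 4.577895 seconds (fraction above .5).
d <- 3 * 86400 + 4.577895

day <- floor(d / 86400)
left_rounded <- as.integer(d - day * 86400 + 0.5)    # as in the quoted lines
left_floored <- as.integer(floor(d - day * 86400))   # whole seconds only

c(left_rounded, left_floored)   # 5 4
```

If the fractional part is added back afterwards, the first variant ends up one second too high whenever that fraction is .5 or more, which would match the 05.5 seen for t3.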