Difference between as.POSIXct/as.POSIXlt and strptime for converting character vectors to POSIXct/POSIXlt
I have followed a number of questions here that ask how to convert character vectors to datetime classes. I often see two methods: strptime and as.POSIXct/as.POSIXlt. I looked at the two functions but am unclear what the difference is.
strptime
function (x, format, tz = "")
{
y <- .Internal(strptime(as.character(x), format, tz))
names(y$year) <- names(x)
y
}
<bytecode: 0x045fcea8>
<environment: namespace:base>
as.POSIXct
function (x, tz = "", ...)
UseMethod("as.POSIXct")
<bytecode: 0x069efeb8>
<environment: namespace:base>
as.POSIXlt
function (x, tz = "", ...)
UseMethod("as.POSIXlt")
<bytecode: 0x03ac029c>
<environment: namespace:base>
Doing a microbenchmark to see if there are performance differences:
library(microbenchmark)
Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 5000, replace = TRUE)
df <- microbenchmark(strptime(Dates, "%d-%m-%Y"), as.POSIXlt(Dates, format = "%d-%m-%Y"), times = 1000)
Unit: milliseconds
                                    expr      min       lq   median       uq      max
1 as.POSIXlt(Dates, format = "%d-%m-%Y") 32.38596 33.81324 34.78487 35.52183 61.80171
2            strptime(Dates, "%d-%m-%Y") 31.73224 33.22964 34.20407 34.88167 52.12422
strptime seems slightly faster. So what gives? Why would there be two similar functions, or are there differences between them that I missed?
Well, the functions do different things.
First, there are two internal implementations of date/time: POSIXct, which stores seconds since the UNIX epoch (plus some other data), and POSIXlt, which stores a list of day, month, year, hour, minute, second, etc.
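As a rough illustration of the two representations, the sketch below parses one timestamp into each class and inspects what is stored. The timestamp and the UTC time zone are arbitrary choices made for this example.

```r
# Parse the same instant into both classes (UTC keeps the result independent of locale)
ct <- as.POSIXct("2010-01-02 03:04:05", tz = "UTC")
lt <- as.POSIXlt("2010-01-02 03:04:05", tz = "UTC")

# POSIXct is a single number: seconds since 1970-01-01 00:00:00 UTC
unclass(ct)

# POSIXlt is a list of broken-down fields
lt$year   # years since 1900
lt$mon    # months since January (0-11)
lt$mday   # day of the month
lt$hour   # hour of the day
```

Because POSIXct is a plain numeric vector, it is usually the more convenient choice for data frame columns, while POSIXlt is handy when individual fields are needed.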
strptime is a function to directly convert character vectors (of a variety of formats) to POSIXlt format.
as.POSIXlt converts a variety of data types to POSIXlt. It tries to be intelligent and do the sensible thing: in the case of character, it acts as a wrapper to strptime.
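A small check of the wrapper behaviour, assuming character input with an explicit format; the sample strings and the UTC time zone are made up for the example.

```r
x <- c("01-02-2010", "15-06-2010")

# Same input, same format, two entry points
via_strptime <- strptime(x, "%d-%m-%Y", tz = "UTC")
via_as_lt    <- as.POSIXlt(x, format = "%d-%m-%Y", tz = "UTC")

# Both results are POSIXlt and print identically
class(via_strptime)
class(via_as_lt)
format(via_strptime) == format(via_as_lt)
```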
as.POSIXct converts a variety of data types to POSIXct. It also tries to be intelligent and do the sensible thing: in the case of character, it runs strptime first, then does the conversion from POSIXlt to POSIXct.
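The two-stage conversion can be sketched by doing it by hand and comparing with the one-step call. The input string and the UTC time zone are arbitrary example values.

```r
x <- "01-02-2010"

# One step: as.POSIXct on a character vector with a format
one_step <- as.POSIXct(x, format = "%d-%m-%Y", tz = "UTC")

# Two steps: parse to POSIXlt with strptime, then convert to POSIXct
two_step <- as.POSIXct(strptime(x, "%d-%m-%Y", tz = "UTC"))

# Both represent the same instant
one_step == two_step
class(one_step)
```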
It makes sense that strptime is slightly faster, because strptime only handles character input whilst the others first have to determine which method to use from the input type. It should also be a bit safer: strptime coerces whatever it is given with as.character(), so unexpected data typically comes back as NA, instead of being converted by some other method in a way that might not be what you want.
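To make the dispatch difference concrete, here is a short sketch using a Date and a number as the non-character inputs (both chosen only for illustration). Note that strptime passes its input through as.character(), as the source printed in the question shows, so a value that does not match the format shows up as NA.

```r
d <- as.Date("2010-01-01")

# as.POSIXlt dispatches on the class of its input, so a Date is converted directly
as.POSIXlt(d)$year + 1900

# strptime coerces to character first ("2010-01-01"), which does not match "%d-%m-%Y"
is.na(strptime(d, "%d-%m-%Y", tz = "UTC"))

# Numeric input: as.POSIXct treats it as seconds since the given origin
as.POSIXct(0, origin = "1970-01-01", tz = "UTC")
```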
There are two POSIXt types, POSIXct and POSIXlt. "ct" can stand for calendar time; it stores the number of seconds since the origin. "lt", or local time, keeps the date as a list of time attributes (such as "hour" and "mon"). Try these examples:
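# strptime always returns POSIXlt
date.hour <- strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S")
date <- "26/10/2016"
time <- "19:51:30"
day <- paste(date, "T", time)  # "26/10/2016 T 19:51:30"
# POSIXct is a number of seconds, so it has no $year component
day.time1 <- as.POSIXct(day, format = "%d/%m/%Y T %H:%M:%S", tz = "Europe/Paris")
day.time1
try(day.time1$year)  # error: $ operator is invalid for atomic vectors
# POSIXlt is a list, so its fields can be read directly
day.time2 <- as.POSIXlt(day, format = "%d/%m/%Y T %H:%M:%S", tz = "Europe/Paris")
day.time2
day.time2$year  # 116, i.e. years since 1900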