Convert date-time string to class Date

I have a data frame with a character column of date-times.

When I use as.Date, most of my strings are parsed correctly, except for a few instances. The example below will hopefully show you what is going on.

# my attempt to parse the string to Date -- uses the stringr package
prods.all$Date2 <- as.Date(str_sub(prods.all$Date, 1, 
                str_locate(prods.all$Date, " ")[1]-1), 
                "%m/%d/%Y")

# grab two rows to highlight my issue
temp <- prods.all[c(1925:1926), c(1,8)]
temp
#                    Date      Date2
# 1925  10/9/2009 0:00:00 2009-10-09
# 1926 10/15/2009 0:00:00 0200-10-15

As you can see, the year of some of the dates is inaccurate. The pattern seems to occur when the day is double digit.

Any help you can provide will be greatly appreciated.


Solution 1:

The easiest way is to use lubridate:

library(lubridate)
prods.all$Date2 <- mdy(prods.all$Date2)

This function automatically returns objects of class POSIXct and will work with either factors or characters.

Solution 2:

You may be overcomplicating things, is there any reason you need the stringr package? You can use as.Date and its format argument to specify the input format of your string.

 df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
 as.Date(df$Date, format =  "%m/%d/%Y %H:%M:%S")
 # [1] "2009-10-09" "2009-10-15"

Note the Details section of ?as.Date:

Character strings are processed as far as necessary for the format specified: any trailing characters are ignored

Thus, this also works:

as.Date(df$Date, format =  "%m/%d/%Y")
# [1] "2009-10-09" "2009-10-15"

All the conversion specifications that can be used to specify the input format are found in the Details section in ?strptime. Make sure that the order of the conversion specification as well as any separators correspond exactly with the format of your input string.


More generally and if you need the time component as well, use as.POSIXct or strptime:

as.POSIXct(df$Date, "%m/%d/%Y %H:%M:%S")    
strptime(df$Date, "%m/%d/%Y %H:%M:%S")

I'm guessing at what your actual data might look at from the partial results you give.