Convert dd/mm/yy and dd/mm/yyyy to Dates
I have a character vector with dates in various formats, like this:
dates <- c("23/11/12", "20/10/2012", "22/10/2012" ,"23/11/12")
I want to convert these to Dates. I have tried the very good dmy from the lubridate package, but this does not work:
dmy(dates)
[1] "0012-11-23 UTC" "2012-10-20 UTC" "2012-10-22 UTC" "0012-11-23 UTC"
It is treating the /12 year as if it is 0012.
So I am now trying a regular expression to select each type and convert them individually to dates using as.Date(). However, the regular expression I have tried, meant to select only the dd/mm/yy strings, does not work.
dates[grep('[0-9]{2}/[0-9]{2}/[0-9]{2,2}', dates)]
returns
[1] "23/11/12" "20/10/2012" "22/10/2012" "23/11/12"
I thought that {2,2} would match exactly 2 digits, so that not all of the strings would be selected. I'm not very good at regular expressions, so any help will be appreciated.
Thanks
EDIT
What I actually have are three different types of date as below
dates <- c("23-Jul-2013", "23/11/12", "20/10/2012", "22/10/2012" ,"23/11/12")
And I want to convert these to dates
parse_date_time(dates,c('dmy'))
gives me
[1] "2013-07-23" "0012-11-23" "2012-10-20" "2012-10-22" "0012-11-23"
However, this is wrong: 0012 should be 2012. I would like a (fairly simple) solution to this.
One solution I now have (thanks to @plannapus) is to use regular expressions. I actually ended up creating this function, as I was still getting some cases where the lubridate approach was turning 12 into 0012:
asDateRegex <- function(dates,
                        # each regex selects strings from `dates`; every matching subset is converted to Date
                        regexes = c('[0-9]{2}/[0-9]{2}/[0-9]{4}',        # dd/mm/yyyy
                                    '[0-9]{2}/[0-9]{2}/[0-9]{2}$',       # dd/mm/yy
                                    '[0-9]{2}-[[:alpha:]]{3}-[0-9]{4}'), # dd-mon-yyyy
                        orders = 'dmy',
                        ...){
  require(lubridate)
  # start with all NA so unmatched strings stay NA
  new_dates <- as.Date(rep(NA, length(dates)))
  for(reg in regexes){
    idx <- grep(reg, dates)  # positions of the strings matching this pattern
    new_dates[idx] <- as.Date(parse_date_time(dates[idx], orders = orders))
  }
  new_dates
}
asDateRegex (dates)
[1] "2012-10-20" "2013-07-23" "2012-11-23" "2012-10-22" "2012-11-23"
But this is not very elegant. Any better solutions?
Solution 1:
You can use parse_date_time from lubridate:
some.dates <- c("23/11/12", "20/10/2012", "22/10/2012" ,"23/11/12")
parse_date_time(some.dates,c('dmy'))
[1] "2012-11-23 UTC" "2012-10-20 UTC" "2012-10-22 UTC" "2012-11-23 UTC"
But note that the order of the formats is important:
some.dates <- c("20/10/2012","23/11/12", "22/10/2012" ,"23/11/12")
parse_date_time(some.dates,c('dmY','dmy'))
[1] "2012-10-20 UTC" "2012-11-23 UTC" "2012-10-22 UTC" "2012-11-23 UTC"
EDIT
Internally, parse_date_time uses guess_formats (which I guess uses some regular expressions):
guess_formats(some.dates,c('dmy'))
dmy dmy dmy dmy
"%d/%m/%Y" "%d/%m/%y" "%d/%m/%Y" "%d/%m/%y"
As mentioned in the comments, you can pass the formats guessed by guess_formats to as.Date like this:
as.Date(dates, format = guess_formats(dates,c('dmy')))
Solution 2:
You can choose the format based on the length of each input string (8 characters means a two-digit year):
y <- ifelse(nchar(dates) == 8, "y", "Y")
as.Date(dates, format = paste0("%d/%m/%", y))
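The same idea, picking a format per element, can be extended to the three formats from the question's EDIT. The following is only a sketch in base R: parse_mixed_dmy is a hypothetical helper name, the anchored patterns are my own choice, and %b (abbreviated month name) depends on the locale, so "Jul" is assumed to be parsed in an English or C locale.

```r
# Hypothetical helper (not from the thread): map each anchored pattern to a
# strptime format, then parse every matching subset with its own format.
parse_mixed_dmy <- function(x) {
  formats <- c(
    "^[0-9]{2}/[0-9]{2}/[0-9]{4}$"       = "%d/%m/%Y",  # dd/mm/yyyy
    "^[0-9]{2}/[0-9]{2}/[0-9]{2}$"       = "%d/%m/%y",  # dd/mm/yy
    "^[0-9]{2}-[[:alpha:]]{3}-[0-9]{4}$" = "%d-%b-%Y"   # dd-mon-yyyy (locale-dependent)
  )
  out <- rep(as.Date(NA), length(x))  # strings matching no pattern stay NA
  for (pattern in names(formats)) {
    hit <- grepl(pattern, x)
    out[hit] <- as.Date(x[hit], format = formats[[pattern]])
  }
  out
}

parse_mixed_dmy(c("23/11/12", "20/10/2012", "not a date"))
```

Because the patterns are anchored with ^ and $, each string matches at most one of them, so the order of the formats does not matter here.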
Solution 3:
You can use strsplit and nchar to get a subvector of dates where the year is two characters long:
> dates[sapply(strsplit(dates,"/"),function(x)nchar(x)[3]==2)]
[1] "23/11/12" "23/11/12"
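To go from that selection to actual Dates, one option is to keep the test as a logical index and convert the two groups separately. This is a sketch that assumes every string is slash-separated with the year last; the variable names are illustrative.

```r
dates <- c("23/11/12", "20/10/2012", "22/10/2012", "23/11/12")

# Third field of each split string is the year.
year_part  <- vapply(strsplit(dates, "/", fixed = TRUE),
                     function(x) x[3], character(1))
short_year <- nchar(year_part) == 2

result <- rep(as.Date(NA), length(dates))
result[short_year]  <- as.Date(dates[short_year],  format = "%d/%m/%y")
result[!short_year] <- as.Date(dates[!short_year], format = "%d/%m/%Y")
result
```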
Solution 4:
Following your original attempt at regex-based solutions, you may try gsub using this regexp, then converting to any date-time format you wish...
# Replace 4 digit years with two digit years
short <- gsub( "([0-9]{2})([0-9]{2})$" , "\\2" , dates )
#[1] "23/11/12" "20/10/12" "22/10/12" "23/11/12"
as.Date( short , format = "%d/%m/%y" )
#[1] "2012-11-23" "2012-10-20" "2012-10-22" "2012-11-23"
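One thing to keep in mind when truncating to two-digit years: the century is then chosen by %y, which in R maps 00 to 68 to 20xx and 69 to 99 to 19xx (see ?strptime). The sketch below shows that pivot, followed by an alternative that goes the other way and expands two-digit years instead. The expansion assumes that every two-digit year belongs to the 2000s, which may not hold for your data.

```r
# %y century pivot in base R: 00-68 -> 20xx, 69-99 -> 19xx
pivot <- as.Date(c("01/01/68", "01/01/69"), format = "%d/%m/%y")
format(pivot)
# "2068-01-01" "1969-01-01"

# Alternative: expand two-digit years rather than truncating four-digit ones.
# ASSUMPTION: all two-digit years are in the 2000s.
dates  <- c("23/11/12", "20/10/2012", "22/10/2012", "23/11/12")
long   <- sub("/([0-9]{2})$", "/20\\1", dates)  # only matches a 2-digit final field
parsed <- as.Date(long, format = "%d/%m/%Y")
format(parsed)
```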
Solution 5:
If you really want to do it with a regexp, you should use $ to signify that nothing (i.e. the end of the string) follows the last two digits:
dates[grep('[0-9]{2}/[0-9]{2}/[0-9]{2}$', dates)]
[1] "23/11/12" "23/11/12"
Otherwise, in addition to the other answers you can have a look here and here for other ways of handling multiple date formats.
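To see why the original pattern selected everything: an unanchored regex only has to match somewhere inside the string, and "20/10/20" is a substring of "20/10/2012", so {2,2} is satisfied by the first two digits of the year. A small comparison (adding ^ as well is my own addition, to also reject leading characters):

```r
dates <- c("23/11/12", "20/10/2012", "22/10/2012", "23/11/12")

# Unanchored: matches a prefix of the four-digit years too.
unanchored <- grepl("[0-9]{2}/[0-9]{2}/[0-9]{2}", dates)
unanchored
# TRUE TRUE TRUE TRUE

# Anchored at both ends: the whole string must be dd/mm/yy.
anchored <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{2}$", dates)
anchored
# TRUE FALSE FALSE TRUE
```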