Working with date data that has badly formatted dates- R

So I have the following data frame that has 1 column of dates.

date <- structure(list(Date = c("09/09/202109/09", "09/12/202109/12", 
"10/12/202110/12", "11/12/202111/12", "01/12/202201/12", "08/12/202108/12"
)), row.names = c(NA, 6L), class = "data.frame")

> print(date)
             Date
1 09/09/202109/09
2 09/12/202109/12
3 10/12/202110/12
4 11/12/202111/12
5 01/12/202201/12
6 08/12/202108/12

For row 1 (09/09/202109/09) - the date is 09/09/2021. My original plan was to just use a case_when/mutate and change every date, but I wanted to see if there was a faster way.

Is it possible to strip the last 4 characters of each row in the column?

My desired output would be this


> print(date)
             Date
1 09/09/2021
2 09/12/2021
3 10/12/2021
4 11/12/2021
5 01/12/2022
6 08/12/2021
>


Solution 1:

I have assumed your dates are month/day/year. However if they are day/month/year just change "%m/%d/%Y" to "%d/%m/%Y".

date <- structure(list(Date = c("09/09/202109/09", "09/12/202109/12", 
"10/12/202110/12", "11/12/202111/12", "01/12/202201/12", "08/12/202108/12"
)), row.names = c(NA, 6L), class = "data.frame")

clean_date_strings  <- substr(date$Date, 1, nchar(date$Date)-4)
as.Date(clean_date_strings, format = "%m/%d/%Y")

Output:

r$> as.Date(clean_date_strings, format = "%m/%d/%Y")
[1] "2021-09-09" "2021-09-12" "2021-10-12" "2021-11-12" "2022-01-12" "2021-08-12"