Working with date data that has badly formatted dates- R
So I have the following data frame that has 1 column of dates.
date <- structure(list(Date = c("09/09/202109/09", "09/12/202109/12",
"10/12/202110/12", "11/12/202111/12", "01/12/202201/12", "08/12/202108/12"
)), row.names = c(NA, 6L), class = "data.frame")
> print(date)
Date
1 09/09/202109/09
2 09/12/202109/12
3 10/12/202110/12
4 11/12/202111/12
5 01/12/202201/12
6 08/12/202108/12
For row 1 (09/09/202109/09) - the date is 09/09/2021. My original plan was to just use a case_when/mutate and change every date, but I wanted to see if there was a faster way.
Is it possible to strip the last 4 characters of each row in the column?
My desired output would be this
> print(date)
Date
1 09/09/2021
2 09/12/2021
3 10/12/2021
4 11/12/2021
5 01/12/2022
6 08/12/2021
>
Solution 1:
I have assumed your dates are month/day/year. However if they are day/month/year just change "%m/%d/%Y"
to "%d/%m/%Y"
.
date <- structure(list(Date = c("09/09/202109/09", "09/12/202109/12",
"10/12/202110/12", "11/12/202111/12", "01/12/202201/12", "08/12/202108/12"
)), row.names = c(NA, 6L), class = "data.frame")
clean_date_strings <- substr(date$Date, 1, nchar(date$Date)-4)
as.Date(clean_date_strings, format = "%m/%d/%Y")
Output:
r$> as.Date(clean_date_strings, format = "%m/%d/%Y")
[1] "2021-09-09" "2021-09-12" "2021-10-12" "2021-11-12" "2022-01-12" "2021-08-12"