Converting a character string into a date in R
Solution 1:
Updated: improved with @Richard Scriven's colClasses and simpler as.Date() suggestions.
Here are two similar methods that worked for me for taking a CSV file containing dates in mmddyyyy format and getting R to recognize them as date objects.
Starting first with a simple file tv.csv:
Series,FirstAir
Quantico,09272015
Muppets,09222015
Method 1: All as string
Once within R,
> t = read.csv('tv.csv', colClasses = 'character')
- imports tv.csv as a data frame named t
- the colClasses = 'character' option causes all of the data to be read as the character data type (instead of Factor or int types)
Examine its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : chr "Quantico" "Muppets"
$ FirstAir: chr "09272015" "09222015"
- R has imported everything as strings of characters, indicated here by the type chr
The chr values (strings of characters) are then easily converted into dates:
> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
- as.Date() performs the string-to-date conversion
- %m%d%Y specifies how to interpret the input in t$FirstAir. These format codes, at least on Linux, can be found by running $ man date, which brings up the manual for the date program, where there is a list of formatting codes. For example, it says %m month (01..12). The same codes are documented within R under ?strptime.
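As a rough end-to-end sketch of Method 1, the snippet below feeds the same two rows to read.csv() through its text argument, so it runs without a tv.csv on disk. The names csv_text and tv are made up for this illustration and are not part of the original answer.

```r
# Hypothetical in-memory copy of tv.csv so the sketch runs without a file on disk.
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Read every column as character so the leading zero in "09..." survives.
tv <- read.csv(text = csv_text, colClasses = "character")

# Parse the mmddyyyy strings; values that do not match the format become NA.
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")

str(tv)

# Once parsed, format() can render the dates using other format codes.
print(format(tv$FirstAir, "%Y/%m/%d"))
```

Checking for NA after as.Date() is a cheap way to spot rows that did not match the expected format.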
Method 2: Import then fix only the date
If for some reason you don't want a blanket conversion of everything to character on import (for example, the file has many variables and you wish to leave R's automatic type recognition in place and merely "fix" the one date variable), follow this method.
Once within R,
> t = read.csv('tv.csv')
- imports tv.csv as a data frame named t
Examine its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : Factor w/ 2 levels "Muppets","Quantico": 2 1
$ FirstAir: int 9272015 9222015
>
- R tries its best to guess the type of each variable (the Factor type shown for Series reflects R versions before 4.0.0; newer versions import strings as chr by default)
- As you can see, an immediate problem is that R has imported the FirstAir value 09272015 as int, meaning integer, and dropped the leading zero padding. The 0 in 09 is important later for the date conversion, so we need to fix this.
This can be done in a single command but for clarity I have broken this into two steps. First,
> t$FirstAir = sprintf("%08d", t$FirstAir)
- sprintf is a formatting function
- 0 means pad with zeroes
- 8 means ensure 8 characters, because mmddyyyy is 8 characters in total
- d is used when the input is a number, which it currently is; recall that the str() output reported t$FirstAir as int, meaning integer
- t$FirstAir is the variable we are both setting and using as input
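To see the padding step in isolation, here is a small sketch on a stand-alone integer vector. The name first_air and its values are assumptions that mirror what read.csv() produced above; they are not taken from a real file.

```r
# Hypothetical integer values, as read.csv would import 09272015 and 09222015.
first_air <- c(9272015L, 9222015L)

# "%08d": d = integer input, 8 = minimum width, 0 = pad with zeroes on the left.
padded <- sprintf("%08d", first_air)
print(padded)

# Every padded value should now be exactly 8 characters wide.
print(nchar(padded))
```

Values that already have 8 digits (months 10 to 12) pass through unchanged, so the same call is safe for the whole column.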
Check the result:
> str(t$FirstAir)
chr [1:2] "09272015" "09222015"
- it was successfully converted from an int to a chr type; for example, 9272015 became "09272015"
Now that it is a string (chr type), we can convert it the same way as in method 1.
> t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))
Result
We do a final check:
> str(t$FirstAir)
Date[1:2], format: "2015-09-27" "2015-09-22"
In both cases, what were originally values in a text file have now been successfully converted into R date objects.
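Method 2 mentions that the fix can be done in a single command. One possible way to write that is shown as Option A below. Option B is not from the original answer: it is an alternative sketch that relies on read.csv() accepting a named colClasses vector, so only FirstAir is forced to character while the other columns keep automatic type detection. The names csv_text, tv_a and tv_b are made up for illustration.

```r
# Hypothetical in-memory copy of tv.csv so the sketch runs without a file on disk.
csv_text <- "Series,FirstAir\nQuantico,09272015\nMuppets,09222015"

# Option A: default import, then pad and parse in one command.
tv_a <- read.csv(text = csv_text)
tv_a$FirstAir <- as.Date(sprintf("%08d", tv_a$FirstAir), format = "%m%d%Y")

# Option B: a named colClasses vector forces only FirstAir to character;
# unnamed columns are still type-guessed by R.
tv_b <- read.csv(text = csv_text, colClasses = c(FirstAir = "character"))
tv_b$FirstAir <- as.Date(tv_b$FirstAir, format = "%m%d%Y")

# Both routes should yield the same dates.
print(tv_a$FirstAir)
print(tv_b$FirstAir)
```

Option B avoids the integer round trip entirely, which may be preferable when the date column is known in advance.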
Solution 2:
Have a look at lubridate's mdy function:
require(lubridate)
a <- "10281994"
mdy(a)
gives you
[1] "1994-10-28 UTC"
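of class "POSIXct" "POSIXt", so a datetime in R (thanks Joshua Ulrich for the correction). Note that more recent lubridate releases return an object of class Date directly from mdy() when no tz argument is given.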
You could use as.Date(mdy(a)), which gives 1994-10-28, to get an object of class Date.
There are variants like ymd and dmy within lubridate as well.
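For comparison, here is a base R sketch of the same three orderings that does not depend on lubridate. It only approximates what mdy, ymd and dmy do, since lubridate is more forgiving about separators. The input strings and variable names are made up for illustration.

```r
# Hypothetical inputs: the same calendar day written in three field orders.
as_mdy <- as.Date("10281994", format = "%m%d%Y")  # month-day-year, like mdy()
as_ymd <- as.Date("19941028", format = "%Y%m%d")  # year-month-day, like ymd()
as_dmy <- as.Date("28101994", format = "%d%m%Y")  # day-month-year, like dmy()

# All three should print as the same date.
print(c(as_mdy, as_ymd, as_dmy))
```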