Convert type of multiple columns of a dataframe at once
Solution 1:
Edit: See this related question for some simplifications and extensions on this basic idea.
My comment on Brandon's answer, using switch:
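convert.magic <- function(obj, types) {
  # types: one of 'character', 'numeric', 'factor' for each column of obj
  for (i in seq_along(obj)) {
    FUN <- switch(types[i],
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor,
                  stop("unsupported type: ", types[i]))
    obj[[i]] <- FUN(obj[[i]])
  }
  obj
}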
out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
For truly large data frames you may want to use lapply instead of the for loop:
convert.magic1 <- function(obj, types) {
  out <- lapply(seq_along(obj), FUN = function(i) {
    FUN1 <- switch(types[i], character = as.character,
                   numeric = as.numeric, factor = as.factor)
    FUN1(obj[[i]])
  })
  names(out) <- colnames(obj)
  as.data.frame(out, stringsAsFactors = FALSE)
}
When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric usually requires as.numeric(as.character(...)), because as.numeric() alone returns the internal integer codes of the factor. Also, be aware that before R 4.0.0, data.frame() and as.data.frame() converted character columns to factors by default (stringsAsFactors = TRUE).
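The factor-to-numeric pitfall mentioned above is easy to see on a small vector. This is only an illustrative sketch; the names f, codes and values are made up for the example.

```r
# A factor whose labels look like numbers
f <- factor(c("10", "20", "5"))

# as.numeric() alone returns the internal level codes, not the labels
codes <- as.numeric(f)

# Going through character first recovers the numbers the labels spell out
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

The two results differ because the levels are sorted as strings, so the codes reflect the position of each label in that ordering rather than its numeric value.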
Solution 2:
If you want to detect each column's data type automatically rather than specify it manually (e.g. after data tidying), the function type.convert() may help.
type.convert() takes a character vector and attempts to determine the most suitable type for all of its elements, which means it has to be applied once per column.
# as.is = TRUE keeps strings as character instead of turning them into factors
df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))
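Since I love dplyr, I prefer:
library(dplyr)
# across() needs dplyr >= 1.0.0; it replaces the older mutate_all(funs(...)) idiom
df <- df %>% mutate(across(everything(), ~ type.convert(as.character(.x), as.is = TRUE)))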
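To make the per-column detection concrete, here is a small base R sketch on an invented data frame (df_demo and its contents are assumptions for illustration only). It passes as.is = TRUE so that text columns stay character rather than becoming factors.

```r
# Hypothetical input: every column arrives as character
df_demo <- data.frame(
  a = c("1", "2", "3"),
  b = c("x", "y", "z"),
  c = c("1.5", "2.5", "NA"),
  stringsAsFactors = FALSE
)

# Apply type.convert() once per column; df_demo[] keeps the data frame shape
df_demo[] <- lapply(df_demo, type.convert, as.is = TRUE)

# Inspect the detected types
str(df_demo)
```

Here the string "NA" is treated as a missing value because it is in the default na.strings of type.convert().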
Solution 3:
I run into this a lot as well. It comes down to how you import the data. The read...() functions have an option to stop character strings from being converted to factors (for example stringsAsFactors = FALSE in read.table() and read.csv()), so text stays character and things that look like numbers stay numeric. A problem arises when you have elements that are empty rather than NA, but na.strings = c("",...) should take care of that. I'd start by taking a hard look at your import process and adjusting it accordingly.
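As a rough sketch of what adjusting the import can look like, the example below reads a tiny inline CSV with read.csv(). The data, the name csv_text and the choice of colClasses are assumptions made up for illustration; a real file would be passed by path instead of text =.

```r
# Hypothetical CSV content with two empty fields
csv_text <- "x,y,z\n1,red,15254\n2,,15255\n3,blue,\n"

imported <- read.csv(
  text = csv_text,
  stringsAsFactors = FALSE,                          # keep text as character
  na.strings = c("", "NA"),                          # treat empty fields as NA
  colClasses = c("character", "character", "numeric") # fix the types up front
)

str(imported)
```

Setting colClasses at import time avoids the need for a conversion step afterwards when the intended types are already known.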
But you could always write a function and pass it the vector of target types.
convert.magic <- function(x, y = character(0)) {
  # y: target type for each column of x, either "numeric" or "character"
  for (i in seq_along(y)) {
    if (y[i] == "numeric") {
      x[[i]] <- as.numeric(x[[i]])
    } else if (y[i] == "character") {
      x[[i]] <- as.character(x[[i]])
    }
  }
  return(x)
}
foo <- convert.magic(foo, c("character", "character", "numeric"))
> str(foo)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
Solution 4:
I know I am quite late to answer, but using a loop along with the attributes function is a simple solution to your problem.
cols <- c("x", "y", "z")  # renamed from 'names' to avoid shadowing base::names
chclass <- c("character", "character", "numeric")
for (i in seq_along(cols)) {
  attributes(foo[, cols[i]])$class <- chclass[i]
}
Solution 5:
I just ran into something like this with the RSQLite fetch method: the results come back as atomic data types. In my case, it was a date-time stamp that was causing me frustration.
I found that the setAs function is very useful for making as work as expected. Here is my small example case.
##data.frame conversion function
convert.magic2 <- function(df, classes) {
  out <- lapply(seq_along(classes),
                FUN = function(classIndex) {
                  as(df[, classIndex], classes[classIndex])
                })
  names(out) <- colnames(df)
  return(data.frame(out))
}
##small example case
tmp.df <- data.frame('dt'=c("2013-09-02 09:35:06", "2013-09-02 09:38:24", "2013-09-02 09:38:42", "2013-09-02 09:38:42"),
'v'=c('1','2','3','4'),
stringsAsFactors=FALSE)
classes=c('POSIXct','numeric')
str(tmp.df)
#confirm that it has character datatype columns
## 'data.frame': 4 obs. of 2 variables:
## $ dt: chr "2013-09-02 09:35:06" "2013-09-02 09:38:24" "2013-09-02 09:38:42" "2013-09-02 09:38:42"
## $ v : chr "1" "2" "3" "4"
##is the dt column coerceable to POSIXct?
canCoerce(tmp.df$dt,"POSIXct")
## [1] FALSE
##and the convert.magic2 function fails also:
tmp.df.n <- convert.magic2(tmp.df,classes)
## Error in as(df[, classIndex], classes[classIndex]) :
## no method or default for coercing “character” to “POSIXct”
##a little reading reveals the setAs function
setAs('character', 'POSIXct', function(from){return(as.POSIXct(from))})
##better answer for canCoerce
canCoerce(tmp.df$dt,"POSIXct")
## [1] TRUE
##better answer from convert.magic2
tmp.df.n <- convert.magic2(tmp.df,classes)
##column datatypes converted as I would like them!
str(tmp.df.n)
## 'data.frame': 4 obs. of 2 variables:
## $ dt: POSIXct, format: "2013-09-02 09:35:06" "2013-09-02 09:38:24" "2013-09-02 09:38:42" "2013-09-02 09:38:42"
## $ v : num 1 2 3 4
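If registering a global coercion with setAs feels too heavy, one alternative (not what the answer above does) is a plain lookup list of converter functions. The sketch below is only an illustration: converters, convert_with and the tz = "UTC" choice are all assumptions introduced for the example.

```r
# Hypothetical lookup table: class name -> conversion function
converters <- list(
  character = as.character,
  numeric   = as.numeric,
  POSIXct   = function(x) as.POSIXct(x, tz = "UTC")  # time zone is an assumption
)

# Convert each column of df with the converter named in classes
convert_with <- function(df, classes) {
  df[] <- Map(function(col, cls) converters[[cls]](col), df, classes)
  df
}

tmp <- data.frame(
  dt = c("2013-09-02 09:35:06", "2013-09-02 09:38:24"),
  v  = c("1", "2"),
  stringsAsFactors = FALSE
)
res <- convert_with(tmp, c("POSIXct", "numeric"))
str(res)
```

The trade-off is that every supported class has to be listed explicitly, whereas setAs extends what as() itself can do for the rest of the session.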