Convert type of multiple columns of a dataframe at once

Solution 1:

Edit: See this related question for some simplifications and extensions of this basic idea.

My comment to Brandon's answer using switch:

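convert.magic <- function(obj, types){
    # seq_along() is safe even if obj has zero columns
    for (i in seq_along(obj)){
        # pick the coercion function that matches the requested type
        FUN <- switch(types[i], character = as.character,
                                numeric = as.numeric,
                                factor = as.factor)
        obj[,i] <- FUN(obj[,i])
    }
    obj
}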

out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame':   10 obs. of  3 variables:
 $ x: chr  "1" "2" "3" "4" ...
 $ y: chr  "red" "red" "red" "blue" ...
 $ z: num  15254 15255 15256 15257 15258 ...

For truly large data frames you may want to use lapply instead of the for loop:

convert.magic1 <- function(obj,types){
    out <- lapply(1:length(obj), FUN = function(i){
        FUN1 <- switch(types[i], character = as.character,
                                 numeric = as.numeric,
                                 factor = as.factor)
        FUN1(obj[,i])
    })
    names(out) <- colnames(obj)
    as.data.frame(out,stringsAsFactors = FALSE)
}

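When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric usually requires as.numeric(as.character(...)), because as.numeric() on a factor returns the internal integer codes rather than the labels. Also, be aware that data.frame() and as.data.frame() convert character to factor by default in R versions before 4.0.0, which is why stringsAsFactors = FALSE is set above.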
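The factor pitfall mentioned above is easy to reproduce. The following is a minimal sketch; the vector f and its values are made up for illustration and are not part of the original question.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

# as.numeric() on a factor returns the integer level codes, not the labels
codes <- as.numeric(f)
print(codes)   # 1 2 1

# Going through character first recovers the values the labels represent
values <- as.numeric(as.character(f))
print(values)  # 10 20 10

# An equivalent idiom that converts the levels once and then indexes by code
values2 <- as.numeric(levels(f))[f]
print(values2) # 10 20 10
```

If a "numeric" conversion in convert.magic appears to produce small integers, the column was probably a factor and needs this two-step conversion.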

Solution 2:

If you want to detect the columns' data types automatically rather than specify them manually (e.g. after data tidying), the function type.convert() may help.

The function type.convert() takes in a character vector and attempts to determine the optimal type for all elements (meaning that it has to be applied once per column).

df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))

Since I love dplyr, I prefer:

library(dplyr)
df <- df %>% mutate(across(everything(), ~ type.convert(as.character(.x), as.is = TRUE)))
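To see what type.convert() decides for each column, here is a small sketch in base R. The data frame df and its contents are invented for illustration; as.is = TRUE is passed so that text columns stay character instead of becoming factors.

```r
# Hypothetical data frame in which every column arrived as character
df <- data.frame(a = c("1", "2", "3"),
                 b = c("1.5", "2.5", "3"),
                 c = c("TRUE", "FALSE", "NA"),
                 d = c("red", "red", "blue"),
                 stringsAsFactors = FALSE)

# Apply type.convert() column by column; df[] keeps the data.frame structure
df[] <- lapply(df, type.convert, as.is = TRUE)

# Inspect the detected class of each column
col_classes <- sapply(df, class)
print(col_classes)
# a: integer, b: numeric, c: logical, d: character
```

Note that the string "NA" in column c is read as a missing value, since "NA" is the default na.strings of type.convert().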

Solution 3:

I find I run into this a lot as well. This is about how you import data. All of the read...() functions have some type of option (such as stringsAsFactors = FALSE in read.table() and read.csv()) to keep character strings from being converted to factors, so that text strings stay character and things that look like numbers stay numeric. A problem arises when you have elements that are empty rather than NA, but na.strings = c("",...) should solve that as well. I'd start by taking a hard look at your import process and adjusting it accordingly.
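As a rough sketch of fixing the types at import time, the example below reads a small inline CSV with read.csv(). The CSV text is made up, and the use of colClasses to state each column's type up front is an addition here, not something the answer above relies on.

```r
# Hypothetical CSV content; the second data row has an empty y field
csv_text <- "x,y,z\n1,red,15254\n2,,15255\n3,blue,15256"

foo <- read.csv(text = csv_text,
                colClasses = c("character", "character", "numeric"),
                na.strings = c("", "NA"),   # treat empty fields as missing
                stringsAsFactors = FALSE)   # keep text as character

# Check the resulting column classes
import_classes <- sapply(foo, class)
print(import_classes)
# x: character, y: character, z: numeric
```

With the types settled during import, no conversion function is needed afterwards.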

But you could always create a function and pass the vector of target types through it.

convert.magic <- function(x, y = NA) {
  for (i in seq_along(y)) {
    if (is.na(y[i])) next  # no type given: leave the column unchanged
    if (y[i] == "numeric") {
      x[i] <- as.numeric(x[[i]])
    }
    if (y[i] == "character") {
      x[i] <- as.character(x[[i]])
    }
  }
  return(x)
}

foo <- convert.magic(foo, c("character", "character", "numeric"))

> str(foo)
'data.frame':   10 obs. of  3 variables:
 $ x: chr  "1" "2" "3" "4" ...
 $ y: chr  "red" "red" "red" "blue" ...
 $ z: num  15254 15255 15256 15257 15258 ...

Solution 4:

I know I am quite late to answer, but using a loop along with the attributes function is a simple solution to your problem.

names <- c("x", "y", "z")
chclass <- c("character", "character", "numeric")

for (i in (1:length(names))) {
  attributes(foo[, names[i]])$class <- chclass[i]
}
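A different way to loop over column names and target classes is to call the matching as.* function for each column instead of editing the class attribute. This is an alternative sketch, not the approach of the answer above; the data frame foo and the variable cols are stand-ins invented for illustration.

```r
# Hypothetical stand-in for the data frame in the question
foo <- data.frame(x = 1:3,
                  y = factor(c("red", "red", "blue")),
                  z = c("15254", "15255", "15256"),
                  stringsAsFactors = FALSE)

cols    <- c("x", "y", "z")
chclass <- c("character", "character", "numeric")

for (i in seq_along(cols)) {
  # Look up as.character, as.numeric, ... by name
  convert <- match.fun(paste0("as.", chclass[i]))
  foo[[cols[i]]] <- convert(foo[[cols[i]]])
}

new_classes <- sapply(foo, class)
print(new_classes)
# x: character, y: character, z: numeric
```

Because the values pass through the as.* functions, they are actually coerced rather than only relabeled.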

Solution 5:

I just ran into something like this with the RSQLite fetch method: the results come back as atomic data types. In my case, a date-time stamp was causing me frustration. I found that the setAs function is very useful for making as() work as expected. Here is my small example case.

##data.frame conversion function
convert.magic2 <- function(df,classes){
  out <- lapply(1:length(classes),
                FUN = function(classIndex){as(df[,classIndex],classes[classIndex])})
  names(out) <- colnames(df)
  return(data.frame(out))
}

##small example case
tmp.df <- data.frame('dt'=c("2013-09-02 09:35:06", "2013-09-02 09:38:24", "2013-09-02 09:38:42", "2013-09-02 09:38:42"),
                     'v'=c('1','2','3','4'),
                     stringsAsFactors=FALSE)
classes=c('POSIXct','numeric')
str(tmp.df)
#confirm that it has character datatype columns
##  'data.frame':  4 obs. of  2 variables:
##    $ dt: chr  "2013-09-02 09:35:06" "2013-09-02 09:38:24" "2013-09-02 09:38:42" "2013-09-02 09:38:42"
##    $ v : chr  "1" "2" "3" "4"

##is the dt column coerceable to POSIXct?
canCoerce(tmp.df$dt,"POSIXct")
##  [1] FALSE

##and the convert.magic2 function fails also:
tmp.df.n <- convert.magic2(tmp.df,classes)

##  Error in as(df[, classIndex], classes[classIndex]) : 
##    no method or default for coercing “character” to “POSIXct” 

##a little reading reveals the setAs function
setAs('character', 'POSIXct', function(from){return(as.POSIXct(from))})

##better answer for canCoerce
canCoerce(tmp.df$dt,"POSIXct")
##  [1] TRUE

##better answer from convert.magic2
tmp.df.n <- convert.magic2(tmp.df,classes)

##column datatypes converted as I would like them!
str(tmp.df.n)

##  'data.frame':  4 obs. of  2 variables:
##    $ dt: POSIXct, format: "2013-09-02 09:35:06" "2013-09-02 09:38:24" "2013-09-02 09:38:42" "2013-09-02 09:38:42"
##    $ v : num  1 2 3 4