Specifying colClasses in the read.csv

I am trying to specify the colClasses options in the read.csv function in R. In my data, the first column time is basically a character vector, while the rest of the columns are numeric.

data <- read.csv("test.csv", comment.char="" , 
                 colClasses=c(time="character", "numeric"), 
                 strip.white=FALSE)

In the above command, I want R to read in the time column as "character" and the rest as numeric. Although the data variable did have the correct result after the command completed, R returned the following warnings. I am wondering how I can fix these warnings?

Warning messages:
 1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
    not all columns named in 'colClasses' exist
 2: In tmp[i[i > 0L]] <- colClasses :
    number of items to replace is not a multiple of replacement length

Derek


Solution 1:

You can specify the colClasse for only one columns.

So in your example you should use:

data <- read.csv('test.csv', colClasses=c("time"="character"))

Solution 2:

The colClasses vector must have length equal to the number of imported columns. Supposing the rest of your dataset columns are 5:

colClasses=c("character",rep("numeric",5))

Solution 3:

Assuming your 'time' column has at least one observation with a non-numeric character and all your other columns only have numbers, then 'read.csv's default will be to read in 'time' as a 'factor' and all the rest of the columns as 'numeric'. Therefore setting 'stringsAsFactors=F' will have the same result as setting the 'colClasses' manually i.e.,

data <- read.csv('test.csv', stringsAsFactors=F)

Solution 4:

If you want to refer to names from the header rather than column numbers, you can use something like this:

fname <- "test.csv"
headset <- read.csv(fname, header = TRUE, nrows = 10)
classes <- sapply(headset, class)
classes[names(classes) %in% c("time")] <- "character"
dataset <- read.csv(fname, header = TRUE, colClasses = classes)