Change the Blank Cells to "NA"
I'm assuming you are talking about row 5, column "sex". The cell in data2.csv probably contains a space, so R does not consider it empty.
Also, I noticed that in row 5, columns "axles" and "door" read from data2.csv contain the string "NA". You probably want those treated as missing too. To do this, list both in na.strings:
dat2 <- read.csv("data2.csv", header = TRUE, na.strings = c("", "NA"))
EDIT:
I downloaded your data2.csv. Yes, there is a space in row 5, column "sex". So you want:
dat2 <- read.csv("data2.csv", header = TRUE, na.strings = c("", " ", "NA"))
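A hedged alternative, assuming the problem cells hold only whitespace: read.csv() also accepts strip.white = TRUE, which trims unquoted fields before the na.strings check, so a lone space becomes "" and is read as NA. The small inline CSV below is a made-up stand-in for data2.csv, not its real contents.

```r
# Hypothetical stand-in for data2.csv: row 2 has a single space in "sex"
csv_text <- "id,sex,axles
1,M,2
2, ,NA
3,,4"

# strip.white = TRUE trims unquoted fields before na.strings is applied,
# so " " becomes "" and is then read as NA
dat2 <- read.csv(text = csv_text, header = TRUE,
                 strip.white = TRUE, na.strings = c("", "NA"))
dat2
```

This avoids having to enumerate every whitespace variant (one space, two spaces, a tab) in na.strings.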
You can also use gsub to replace several variants of "empty", such as "" or a single space, with NA:
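data <- data.frame(cats = c('', ' ', 'meow'), dogs = c("woof", " ", NA))
# apply() returns a character matrix, so convert the result back to a data frame
as.data.frame(apply(data, 2, function(x) gsub("^$|^ $", NA, x)))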
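A variant sketch of the same gsub idea: assigning the result of lapply() into data[] keeps the data frame structure, and the POSIX class [[:space:]] catches any run of whitespace rather than only a single space. When the replacement is NA, gsub() sets every matching element to NA.

```r
data <- data.frame(cats = c("", " ", "meow"), dogs = c("woof", "  ", NA),
                   stringsAsFactors = FALSE)

# "^[[:space:]]*$" matches the empty string and any whitespace-only string;
# a replacement of NA turns every matching element into NA
data[] <- lapply(data, function(x) gsub("^[[:space:]]*$", NA, x))
data
```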
With dplyr, this should do the trick:
library(dplyr)
dat <- dat %>% mutate_all(na_if, "")
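A sketch of the same idea in current dplyr syntax, restricted to character columns. As far as I know, since dplyr 1.1.0 na_if() casts its second argument to the column's type, so applying na_if(x, "") to a numeric column raises an error; limiting it with where(is.character) sidesteps that. The example data frame is made up.

```r
library(dplyr)

dat <- data.frame(sex = c("M", "", "F"), axles = c(2, 4, NA),
                  stringsAsFactors = FALSE)

# Only touch character columns, so numeric columns are never compared to ""
dat <- dat %>% mutate(across(where(is.character), ~ na_if(.x, "")))
dat
```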
A more readable solution using dplyr would be:
library(dplyr)
## fake a blank cell (this turns Sepal.Length into a character column)
iris[1, 1] <- ""
## define a helper function
empty_as_na <- function(x) {
  if ("factor" %in% class(x)) x <- as.character(x) ## since ifelse won't work with factors
  ifelse(as.character(x) != "", x, NA)
}
## transform all columns (mutate_each() and funs() are deprecated in current dplyr)
iris %>% mutate(across(everything(), empty_as_na))
To apply the correction to just a subset of columns, select the columns of interest with dplyr's column-selection helpers. Example: iris %>% mutate(across(c(matches("Width"), Species), empty_as_na)) (in older dplyr: mutate_each(funs(empty_as_na), matches("Width"), Species)).
If your table contains dates, consider a more type-safe version of ifelse: base ifelse drops attributes such as the Date class.
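A minimal sketch of what that could look like. empty_as_na_safe() is a hypothetical name, and base replace() is swapped in for ifelse because subassignment keeps the class of the vector being modified:

```r
# Hypothetical type-safe variant of empty_as_na():
# replace() assigns with `[<-`, which keeps classes such as Date
empty_as_na_safe <- function(x) {
  if (is.factor(x)) x <- as.character(x)
  # !is.na(x) guards against NA in the logical index
  replace(x, !is.na(x) & as.character(x) == "", NA)
}

d <- as.Date(c("2020-01-01", NA))

# base ifelse() drops the Date class and returns the underlying day counts
class(ifelse(as.character(d) != "", d, NA))  # "numeric"
class(empty_as_na_safe(d))                   # "Date"
empty_as_na_safe(c("a", "", NA))             # "a" NA NA
```

dplyr's if_else() is another type-stable option, but its handling of a bare NA has varied between versions, so the base approach is shown here.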
I recently ran into similar issues, and this is what worked for me.
If the variable is a character vector, a simple df$Var[df$Var == ""] <- NA
should suffice. If the variable is a factor, I convert it to character first, replace the ""
cells with the value you want, and convert it back to a factor, which also gets rid of the empty level. In your case, I assume the Sex
variable is a factor, so to replace the empty cells I would do the following:
df$Var <- as.character(df$Var)
df$Var[df$Var==""] <- NA
df$Var <- as.factor(df$Var)
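An alternative sketch that skips the round trip through character: NA can be assigned to factor elements directly, and droplevels() then removes the "" level that would otherwise linger. The toy data frame is made up for illustration.

```r
df <- data.frame(Var = factor(c("M", "", "F")))

# NA can be assigned to a factor directly...
df$Var[df$Var == ""] <- NA
levels(df$Var)   # "" "F" "M": the empty level is still there

# ...and droplevels() removes the now-unused "" level
df$Var <- droplevels(df$Var)
levels(df$Var)   # "F" "M"
```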