Naive Bayes prediction in R: reading character columns as factors vs. as strings
I'm trying to use Naive Bayes on the Mushroom data set. The data set is 8124 x 23, with the first column as the response variable {'edible', 'poisonous'}. I've eliminated the rows with missing data, so the final data set is 5644 x 23. Below is the code I've used.
library(e1071)  # assumed source of naiveBayes(); its laplace argument matches the call below
mushroom.data <- read.csv("mushroom.data", header = FALSE, stringsAsFactors = FALSE)
#mushroom.data <- read.csv("mushroom.data",header = FALSE, stringsAsFactors = TRUE)
# Eliminating missing data (rows where V12 is '?')
mushroom.data <- subset(mushroom.data, V12 != '?')
# Factoring target class
mushroom.data$V1 <- as.factor(mushroom.data$V1)
# First 4000 records as Training set.
mushroom.train.class <- mushroom.data[1:4000,1]
mushroom.train.data <- mushroom.data[1:4000,-1]
# Building naive bayes classifier
nb.model <- naiveBayes(mushroom.train.data,mushroom.train.class,laplace = 1)
# Last 1644 are Test records
mushroom.test.data <- mushroom.data[4001:5644,-1]
mushroom.test.class <- mushroom.data[4001:5644,1]
# Prediction
nb.pred <- predict(nb.model,mushroom.test.data)
# checking proportions of the predictions
prop.table(table(nb.pred))
With stringsAsFactors = FALSE, the model predicts everything as the edible class and the accuracy is 10-15%. With stringsAsFactors = TRUE, the accuracy is 91%. What is happening with factoring?
Edit 1: Changed the title. Original problem was solved.
Solution 1:
naiveBayes() can't model character columns; the predictors have to be factors (categorical) or numeric. Check ?naiveBayes and look at the Arguments section.
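
If the data has already been read with stringsAsFactors = FALSE, one option is to convert the character columns to factors before the train/test split. Below is a minimal sketch, assuming naiveBayes() comes from the e1071 package. The toy data frame and its values are made up as a stand-in for the mushroom data.

```r
# Toy stand-in for the mushroom data (hypothetical values); all columns are character.
toy <- data.frame(
  V1 = c("e", "p", "e", "p", "e", "p"),
  V2 = c("x", "x", "b", "f", "b", "x"),
  V3 = c("s", "y", "s", "y", "y", "s"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor BEFORE splitting,
# so the training and test subsets share the same factor levels.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

str(toy)  # every column should now be reported as a Factor

# Fit and predict only if e1071 is installed (assumption: it provides naiveBayes()).
if (requireNamespace("e1071", quietly = TRUE)) {
  nb.model <- e1071::naiveBayes(toy[1:4, -1], toy[1:4, 1], laplace = 1)
  nb.pred  <- predict(nb.model, toy[5:6, -1])
  # Share of test rows predicted correctly
  print(mean(nb.pred == toy[5:6, 1]))
}
```

Comparing nb.pred against the true classes, as in the last line, is a more direct check than prop.table(table(nb.pred)), which only shows how the predictions are distributed.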