Replace contents of factor column in R dataframe
I need to replace the levels of a factor column in a dataframe. Using the iris
dataset as an example, how would I replace any cells which contain virginica with setosa in the Species column?
I expected the following to work, but it generates a warning message and simply inserts NAs:
iris$Species[iris$Species == 'virginica'] <- 'setosa'
Solution 1:
I bet the problem arises when you try to replace values with a new one that is not currently among the factor's existing levels:
levels(iris$Species)
# [1] "setosa" "versicolor" "virginica"
Your example doesn't reproduce the problem, because setosa is already a level, so this works:
iris$Species[iris$Species == 'virginica'] <- 'setosa'
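To see what this leaves behind, here is a small sketch on a copy of iris (the name df is just illustrative). The replacement succeeds because setosa is already a level, but virginica stays on as an empty level unless you drop it:

```r
# Work on a copy so the built-in iris data stays untouched.
df <- iris
df$Species[df$Species == "virginica"] <- "setosa"

# The values are replaced, but "virginica" is still a (now empty) level.
table(df$Species)

# droplevels() removes levels that no longer occur in the data.
df$Species <- droplevels(df$Species)
levels(df$Species)
```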
This is what more likely creates the problem you were seeing with your own data:
iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L, :
# invalid factor level, NAs generated
It will work if you first increase your factor levels:
levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'
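If you would rather not manage the levels by hand, another option is to go through a character vector and rebuild the factor afterwards. This is only a sketch on a copy of iris (df and species are illustrative names); note that factor() recomputes the levels from the data, so the level order may change and the old virginica level disappears:

```r
# Work on a copy so the built-in iris data stays untouched.
df <- iris

# Character vectors accept any value, so no warning and no NAs here.
species <- as.character(df$Species)
species[species == "virginica"] <- "new.species"

# Rebuild the factor; levels are derived from the values now present.
df$Species <- factor(species)
levels(df$Species)
```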
If you want to rename a level, i.e. replace "species A" with "species B" everywhere, you'd be better off with:
levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"
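Applied to iris, the rename might look like the sketch below (df and df2 are illustrative copies). It also shows what happens when the new name already exists as a level: the two levels are merged, which is what the original question was after:

```r
# Rename a level to a name that does not exist yet.
df <- iris
levels(df$Species)[match("virginica", levels(df$Species))] <- "new.species"
nlevels(df$Species)  # still three levels, one of them renamed

# Renaming to an existing level name merges the two levels.
df2 <- iris
levels(df2$Species)[match("virginica", levels(df2$Species))] <- "setosa"
table(df2$Species)
```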
Solution 2:
For what you are describing, you can just change the levels using levels():
levels(iris$Species)[3] <- 'new'
Solution 3:
You can use the function revalue from the package plyr to replace values in a factor vector.
In your example, to replace the factor level virginica with setosa:
data(iris)
library(plyr)
iris$Species <- revalue(iris$Species, c("virginica" = "setosa"))
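If you'd rather not depend on plyr, base R can do a similar merge by assigning a named list to levels(). This is a sketch on a copy of iris (df is an illustrative name); each list name is a new level and its value lists the old levels that map to it:

```r
# Work on a copy so the built-in iris data stays untouched.
df <- iris

# Map old levels to new ones. Every old level should appear somewhere
# in the list; levels that are left out become NA.
levels(df$Species) <- list(
  setosa     = c("setosa", "virginica"),
  versicolor = "versicolor"
)

table(df$Species)
```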
Solution 4:
I had the same problem. This worked better:
First identify which level you want to modify:
levels(iris$Species)
# [1] "setosa" "versicolor" "virginica"
So setosa is the first. Then write this:
levels(iris$Species)[1] <- "new name"
Solution 5:
A more general solution that works on the whole data frame at once, and where you don't have to add new factor levels, is:
data.mtx <- as.matrix(data.df)
data.mtx[which(data.mtx == "old.value.to.replace")] <- "new.value"
data.df <- as.data.frame(data.mtx)
A nice feature of this code is that you can assign as many values at once as you have in your original data frame, not only a single "new.value", and the new values can even be random. Thus you can create a completely new random data frame of the same size as the original.
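One thing to watch with the matrix round trip: as.matrix() on a data frame with mixed column types produces a character matrix, so numeric columns do not come back as numeric. If that matters, a column-wise variant that only touches the factor columns may be safer. The sketch below uses made-up data; data.df here, its columns, and the values "old" and "new" are all illustrative:

```r
# Hypothetical example data: two factor columns and one numeric column.
data.df <- data.frame(
  a = factor(c("old", "x", "old")),
  b = factor(c("y", "old", "z")),
  n = c(1.5, 2.5, 3.5)
)

# Find the factor columns and rename the level in each of them.
is_fct <- vapply(data.df, is.factor, logical(1))
data.df[is_fct] <- lapply(data.df[is_fct], function(f) {
  levels(f)[levels(f) == "old"] <- "new"  # no-op if "old" is not a level
  f
})

# The numeric column keeps its type.
str(data.df)
```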