Converting factors to binary in R
Solution 1:
In base R, you could use sapply() on the levels, using == to check for presence and as.integer() to coerce the logical result to 0/1.
# One 0/1 column per level of c, placed between columns 1:2 and column 4
cbind(df[1:2], sapply(levels(df$c), function(x) as.integer(x == df$c)), df[4])
#   a b Pink Red Rose d
# 1 1 1    0   0    1 2
# 2 2 1    1   0    0 3
# 3 3 2    0   1    0 4
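For a version you can paste and run as-is, here is a minimal sketch of the same idea. The input data frame is an assumption: it is reconstructed from the printed result above, since the original df is not shown. Selecting columns by name rather than by position is a small variation, not part of the one-liner above.

```r
# Assumed input: reconstructed from the printed result, not given in the original.
df <- data.frame(
  a = 1:3,
  b = c(1, 1, 2),
  c = factor(c("Rose", "Pink", "Red")),
  d = 2:4
)

# Compare each level against the column; every comparison yields one 0/1 vector,
# and sapply() binds them into a matrix whose column names are the levels.
dummies <- sapply(levels(df$c), function(x) as.integer(x == df$c))

# Reassemble by name instead of position, so the call survives column reordering.
result <- cbind(df[c("a", "b")], dummies, df["d"])
print(result)
```

Using names here means the call does not silently pick the wrong columns if df gains or loses a column later.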
But since you have a million rows, you may want to go with data.table, which adds the new columns by reference instead of copying the data.
library(data.table)
# Add one 0/1 column per level by reference, and drop c by assigning it NULL
setDT(df)[, c(levels(df$c), "c") :=
            c(lapply(levels(c), function(x) as.integer(x == c)), .(NULL))]
which gives
df
#    a b d Pink Red Rose
# 1: 1 1 2    0   0    1
# 2: 2 1 3    1   0    0
# 3: 3 2 4    0   1    0
And you can reset the column order if you need to with setcolorder(df, c(1, 2, 4:6, 3)).
Solution 2:
You can do this with reshaping:
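library(dplyr)
library(tidyr)
df %>%
  mutate(value = 1,                # indicator to spread across the new columns
         c = paste0("Is", c)) %>%  # prefix the level names: IsPink, IsRed, IsRose
  spread(c, value, fill = 0)       # one column per level; missing cells become 0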
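The tidyr documentation marks spread() as superseded by pivot_wider(), so on a current tidyr the same reshaping could be sketched as below. This assumes tidyr 1.1.0 or later (for a scalar values_fill), and the input data frame is again an assumption reconstructed from the printed results in Solution 1.

```r
library(dplyr)
library(tidyr)

# Assumed input: reconstructed from the printed results, not given in the original.
df <- data.frame(
  a = 1:3,
  b = c(1, 1, 2),
  c = factor(c("Rose", "Pink", "Red")),
  d = 2:4
)

wide <- df %>%
  mutate(value = 1L) %>%      # indicator that becomes the cell value
  pivot_wider(
    names_from   = c,         # one new column per level of c
    values_from  = value,
    names_prefix = "Is",      # IsPink, IsRed, IsRose
    values_fill  = 0L         # cells with no matching row become 0
  )
print(wide)
```

As with spread(), this relies on the remaining columns (a, b, d) identifying each row uniquely; duplicated rows would need an explicit row id first.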