Converting factors to binary in R

Solution 1:

In base R, you could use sapply() over the factor levels: x == df$c compares the whole column against one level, and as.integer() turns the resulting TRUE/FALSE into 1/0.

cbind(df[1:2], sapply(levels(df$c), function(x) as.integer(x == df$c)), df[4])
#   a b Pink Red Rose d
# 1 1 1    0   0    1 2
# 2 2 1    1   0    0 3
# 3 3 2    0   1    0 4
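
For anyone who wants to run this, here is a minimal self-contained sketch. The data frame is a reconstruction inferred from the output above (the question's original df is not shown here), so the column types are an assumption.

```r
# Assumed example data, reconstructed from the printed output above
df <- data.frame(
  a = 1:3,
  b = c(1, 1, 2),
  c = factor(c("Rose", "Pink", "Red")),
  d = 2:4
)

# One 0/1 column per factor level: compare the whole column to each level.
# sapply() names the matrix columns after the levels (Pink, Red, Rose).
dummies <- sapply(levels(df$c), function(x) as.integer(x == df$c))

# Put the indicator columns where the factor column used to be
result <- cbind(df[1:2], dummies, df[4])
print(result)
```

The indicator columns come out in the order of levels(df$c), which is alphabetical by default for a factor built from strings.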

But since you have a million rows, you may want to go with data.table, which adds the new columns by reference instead of copying the data.

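library(data.table)
# Add one 0/1 column per level and drop the original "c" (assigning NULL removes it).
# setDT() converts df to a data.table in place; := modifies it by reference.
setDT(df)[, c(levels(df$c), "c") :=
    c(lapply(levels(c), function(x) as.integer(x == c)), .(NULL))]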

which gives

df
#    a b d Pink Red Rose
# 1: 1 1 2    0   0    1
# 2: 2 1 3    1   0    0
# 3: 3 2 4    0   1    0

And you can reset the column order if you need to with setcolorder(df, c(1, 2, 4:6, 3)).
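
If the one-liner is hard to read, the same idea can be split into separate steps. This is only a sketch: the example data is the same assumed reconstruction of df, and dt and lv are names introduced here for illustration.

```r
library(data.table)

# Assumed example data, reconstructed from the printed output above
dt <- data.table(
  a = 1:3,
  b = c(1, 1, 2),
  c = factor(c("Rose", "Pink", "Red")),
  d = 2:4
)

lv <- levels(dt$c)  # "Pink" "Red" "Rose"

# Step 1: add one integer 0/1 column per level, by reference
dt[, (lv) := lapply(lv, function(x) as.integer(x == c))]

# Step 2: drop the original factor column, by reference
dt[, c := NULL]

# Step 3: move the new columns between b and d, using names rather than positions
setcolorder(dt, c("a", "b", lv, "d"))
print(dt)
```

Reordering by name instead of by position keeps the last step correct even if the number of levels changes.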

Solution 2:

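You can also do this by reshaping from long to wide: add a constant value column, then spread the levels of c into columns of their own.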

library(dplyr)
library(tidyr)

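df %>%
  mutate(value = 1,                # marker that becomes the 1s
         c = paste0("Is", c)) %>%  # new columns are named IsPink, IsRed, IsRose
  spread(c, value, fill = 0)       # combinations that do not occur are filled with 0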
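
Note that spread() is superseded in current tidyr; pivot_wider() is its replacement. A rough equivalent, assuming tidyr 1.1.0 or later and the same reconstructed example data (wide is a name introduced here for illustration), might look like this:

```r
library(dplyr)
library(tidyr)

# Assumed example data, reconstructed from the output shown in Solution 1
df <- data.frame(
  a = 1:3,
  b = c(1, 1, 2),
  c = factor(c("Rose", "Pink", "Red")),
  d = 2:4
)

wide <- df %>%
  mutate(value = 1L) %>%          # marker that becomes the 1s
  pivot_wider(
    names_from   = c,             # one new column per level of c
    values_from  = value,
    values_fill  = 0L,            # combinations that do not occur become 0
    names_prefix = "Is"           # IsPink, IsRed, IsRose
  )
print(wide)
```

Both spread() and pivot_wider() rely on the remaining columns (a, b, d here) to identify each row, so rows that are duplicated across those columns would need an extra id column first.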