Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level

I have an R data frame containing a factor that I want to "expand" so that for each factor level, there is an associated column in a new data frame, which contains a 1/0 indicator. E.g., suppose I have:

df.original <-data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1,2,3,4))

I want:

df.desired  <- data.frame(foo = c(1,1,0,0), bar=c(0,0,1,1), ham=c(1,2,3,4))

Because for certain analyses for which you need to have a completely numeric data frame (e.g., principal component analysis), I thought this feature might be built in. Writing a function to do this shouldn't be too hard, but I can foresee some challenges relating to column names and if something exists already, I'd rather use that.

Solution 1:

Use the model.matrix function:

model.matrix( ~ Species - 1, data=iris )

Solution 2:

If your data frame is only made of factors (or you are working on a subset of variables which are all factors), you can also use the acm.disjonctif function from the ade4 package :

R> library(ade4)
R> df <-data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c("red","blue","green","red"))
R> acm.disjonctif(df)
  eggs.bar eggs.foo ham.blue ham.green ham.red
1        0        1        0         0       1
2        0        1        1         0       0
3        1        0        0         1       0
4        1        0        0         0       1

Not exactly the case you are describing, but it can be useful too...

Solution 3:

A quick way using the reshape2 package:

require(reshape2)

> dcast(df.original, ham ~ eggs, length)

Using ham as value column: use value_var to override.
  ham bar foo
1   1   0   1
2   2   0   1
3   3   1   0
4   4   1   0

Note that this produces precisely the column names you want.

Solution 4:

probably dummy variable is similar to what you want. Then, model.matrix is useful:

> with(df.original, data.frame(model.matrix(~eggs+0), ham))
  eggsbar eggsfoo ham
1       0       1   1
2       0       1   2
3       1       0   3
4       1       0   4

Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level

Solution 1:

Solution 2:

Solution 3:

Solution 4:

Related

Recent Posts