Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level
I have an R data frame containing a factor that I want to "expand" so that for each factor level, there is an associated column in a new data frame, which contains a 1/0 indicator. E.g., suppose I have:
df.original <-data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1,2,3,4))
I want:
df.desired <- data.frame(foo = c(1,1,0,0), bar=c(0,0,1,1), ham=c(1,2,3,4))
Because for certain analyses for which you need to have a completely numeric data frame (e.g., principal component analysis), I thought this feature might be built in. Writing a function to do this shouldn't be too hard, but I can foresee some challenges relating to column names and if something exists already, I'd rather use that.
Solution 1:
Use the model.matrix
function:
model.matrix( ~ Species - 1, data=iris )
Solution 2:
If your data frame is only made of factors (or you are working on a subset of variables which are all factors), you can also use the acm.disjonctif
function from the ade4
package :
R> library(ade4)
R> df <-data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c("red","blue","green","red"))
R> acm.disjonctif(df)
eggs.bar eggs.foo ham.blue ham.green ham.red
1 0 1 0 0 1
2 0 1 1 0 0
3 1 0 0 1 0
4 1 0 0 0 1
Not exactly the case you are describing, but it can be useful too...
Solution 3:
A quick way using the reshape2
package:
require(reshape2)
> dcast(df.original, ham ~ eggs, length)
Using ham as value column: use value_var to override.
ham bar foo
1 1 0 1
2 2 0 1
3 3 1 0
4 4 1 0
Note that this produces precisely the column names you want.
Solution 4:
probably dummy variable is similar to what you want. Then, model.matrix is useful:
> with(df.original, data.frame(model.matrix(~eggs+0), ham))
eggsbar eggsfoo ham
1 0 1 1
2 0 1 2
3 1 0 3
4 1 0 4