Specifying formula in R with glm without explicit declaration of each covariate
Solution 1:
Your use of .
creatively to build the formula containing all or almost all variables is a good and clean approach. Another option that is useful sometimes is to build the formula programatically as a string, and then convert it to formula using as.formula
:
vars <- paste("Var",1:10,sep="")
fla <- paste("y ~", paste(vars, collapse="+"))
as.formula(fla)
Of course, you can make the fla
object way more complicated.
Solution 2:
Aniko answered your question. To extend a bit :
You can also exclude variables using - :
glm(Y~.-W1+A*I(W2^2), family=binomial, data=samp)
For large groups of variables, I often make a frame for grouping the variables, which allows you to do something like :
vars <- data.frame(
names = names(samp),
main = c(T,F,T,F),
quadratic =c(F,T,T,F),
main2=c(T,T,F,F),
stringsAsFactors=F
)
regform <- paste(
"Y ~",
paste(
paste(vars[vars$main,1],collapse="+"),
paste(vars[1,1],paste("*I(",vars[vars$quadratic,1],"^2)"),collapse="+"),
sep="+"
)
)
> regform
[1] "Y ~ W1+A+W1 *I( W2 ^2)+W1 *I( A ^2)"
> glm(as.formula(regform),data=samp,family=binomial)
Using all kind of conditions (on name, on structure, whatever) to fill the dataframe, allows me to quickly select groups of variables in large datasets.