Specifying formula in R with glm without explicit declaration of each covariate

Solution 1:

Your use of . to include all (or almost all) variables in the formula is a good, clean approach. Another option that is sometimes useful is to build the formula programmatically as a string and then convert it to a formula with as.formula:

vars <- paste("Var",1:10,sep="")
fla <- paste("y ~", paste(vars, collapse="+"))
as.formula(fla)

Of course, fla can be made much more elaborate, for example by pasting in interactions or transformed terms such as I(Var1^2).
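Base R's stats package also provides reformulate(), which takes a character vector of term labels and a response name and returns a formula directly, so the paste/as.formula step can be skipped. Below is a minimal sketch using the same Var1 to Var10 names. The extra squared and interaction terms are made up purely to show a more elaborate right-hand side:

```r
# Variable names as in the example above
vars <- paste0("Var", 1:10)

# reformulate() joins the term labels with "+" and returns a formula object
fla1 <- reformulate(vars, response = "y")
print(fla1)

# A more elaborate right-hand side: all main effects plus (illustrative)
# squared terms for Var1 and Var2 and a Var1:Var3 interaction
extra <- c(paste0("I(", vars[1:2], "^2)"), "Var1:Var3")
fla2 <- reformulate(c(vars, extra), response = "y")
print(fla2)
```

Either fla1 or fla2 can be passed straight to glm() in place of as.formula(fla).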

Solution 2:

Aniko answered your question. To extend it a bit:

You can also exclude variables with the minus sign (-):

glm(Y ~ . - W1 + A*I(W2^2), family=binomial, data=samp)
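To see exactly which terms such a formula produces, you can fit it and inspect the term labels. This sketch simulates a hypothetical samp with columns W1, W2, A and a binary Y (the real data are not shown, so these names and values are assumptions and the coefficients mean nothing):

```r
set.seed(1)
n <- 200
# Hypothetical stand-in for samp: same column names, simulated values
samp <- data.frame(
  W1 = rnorm(n),
  W2 = rnorm(n),
  A  = rbinom(n, 1, 0.5)
)
samp$Y <- rbinom(n, 1, plogis(0.5 * samp$W2 - 0.3 * samp$A))

fit <- glm(Y ~ . - W1 + A * I(W2^2), family = binomial, data = samp)

# "." expands to every non-response column, "- W1" then drops W1,
# and "A * I(W2^2)" adds I(W2^2) plus the A:I(W2^2) interaction
labs <- attr(terms(fit), "term.labels")
print(labs)
```

The duplicate A (once from ".", once from A * I(W2^2)) appears only once in the fitted model.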

For large groups of variables, I often build a data frame that flags which group(s) each variable belongs to, which lets you do something like:

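vars <- data.frame(
    names = names(samp),                     # here c("W1", "W2", "A", "Y")
    main = c(TRUE, FALSE, TRUE, FALSE),      # enter as plain main effects
    quadratic = c(FALSE, TRUE, TRUE, FALSE), # enter squared, times the first variable
    main2 = c(TRUE, TRUE, FALSE, FALSE),     # another grouping, not used below
    stringsAsFactors = FALSE
)

regform <- paste(
    "Y ~",
    paste(
      # main effects: "W1+A"
      paste(vars$names[vars$main], collapse = "+"),
      # first variable times each squared term: "W1 *I( W2 ^2)+W1 *I( A ^2)"
      paste(vars$names[1], paste("*I(", vars$names[vars$quadratic], "^2)"), collapse = "+"),
      sep = "+"
    )
)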
> regform
[1] "Y ~ W1+A+W1 *I( W2 ^2)+W1 *I( A ^2)"

> glm(as.formula(regform),data=samp,family=binomial)
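Note that * in the regform string above expands to both main effects and their interaction, and duplicated terms are dropped, so the model contains more terms than the string suggests. A quick check with terms() from base R's stats package, which needs no data because the formula has no "." in it:

```r
# The string built above, copied from the printed output
regform <- "Y ~ W1+A+W1 *I( W2 ^2)+W1 *I( A ^2)"

# terms() expands "*" into main effects plus interactions
labs <- attr(terms(as.formula(regform)), "term.labels")
print(labs)
```

So the fitted model has six terms besides the intercept: W1, A, I(W2^2) and I(A^2) as main effects, plus the two interactions with W1.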

Filling that data frame from all kinds of conditions (on the names, on the column structure, whatever) lets me quickly select groups of variables in large datasets.
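As a sketch of filling the grouping columns from conditions rather than by hand, the flags below come from grepl() on the names and from is.factor()/is.numeric() on the columns. The data frame, its column names and the choice of groups are all hypothetical:

```r
set.seed(2)
n <- 100
# Hypothetical data: three numeric covariates, a factor, and a binary response
samp <- data.frame(
  W1 = rnorm(n), W2 = rnorm(n), W3 = rnorm(n),
  grp = factor(sample(c("a", "b"), n, replace = TRUE)),
  Y = rbinom(n, 1, 0.5)
)

vars <- data.frame(names = names(samp), stringsAsFactors = FALSE)
# Condition on the name: main effects for every column starting with "W"
vars$main <- grepl("^W", vars$names)
# Condition on the structure: factor columns also enter as main effects
vars$factor <- vapply(samp, is.factor, logical(1))
# Squared terms for numeric columns other than the response
vars$quadratic <- vapply(samp, is.numeric, logical(1)) & vars$names != "Y"

rhs <- c(
  vars$names[vars$main | vars$factor],
  paste0("I(", vars$names[vars$quadratic], "^2)")
)
regform <- paste("Y ~", paste(rhs, collapse = " + "))
fit <- glm(as.formula(regform), family = binomial, data = samp)
print(regform)
```

Adding a new column to samp that matches one of the conditions automatically puts it in the right group without editing the formula.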