Formula with dynamic number of variables
See ?as.formula
, e.g.:
factors <- c("factor1", "factor2")
as.formula(paste("y~", paste(factors, collapse="+")))
# y ~ factor1 + factor2
where factors
is a character vector containing the names of the factors you want to use in the model. This you can paste into an lm
model, e.g.:
set.seed(0)
y <- rnorm(100)
factor1 <- rep(1:2, each=50)
factor2 <- rep(3:4, 50)
lm(as.formula(paste("y~", paste(factors, collapse="+"))))
# Call:
# lm(formula = as.formula(paste("y~", paste(factors, collapse = "+"))))
# Coefficients:
# (Intercept) factor1 factor2
# 0.542471 -0.002525 -0.147433
An oft forgotten function is reformulate
. From ?reformulate
:
reformulate
creates a formula from a character vector.
A simple example:
listoffactors <- c("factor1","factor2")
reformulate(termlabels = listoffactors, response = 'y')
will yield this formula:
y ~ factor1 + factor2
Although not explicitly documented, you can also add interaction terms:
listofintfactors <- c("(factor3","factor4)^2")
reformulate(termlabels = c(listoffactors, listofintfactors),
response = 'y')
will yield:
y ~ factor1 + factor2 + (factor3 + factor4)^2
Another option could be to use a matrix in the formula:
Y = rnorm(10)
foo = matrix(rnorm(100),10,10)
factors=c(1,5,8)
lm(Y ~ foo[,factors])
You don't actually need a formula. This works:
lm(data_frame[c("Y", "factor1", "factor2")])
as does this:
v <- c("Y", "factor1", "factor2")
do.call("lm", list(bquote(data_frame[.(v)])))
I generally solve this by changing the name of my response column. It is easier to do dynamically, and possibly cleaner.
model_response <- "response_field_name"
setnames(model_data_train, c(model_response), "response") #if using data.table
model_gbm <- gbm(response ~ ., data=model_data_train, ...)