R Loop for Variable Names to run linear regression model
Ok, I'll post an answer. I will use the dataset mtcars as an example. I believe it will work with your dataset.
First, I create a store, lm.test, an object of class list. In your code you assign the output of lm(.) to the same variable every time through the loop, so in the end you would only have the last model; all the others would have been overwritten by the newer ones.
Then, inside the loop, I use the function reformulate to put together the regression formula. There are other ways of doing this, but this one is simple.
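# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
# Every column except the response, mpg
col10 <- names(data)[-1]
# Pre-allocate a list with one slot per predictor
lm.test <- vector("list", length(col10))
for(i in seq_along(col10)){
  # Build the formula mpg ~ <predictor> from strings and fit the model
  lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}
lm.test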
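As a quick check of what reformulate builds, here is a minimal sketch on the same mtcars columns. The names predictors, f1, f2 and fits are hypothetical and only used for illustration; the paste/as.formula line is one of the "other ways" mentioned above, and naming the list elements is an optional extra, not something the loop requires.

```r
# Same subset of mtcars as in the answer above
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
predictors <- names(data)[-1]  # hypothetical name; the answer calls this col10

# reformulate(termlabels, response) builds a formula from character strings
f1 <- reformulate("wt", response = "mpg")
print(f1)

# An equivalent formula built by pasting text together
f2 <- as.formula(paste("mpg ~", "wt"))

# Fit one model per predictor and name each element after its predictor
fits <- lapply(predictors, function(p) lm(reformulate(p, "mpg"), data = data))
names(fits) <- predictors

# Models can now be looked up by predictor name instead of by position
coef(fits[["wt"]])
```

With named elements, fits[["wt"]] is easier to read than fits[[5]] when there are many predictors.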
Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.
For instance, to extract the coefficients:
cfs <- lapply(lm.test, coef)
In order to get the summaries:
smry <- lapply(lm.test, summary)
It becomes very simple once you're familiar with the *apply functions.
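To illustrate, here is a sketch that collects one row per model into a data frame, assuming the same mtcars columns as above. The names r2, slope, pval and results are hypothetical; sapply is used instead of lapply because each call returns a single number.

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]
lm.test <- lapply(col10, function(p) lm(reformulate(p, "mpg"), data = data))

# sapply simplifies to a vector when each call returns a single number
r2 <- sapply(lm.test, function(m) summary(m)$r.squared)

# In the coefficient table, row 1 is the intercept and row 2 is the predictor
slope <- sapply(lm.test, function(m) coef(summary(m))[2, "Estimate"])
pval  <- sapply(lm.test, function(m) coef(summary(m))[2, "Pr(>|t|)"])

results <- data.frame(predictor = col10, slope = slope,
                      r.squared = r2, p.value = pval)
results
```

This only works as written because every model has exactly one predictor; with several terms per model, row 2 would no longer be the only row of interest.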
You can create a temporary subset in which you select only the columns used in your regression. This way, you won't need to inject the temporary name in the formula.
Sticking close to your code, this should do the trick.
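lm.test <- vector("list", length(col10))
for(i in seq_along(col10)){
  # Keep only the response and the current predictor
  tempSubset <- data[, c("Total_Transactions", col10[i])]
  # "." stands for every column other than the response
  lm.test[[i]] <- lm(Total_Transactions ~ ., data = tempSubset)
}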
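Since Total_Transactions belongs to your data, here is a runnable sketch of the same idea on mtcars, with mpg standing in as the response. It also checks that the temporary-subset approach gives the same coefficients as building the formula with reformulate. The names by_subset, by_formula and same are hypothetical.

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

by_subset  <- vector("list", length(col10))
by_formula <- vector("list", length(col10))
for (i in seq_along(col10)) {
  # Temporary data frame holding only the response and one predictor
  tempSubset <- data[, c("mpg", col10[i])]
  by_subset[[i]]  <- lm(mpg ~ ., data = tempSubset)
  by_formula[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

# Compare the coefficients of the two fits, model by model
same <- mapply(function(a, b) isTRUE(all.equal(coef(a), coef(b))),
               by_subset, by_formula)
same
```

One practical difference: the call stored in each by_subset model reads mpg ~ . rather than naming the predictor, so printed output is less self-explanatory than with reformulate.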