How to Loop/Repeat a Linear Regression in R
You want to run 22,000 linear regressions and extract the coefficients? That's simple to do from a coding standpoint.
set.seed(1)
# number of columns in the Lung and Blood data.frames. 22,000 for you?
n <- 5
# dummy data
obs <- 50 # observations
Lung <- data.frame(matrix(rnorm(obs*n), ncol=n))
Blood <- data.frame(matrix(rnorm(obs*n), ncol=n))
Age <- sample(20:80, obs)
Gender <- factor(rbinom(obs, 1, .5))
# run n regressions
my_lms <- lapply(1:n, function(x) lm(Lung[,x] ~ Blood[,x] + Age + Gender))
# extract just coefficients
sapply(my_lms, coef)
# if you need more info, get full summary call. now you can get whatever, like:
summaries <- lapply(my_lms, summary)
# ...coefficents with p values:
lapply(summaries, function(x) x$coefficients[, c(1,4)])
# ...or r-squared values
sapply(summaries, function(x) c(r_sq = x$r.squared,
adj_r_sq = x$adj.r.squared))
The models are stored in a list, where model 3 (with DV Lung[, 3] and IVs Blood[,3] + Age + Gender) is in my_lms[[3]]
and so on. You can use apply functions on the list to perform summaries, from which you can extract the numbers you want.
The question seems to be about how to call regression functions with formulas which are modified inside a loop.
Here is how you can do it in (using diamonds dataset):
attach(ggplot2::diamonds)
strCols = names(ggplot2::diamonds)
formula <- list(); model <- list()
for (i in 1:1) {
formula[[i]] = paste0(strCols[7], " ~ ", strCols[7+i])
model[[i]] = glm(formula[[i]])
#then you can plot or do anything else with the result ...
png(filename = sprintf("diamonds_price=glm(%s).png", strCols[7+i]))
par(mfrow = c(2, 2))
plot(model[[i]])
dev.off()
}
Sensible or not, to make the loop at least somehow work you need:
y<- c(1,5,6,2,5,10) # response
x1<- c(2,12,8,1,16,17) # predictor
x2<- c(2,14,5,1,17,17)
predictorlist<- list("x1","x2")
for (i in predictorlist){
model <- lm(paste("y ~", i[[1]]), data=df)
print(summary(model))
}
The paste function will solve the problem.