Correct R^2 values in linear regression models with full set of factors and no constant in R

I have a dataset from an experiment with four treatments, which are coded using four dummy variables which I will call Tr1, Tr2, Tr3 and Tr4. (So to be clear: Tr1 + Tr2 + Tr3 + Tr4 = 1 for all observations in my dataset.) Now I am estimating a linear regression model where I regress my outcome variable y on all four dummies plus some other regressors, omitting the constant:

lm(y ~ Tr1 + Tr2 + Tr3 + Tr4 + var1 + var2 + 0, data = df)

The problem is that this model without a constant yields inflated R^2 values. Apparently, Stata's regress has a hascons option that yields correct R^2 values by accounting for the fact that, although the model is estimated without a constant, a constant is effectively included through the full set of dummies. Is there something similar in R? Or can somebody please tell me how to calculate R^2 for this model?
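For context on why the value is inflated: when the model has no intercept, R's summary.lm() computes R^2 against the uncentered total sum of squares sum(y^2) rather than sum((y - mean(y))^2). A minimal illustration with simulated data (the real df isn't shown, so the treatment assignment and coefficients below are made up):

```r
# Simulated stand-in for df: four mutually exclusive treatment dummies
set.seed(42)
n  <- 100
Tr <- sample(1:4, n, replace = TRUE)
df <- data.frame(Tr1 = as.integer(Tr == 1), Tr2 = as.integer(Tr == 2),
                 Tr3 = as.integer(Tr == 3), Tr4 = as.integer(Tr == 4),
                 var1 = rnorm(n), var2 = rnorm(n))
df$y <- 5 + 0.5 * df$Tr2 + rnorm(n)   # a large mean makes the inflation obvious

m <- lm(y ~ Tr1 + Tr2 + Tr3 + Tr4 + var1 + var2 + 0, data = df)

# summary.lm() with no intercept uses sum(y^2) as the total sum of squares:
summary(m)$r.squared                       # close to 1
1 - sum(residuals(m)^2) / sum(df$y^2)      # reproduces the same (uncentered) R^2
```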


Solution 1:

If you drop one of the dummies and include a constant instead, the fitted model is the same (the coefficients on the remaining dummies become contrasts against the omitted treatment, whose mean is absorbed by the constant), and the reported R^2 will be correct:

m <- lm(y ~ Tr2 + Tr3 + Tr4 + var1 + var2, data = df)
rsq <- summary(m)$r.squared
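Equivalently, you can keep the no-constant fit and compute the conventional (centered) R^2 by hand, since the two parameterizations span the same column space and therefore produce identical residuals. A sketch with simulated data (the original df isn't shown, so the data here are made up):

```r
set.seed(1)
n  <- 200
Tr <- sample(1:4, n, replace = TRUE)
df <- data.frame(Tr1 = as.integer(Tr == 1), Tr2 = as.integer(Tr == 2),
                 Tr3 = as.integer(Tr == 3), Tr4 = as.integer(Tr == 4),
                 var1 = rnorm(n), var2 = rnorm(n))
df$y <- 1 + 0.5 * df$Tr2 + rnorm(n)

m0 <- lm(y ~ Tr1 + Tr2 + Tr3 + Tr4 + var1 + var2 + 0, data = df)  # no constant
m1 <- lm(y ~ Tr2 + Tr3 + Tr4 + var1 + var2, data = df)            # with constant

# Centered R^2: total sum of squares taken about the mean of y
r2_manual <- 1 - sum(residuals(m0)^2) / sum((df$y - mean(df$y))^2)

all.equal(r2_manual, summary(m1)$r.squared)  # TRUE: same residuals, same R^2
```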