Linear regression: degrees of freedom of SST, SSR, and RSS
I'm trying to understand the concept of degrees of freedom in the specific case of the three quantities involved in a linear regression solution,
i.e. $SST=SSR+SSE$,
i.e. Total sum of squares = sum of squares due to regression + sum of squared errors,
i.e. $\sum(y_i-\bar y)^2=\sum(\hat y_i-\bar y)^2+\sum(y_i-\hat y_i)^2$.
I tried Wikipedia and thought I had understood why the first (SST) and the third (SSE) have $n-1$ and $n-2$ degrees of freedom respectively, but I could not make out why the second (SSR) has $1$ degree of freedom. So maybe I did not understand degrees of freedom after all. Can someone explain?
Thank you!
Sources: http://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29 http://www.cs.rice.edu/~johnmc/comp528/lecture-notes/Lecture9.pdf
Solution 1:
There are many different ways to look at degrees of freedom. I wanted to provide a rigorous answer that starts from a concrete definition of the degrees of freedom of a statistical estimator, as this may be useful/satisfying to some readers:
Definition: Given an observational model of the form $$y_i=r(x_i)+\xi_i,\ \ \ i=1,\dots,n,$$ where the $\xi_i\sim\mathcal{N}(0,\sigma^2)$ are i.i.d. noise terms and the $x_i$ are fixed, the degrees of freedom (DOF) of an estimator $\hat{y}$ of $y$ is defined as $$\text{df}(\hat{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\hat{y}_i,y_i)=\frac{1}{\sigma^2}\text{Tr}(\text{Cov}(\hat{y},y)),$$ or equivalently, by Stein's lemma, $$\text{df}(\hat{y})=\mathbb{E}(\text{div}\,\hat{y}).$$
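If it helps to see the definition in action, here is a minimal Monte Carlo sketch (the values of $n$, $\sigma$, the mean function, and the number of replications are arbitrary choices for illustration): it estimates $\frac{1}{\sigma^2}\sum_i\text{Cov}(\hat y_i,y_i)$ by simulation for the simple estimator $\hat y_i=\overline y$, which should come out close to $1$.

```python
import numpy as np

# Monte Carlo check of df(yhat) = (1/sigma^2) * sum_i Cov(yhat_i, y_i)
# for the estimator yhat_i = ybar (the sample mean); its df should be about 1.
# n, sigma, reps, and the mean function r are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, sigma, reps = 20, 2.0, 200_000
r = np.linspace(0.0, 1.0, n)                     # fixed mean values r(x_i)

Y = r + sigma * rng.standard_normal((reps, n))   # reps independent draws of y
Yhat = np.repeat(Y.mean(axis=1, keepdims=True), n, axis=1)   # yhat_i = ybar

cov_sum = sum(np.cov(Yhat[:, i], Y[:, i])[0, 1] for i in range(n))
print(cov_sum / sigma**2)                        # ~ 1 = df of the sample mean
```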
Using this definition, let's analyze linear regression.
Linear Regression: Consider the model $$y_i=x_i\beta +\xi_i,$$ where the $x_i\in\mathbb{R}^p$ are fixed row vectors. In your case, $p=2$: each $x_i=[z_i,\ 1]$ consists of a data point and the constant $1$, and $\beta=\left[\begin{array}{c} m\\ b \end{array}\right]$ is a slope and an intercept term, so that $x_i \beta=m z_i+b$. This can be rewritten as $$y=X\beta+\xi,$$ where $X$ is the $n\times p$ matrix whose $i^{th}$ row is $x_i$. The least squares estimator is $\hat{\beta}^{LS}=(X^T X)^{-1}X^Ty$. Let's now use the above definition to calculate the degrees of freedom of $SST$, $SSR$, and $SSE$.
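For concreteness, here is a small numerical sketch of this $p=2$ setup (the $z$ and $y$ values below are made up purely for illustration):

```python
import numpy as np

# p = 2 setup: row i of X is x_i = [z_i, 1], so X @ beta = m*z + b.
# The z and y values are made up purely for illustration.
rng = np.random.default_rng(1)
z = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.5 * z + 0.5 + rng.standard_normal(z.size)

X = np.column_stack([z, np.ones_like(z)])        # n x p design matrix, p = 2
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)      # (X^T X)^{-1} X^T y
y_hat = X @ beta_ls                              # fitted values X beta_hat

print(beta_ls)                                   # estimated slope m and intercept b
```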
$SST:$ For this, we need to calculate $$\text{df}(y-\overline{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(y_i-\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n \frac{\sigma^2}{n}=n-1.$$
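The only covariance needed in the last step is, writing $\overline{y}=\frac{1}{n}\sum_{j=1}^n y_j$ and using the independence of the $y_j$,
$$\text{Cov}(\overline{y},y_i)=\text{Cov}\Big(\frac{1}{n}\sum_{j=1}^n y_j,\;y_i\Big)=\frac{1}{n}\text{Var}(y_i)=\frac{\sigma^2}{n}.$$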
$SSR:$ For this, we need to calculate (using the linearity of $\text{df}$, together with $\text{Cov}(y,y)=\sigma^2 I$ and $\text{df}(\overline{y})=1$ from the $SST$ calculation) $$\text{df}(X\hat{\beta}^{LS}-\overline{y})=\frac{1}{\sigma^2}\text{Tr}\left(\text{Cov}\big(X(X^TX)^{-1}X^Ty,\,y\big)\right)-\text{df}(\overline{y})$$ $$=\frac{1}{\sigma^2}\text{Tr}\left(X(X^TX)^{-1}X^T\,\text{Cov}(y,y)\right)-1$$ $$=\text{Tr}\left(X(X^TX)^{-1}X^T\right)-1$$ $$=p-1.$$ The last trace equals $p$ by the cyclic property of the trace: $\text{Tr}(X(X^TX)^{-1}X^TX\cdot I)=\text{Tr}((X^TX)^{-1}X^TX)=\text{Tr}(I_p)=p$. In your case $p=2$, since $X$ includes the all-ones column so that there is an intercept term, and so the degrees of freedom will be $1$. More generally, $p-1$ equals the number of non-intercept (slope) parameters when we do a regression with multiple predictors.
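A quick numerical check of this trace identity (using an arbitrary $p=2$ design of the same form as the sketch above):

```python
import numpy as np

# Check that Tr(X (X^T X)^{-1} X^T) = p, hence df(SSR) = p - 1 = 1 here.
# The design (arbitrary z values plus an all-ones column) is illustrative.
z = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([z, np.ones_like(z)])       # n x p design matrix, p = 2

H = X @ np.linalg.solve(X.T @ X, X.T)           # hat matrix X (X^T X)^{-1} X^T
print(np.trace(H))                              # = p = 2
print(np.trace(H) - 1)                          # = p - 1 = 1 = df(SSR)
```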
$SSE:$ Here the degrees of freedom are $(n-1)-(p-1)=n-p$, which follows from the linearity of $\text{df}$ applied to the decomposition $y_i-\hat{y}_i=(y_i-\overline{y})-(\hat{y}_i-\overline{y})$. In your case this gives $n-2$.
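As a last sketch (design, true $\beta$, $\sigma$, and number of replications again chosen arbitrarily), the residual map $y\mapsto y-\hat y$ is $I-H$, whose trace is $n-p$; this is also why $SSE/(n-p)$ is the usual unbiased estimator of $\sigma^2$:

```python
import numpy as np

# The residual operator is I - H, so its trace is n - Tr(H) = n - p.
# Design, true beta, sigma, and number of replications are illustrative choices.
rng = np.random.default_rng(3)
n, p, sigma, reps = 15, 2, 1.0, 100_000
X = np.column_stack([rng.standard_normal(n), np.ones(n)])   # fixed n x p design
H = X @ np.linalg.solve(X.T @ X, X.T)
print(np.trace(np.eye(n) - H))                              # = n - p = 13

# Consequence: E[SSE] = sigma^2 (n - p), so SSE / (n - p) is unbiased for sigma^2.
mu = X @ np.array([1.5, 0.5])                               # arbitrary true mean X beta
Y = mu + sigma * rng.standard_normal((reps, n))             # reps draws of y
SSE = (((np.eye(n) - H) @ Y.T) ** 2).sum(axis=0)            # SSE for each draw
print(SSE.mean() / (n - p))                                 # ~ sigma^2 = 1.0
```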
Solution 2:
Since $\hat{y}_i$ is determined by the linear regression, it has two degrees of freedom, corresponding to the fact that a line is specified by two quantities; in slope-intercept form, these are the slope and the y-intercept. When we subtract the mean response $\overline{y}$, it cancels the y-intercept (a property of how the regression line is constructed), and so the only degree of freedom we are left with is the one due to the slope. Thus the number of degrees of freedom is $1$.
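To spell out the cancellation: the fitted line passes through the point of means, i.e. $\hat b=\overline y-\hat m\,\overline z$, so
$$\hat y_i-\overline y=(\hat m z_i+\hat b)-\overline y=\hat m\,(z_i-\overline z),$$
and $SSR=\hat m^2\sum_i(z_i-\overline z)^2$ depends on the random responses only through the single estimated quantity $\hat m$.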