Considering $y_i=\beta_1+\beta_2x_i+\epsilon_i$

Taking means, $\bar y=\hat\beta_1+\hat\beta_2\bar x$ (the residuals average to zero),

this is the linear least-squares model, used when there seems to be a linear link between two variables; $\epsilon_i$ is the noise, usually $\epsilon_i\sim N(0,\sigma^2)$.

The $\beta_i$ are constants that we estimate by the least-squares regression method.

The $\hat\beta_i$ are the estimated values of those $\beta_i$.

We have $$||y-\bar y||^2=||\hat y-\bar y||^2+ ||\hat\epsilon||^2$$

$$\underbrace{\sum\limits_{i=1}^{n}(y_i-\bar y)^2}_{TSS}=\underbrace{\sum\limits_{i=1}^{n}(\hat y_i-\bar y)^2}_{ESS} +\underbrace{\sum\limits_{i=1}^{n}(\hat\epsilon_i^2)}_{RSS}$$

I have to show that $R^2=\frac{ESS}{TSS}=\frac{\sum(\hat y_i-\bar y)^2}{\sum(y_i-\bar y)^2}=1-\frac{||\hat \epsilon||^2}{\sum(y_i-\bar y)^2}=\underbrace{1-\frac{RSS}{TSS}=\rho_{xy}^2}_{\text{the step I don't get}}$

with

$$\rho = \rho_{xy} =\frac{\sum ^n _{i=1}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum ^n _{i=1}(x_i - \bar{x})^2} \sqrt{\sum ^n _{i=1}(y_i - \bar{y})^2}}$$

I don't even know where to start... Any hint appreciated.


It's just a calculation using a number of definitions from linear regression. Let's start with $R^2$ (I abbreviate $\sum_{i=1}^n$ as $\sum$):

$$ R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum(y_i - \bar{y})^2} $$

With the sample variance

\begin{align*} &\hat{\sigma}^2_y = \frac{1}{n-1} \sum (y_i - \bar{y})^2 \\ \Leftrightarrow\ &\hat{\sigma}^2_y (n-1) = \sum (y_i - \bar{y})^2 \qquad\qquad (1) \end{align*}

you get

$$ R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum(y_i - \bar{y})^2} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\hat{\sigma}^2_y (n-1)} \qquad\qquad (2) $$

Keep this equation in mind! Now we take a look at the estimator for $\beta_2$ in our linear regression. With some calculations you can see that the least-squares estimator $\hat{\beta}_2$ is:

$$ \hat{\beta}_2 = \frac{\sum(x_i - \bar{x})y_i}{\sum(x_i - \bar{x})^2} $$
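(In case those calculations are not obvious: minimize the residual sum of squares $S(\beta_1,\beta_2)=\sum(y_i-\beta_1-\beta_2x_i)^2$ and set both partial derivatives to zero, which gives the normal equations:

\begin{align*} \frac{\partial S}{\partial \beta_1} = 0 \;&\Rightarrow\; \hat{\beta}_1 = \bar{y} - \hat{\beta}_2\bar{x} \\ \frac{\partial S}{\partial \beta_2} = 0 \;&\Rightarrow\; \hat{\beta}_2 = \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sum(x_i-\bar{x})^2} = \frac{\sum(x_i-\bar{x})y_i}{\sum(x_i-\bar{x})^2} \end{align*}

where the last equality uses $\sum(x_i-\bar{x})=0$.)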

With $(1)$ (the analogous identity for the sample $x$) we get:

$$ \hat{\beta}_2 = \frac{\sum(x_i - \bar{x})y_i}{\sum(x_i - \bar{x})^2} \overset{(1)}{=} \frac{\sum(x_i - \bar{x})y_i}{\hat{\sigma}^2_x(n-1)} \qquad\qquad (3) $$

Now we take a closer look at the Pearson correlation coefficient $\rho_{xy}$:

\begin{align*} \rho_{xy} &= \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2}\sqrt{\sum(y_i - \bar{y})^2}} = \\ \\ &= \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\frac{n-1}{n-1}\sqrt{\sum(x_i - \bar{x})^2}\sqrt{\sum(y_i - \bar{y})^2}} = \\ \\ &= \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\sqrt{\frac{1}{n-1}\sum(x_i - \bar{x})^2}\sqrt{\frac{1}{n-1}\sum(y_i - \bar{y})^2}} = \\ \\ &= \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\hat{\sigma}_x\hat{\sigma}_y} \overset{on\ your\ own}{=} \frac{\sum(x_i - \bar{x})y_i}{(n-1)\hat{\sigma}_x\hat{\sigma}_y} \end{align*}
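(The $\overset{on\ your\ own}{=}$ step follows from $\sum(x_i-\bar{x})=0$:

$$ \sum(x_i-\bar{x})(y_i-\bar{y}) = \sum(x_i-\bar{x})y_i - \bar{y}\underbrace{\sum(x_i-\bar{x})}_{=0} = \sum(x_i-\bar{x})y_i \ .) $$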

With this you get:

$$ \rho_{xy} (n-1)\hat{\sigma}_x\hat{\sigma}_y = \sum(x_i - \bar{x})y_i \qquad\qquad (4) $$

Now we can put $(4)$ in $(3)$ and get: $$ \hat{\beta}_2 = \frac{\rho_{xy} (n-1)\hat{\sigma}_x\hat{\sigma}_y}{\hat{\sigma}^2_x (n-1)} = \frac{\rho_{xy} \hat{\sigma}_y}{\hat{\sigma}_x} \qquad\qquad (5) $$

Now we use $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2x_i\ \ (i)$ and $\hat{\beta}_1 = \bar{y}-\hat{\beta}_2\bar{x} \ \ (ii)$ and can finally compute $R^2$ using equation $(2)$:

\begin{align*} R^2 &= \frac{\sum (\hat{y}_i - \bar{y})^2}{\hat{\sigma}^2_y (n-1)} \overset{(i)}{=} \frac{\sum (\hat{\beta}_1 + \hat{\beta}_2x_i - \bar{y})^2}{\hat{\sigma}^2_y (n-1)} \overset{(ii)}{=} \frac{\sum (\bar{y}-\hat{\beta}_2\bar{x} + \hat{\beta}_2x_i - \bar{y})^2}{\hat{\sigma}^2_y (n-1)} = \\ \\ &= \frac{\sum (\hat{\beta}_2 (x_i-\bar{x}))^2}{\hat{\sigma}^2_y (n-1)} = \frac{\hat{\beta}_2^2\overbrace{\sum (x_i-\bar{x})^2}^{=(n-1)\hat{\sigma}_x^2\ (1)}}{\hat{\sigma}^2_y (n-1)} = \hat{\beta}_2^2\frac{(n-1)\hat{\sigma}_x^2}{\hat{\sigma}^2_y (n-1)} = \\ \\ &\overset{(5)}{=}\left(\frac{\rho_{xy} \hat{\sigma}_y}{\hat{\sigma}_x}\right)^2\frac{\hat{\sigma}_x^2}{\hat{\sigma}_y^2} = \rho_{xy}^2 \left(\frac{\hat{\sigma}_y}{\hat{\sigma}_x}\right)^2\frac{\hat{\sigma}_x^2}{\hat{\sigma}_y^2} = \rho_{xy}^2 \frac{\hat{\sigma}_y^2}{\hat{\sigma}_x^2}\frac{\hat{\sigma}_x^2}{\hat{\sigma}_y^2} = \rho_{xy}^2 \end{align*}

Hope you get it with that. :)


Here is the proof my teacher gave me today; it is shorter than the very good one from Daniel.

\begin{align*} R^2&=\frac{ESS}{TSS}=\frac{\sum(\hat y_i -\bar y)^2}{\sum (y_i-\bar y)^2}\\ &=\frac{\sum (\hat\beta_1+\hat \beta_2x_i-\bar y)^2}{\sum(y_i-\bar y)^2}\\ &=\frac{\sum (\bar y- \hat \beta_2\bar x+\hat \beta_2x_i-\bar y)^2}{\sum(y_i-\bar y)^2}\\ &=\frac{\hat\beta_2^2\sum(x_i-\bar x)^2}{\sum(y_i-\bar y)^2}\\ &=\frac{\operatorname{cov}(x,y)^2\cdot\operatorname{var}(x)}{\operatorname{var}(x)^2\cdot\operatorname{var}(y)}\\ &=\frac{\operatorname{cov}(x,y)^2}{\operatorname{var}(x)\cdot\operatorname{var}(y)}\\ &=\rho_{xy}^2 \end{align*}

using $\hat\beta_2=\operatorname{cov}(x,y)/\operatorname{var}(x)$, $\sum(x_i-\bar x)^2=(n-1)\operatorname{var}(x)$ and $\sum(y_i-\bar y)^2=(n-1)\operatorname{var}(y)$ in the fifth line; the $(n-1)$ factors cancel.
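As a quick numerical sanity check of the identity $R^2=\rho_{xy}^2$ (the simulated data and coefficient values below are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=n)  # y = b1 + b2*x + noise

# Least-squares estimates for simple linear regression
xc = x - x.mean()
b2 = np.sum(xc * y) / np.sum(xc ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

# R^2 = ESS / TSS
ess = np.sum((y_hat - y.mean()) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = ess / tss

# Squared Pearson correlation coefficient
rho = np.corrcoef(x, y)[0, 1]

print(r2, rho ** 2)               # the two values agree
assert np.isclose(r2, rho ** 2)
```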