Proof of $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$

It's a standard result that if $X_1,\dots,X_n$ is a random sample from $N(\mu,\sigma^2)$, then the random variable $$\frac{(n-1)S^2}{\sigma^2}$$ has a chi-square distribution with $(n-1)$ degrees of freedom, where $$S^2=\frac{1}{n-1}\sum^{n}_{i=1}(X_i-\bar{X})^2.$$ I would like help proving this result.
Thanks.


A standard proof goes something like this. It assumes you already know the following.

  1. $\bar{X}$ (the sample mean) and $S^2$ are independent.
  2. If $Z \sim N(0,1)$ then $Z^2 \sim \chi^2(1)$.
  3. If $X_i \sim \chi^2(1)$ and the $X_i$ are independent then $\sum_{i=1}^n X_i \sim \chi^2(n)$.
  4. A $\chi^2(n)$ random variable has the moment generating function $(1-2t)^{-n/2}$.

With some algebra, you can show, by adding $-\bar{X} + \bar{X}$ inside the parentheses and grouping appropriately, that $\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$. The cross term $2(\bar{X} - \mu)\sum_{i=1}^n (X_i - \bar{X})$ vanishes because $\sum_{i=1}^n (X_i - \bar{X}) = 0$. Then, dividing through by $\sigma^2$ yields $$ \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^n \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2.$$
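Denote these expressions by $U, V$, and $W$, respectively, so that the formula reads $U = V+W$. The $(X_i - \mu)/\sigma$ are independent $N(0,1)$ variables and $(\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0,1)$, so by facts (2) and (3) above, $U \sim \chi^2(n)$ and $W \sim \chi^2(1)$. Also, $V = \frac{(n-1)S^2}{\sigma^2}$.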
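The decomposition is a purely algebraic identity (it holds for any numbers $x_i$ and any value of $\mu$), so it is easy to sanity-check numerically. A minimal Python sketch using only the standard library; the function name `decomposition_sides`, the seed, and the sample values are arbitrary choices for illustration:

```python
import random


def decomposition_sides(xs, mu):
    """Return (lhs, rhs) of sum (x_i - mu)^2 = sum (x_i - xbar)^2 + n (xbar - mu)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    lhs = sum((x - mu) ** 2 for x in xs)
    rhs = sum((x - xbar) ** 2 for x in xs) + n * (xbar - mu) ** 2
    return lhs, rhs


rng = random.Random(0)  # fixed seed so the check is reproducible
xs = [rng.gauss(3.0, 2.0) for _ in range(10)]
lhs, rhs = decomposition_sides(xs, mu=3.0)
print(lhs, rhs)  # equal up to floating-point rounding
```

Passing any other `mu` gives equal sides as well, since no distributional assumption is used in this step.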

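Since $\bar{X}$ and $S^2$ are independent, so are $V$ and $W$. Thus $M_U(t) = M_V(t) M_W(t)$, where $M_X(t)$ denotes the moment generating function of the random variable $X$. By fact (4) above, this says that, for $t < 1/2$, $$\frac{1}{(1-2t)^{n/2}} = M_V(t) \frac{1}{(1-2t)^{1/2}}.$$ Thus $$M_V(t) = \frac{1}{(1-2t)^{(n-1)/2}},$$ which is the moment generating function of a $\chi^2(n-1)$ random variable. Since a moment generating function that exists on an interval around $0$ determines the distribution, $V \sim \chi^2(n-1)$.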
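The conclusion can also be checked by simulation. The sketch below (Python standard library only; `simulate_v`, the sample size $n=5$, the parameters and the seed are arbitrary illustrative choices) draws many samples, computes $V = (n-1)S^2/\sigma^2$ for each, and compares its sample mean, sample variance, and empirical CDF at one point with the $\chi^2(4)$ values: mean $4$, variance $8$, and $P(V \le 4) = 1 - 3e^{-2}$, using the closed-form CDF $1 - e^{-x/2}(1 + x/2)$ of $\chi^2(4)$. This is a consistency check, not a proof.

```python
import math
import random


def simulate_v(n, mu, sigma, reps, seed=0):
    """Draw `reps` values of V = (n-1) S^2 / sigma^2, each from a N(mu, sigma^2) sample of size n."""
    rng = random.Random(seed)
    vs = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance S^2
        vs.append((n - 1) * s2 / sigma ** 2)
    return vs


n, reps = 5, 20000
vs = simulate_v(n, mu=10.0, sigma=3.0, reps=reps)
mean_v = sum(vs) / reps
var_v = sum((v - mean_v) ** 2 for v in vs) / (reps - 1)

# chi^2(k) with k = n - 1 = 4 has mean k, variance 2k, and CDF 1 - exp(-x/2) (1 + x/2).
x = 4.0
empirical_cdf = sum(v <= x for v in vs) / reps
exact_cdf = 1 - math.exp(-x / 2) * (1 + x / 2)
print(mean_v, var_v, empirical_cdf, exact_cdf)
```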


I disagree with the characterization of the proof in Mike Spivey's answer as the standard proof. It's the proof for people who don't know about projections in linear algebra.

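Notice that the mapping $(X_1,\dots,X_n) \mapsto (X_1-\overline{X},\dots,X_n - \overline{X})$ is an orthogonal projection onto a space of dimension $n-1$, namely the vectors whose components sum to $0$. Notice also that it sends the mean vector $(\mu,\dots,\mu)$ to $0$, so the expected value of the projected vector is $0$. Then remember that the probability distribution of the vector $(X_1,\dots,X_n)$ is spherically symmetric about $(\mu,\dots,\mu)$. Therefore the distribution of its projection onto that space of dimension one less is spherically symmetric about the origin. Hence the projection is a normal random vector in an $(n-1)$-dimensional space with a spherically symmetric distribution centered at the origin; in an orthonormal basis of that space its coordinates are independent $N(0,\sigma^2)$. The square of its norm divided by $\sigma^2$, which is $\sum_{i=1}^n (X_i-\overline{X})^2/\sigma^2 = (n-1)S^2/\sigma^2$, therefore has a chi-square distribution with degrees of freedom equal to the dimension of that space, $n-1$.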
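The linear-algebra facts used here can be checked exactly for a small $n$. In the sketch below (Python, using `fractions.Fraction` for exact arithmetic; the helper names are hypothetical), the centering map is the matrix $P = I - \frac{1}{n}J$, where $J$ is the all-ones matrix. A matrix is an orthogonal projection exactly when it is symmetric and idempotent, and for an idempotent matrix the dimension of the range equals the trace, which here is $n-1$.

```python
from fractions import Fraction


def centering_matrix(n):
    """P = I - (1/n) J, the matrix of (x_1, ..., x_n) -> (x_1 - xbar, ..., x_n - xbar)."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


n = 5
P = centering_matrix(n)
is_symmetric = all(P[i][j] == P[j][i] for i in range(n) for j in range(n))
is_idempotent = matmul(P, P) == P
trace = sum(P[i][i] for i in range(n))  # dimension of the range of a projection
kills_mean = all(sum(row) == 0 for row in P)  # P (mu, ..., mu) = 0 for every mu
print(is_symmetric, is_idempotent, trace, kills_mean)
```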


Here is another proof. This one doesn't depend on moment generating functions or (explicitly) on linear algebra. The essence of the proof is to transform the $(X_i - \overline{X})/\sigma$ variables into $n-1$ independent $N(0,1)$ variables, $Z_i$, such that $\sum_{i=1}^{n}(X_i - \overline{X})^2/\sigma^2 = \sum_{i=1}^{n-1}Z_i^2$. I believe that this was the same approach used by Helmert to prove this result in 1876. (See https://en.wikipedia.org/wiki/Friedrich_Robert_Helmert.) This proof is long and laborious.

The proof requires the following results:

  • If $Y = \sum_{i=1}^{n}c_iX_i$, where $X_i \sim N(\mu_i,\sigma_i^2)$ and the $X_i$ are independent, then $Y \sim N(\sum_{i=1}^{n}c_i\mu_i, \sum_{i=1}^{n}c_i^2\sigma_i^2)$
  • If two jointly Normally distributed variables are uncorrelated then they are independent
  • If a set of jointly Normally distributed variables are pairwise independent then they are mutually independent
  • If $U_i \sim N(0, 1)$ and the $U_i$ are independent then $\sum_{i=1}^{n}U_i^2 \sim \chi^2(n)$

Define $U_i = (X_i - \mu)/\sigma$ and note that the $U_i$ are independent with $E[U_i] = 0$ and $V[U_i]= 1$.

Define

$$Z_i = \sqrt{\frac{n - i}{n - i + 1}}\left(U_i - \frac{1}{n-i}\sum_{k=i+1}^nU_k\right) \text{ for } i=1, 2, ..., n-1$$

So

\begin{alignat*}{6} Z_1 & = \sqrt\frac{n-1}{n}&\bigg(&U_1 - \frac{1}{n-1}&U_2 - \frac{1}{n-1}U_3 - ... -\frac{1}{n-1}&U_{n-1}&-&\frac{1}{n-1}&U_n\bigg) \\ Z_2 & = \sqrt\frac{n-2}{n-1}&\bigg(&&U_2 - \frac{1}{n-2}U_3 - ... -\frac{1}{n-2}&U_{n-1}&-&\frac{1}{n-2}&U_n\bigg) \\ ...\\ Z_{n-1} & = \sqrt\frac{1}{2}&\bigg(&&&U_{n-1}&-&&U_n\bigg) \end{alignat*}

(I can show an easy way to derive this transformation if requested.)

We'll now show that $Z_i \sim N(0, 1)$ and that the $Z_i$ are independent.

$E[Z_i] = \sqrt{\frac{n - i}{n - i + 1}}\left(E[U_i] - \frac{1}{n-i}\sum_{k=i+1}^nE[U_k]\right) = 0$

\begin{align*} V[Z_i] & = \frac{n-i}{n-i+1}\left(V[U_i] + \left(\frac{1}{n-i}\right)^2\sum_{k=i+1}^nV[U_k]\right) \qquad \text{(the } U_k \text{ are independent)} \\ & = \frac{n-i}{n-i+1}\left(1 + \left(\frac{1}{n-i}\right)^2(n-i)\right) \\ & = \frac{n-i}{n-i+1}\left(\frac{n-i+1}{n-i}\right) = 1 \end{align*}

\begin{align*} \operatorname{Cov}(Z_i, Z_j) & = E[Z_i Z_j] \qquad (i < j \text{, since } E[Z_i] = E[Z_j] = 0)\\ & = E\left[\sqrt{\frac{n - i}{n - i + 1}}\left(U_i - \frac{1}{n-i}\sum_{k=i+1}^nU_k\right) \sqrt{\frac{n - j}{n - j + 1}}\left(U_j - \frac{1}{n-j}\sum_{k=j+1}^nU_k\right)\right] \\ & = -\sqrt{\frac{(n - i)(n - j)}{(n - i + 1)(n - j + 1 )}}\left(\frac{1}{n-i}\right)\left(E[U_j^2]-\frac{1}{n-j}\sum_{k=j+1}^nE[U_k^2]\right)\\ & \qquad \text{(ignoring the cross products } U_kU_l,\ k \neq l, \text{ since their expectation is } 0)\\ & = -\sqrt{\frac{(n - i)(n - j)}{(n - i + 1)(n - j + 1 )}}\left(\frac{1}{n-i}\right)\left(1-\frac{1}{n-j}(n-j)\right)\\ & = 0 \end{align*}

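Each $Z_i$ is a linear combination of the independent Normal variables $U_1,\dots,U_n$, so the $Z_i$ are jointly Normal, and by the computations above $Z_i \sim N(0,1)$. Being pairwise uncorrelated, they are independent. Hence $\sum_{i=1}^{n-1} Z_i^2 \sim \chi^2(n-1).$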
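Because the $U_k$ are independent with variance $1$, $V[\sum_k a_kU_k] = \sum_k a_k^2$ and $\operatorname{Cov}(\sum_k a_kU_k, \sum_k b_kU_k) = \sum_k a_kb_k$, so the variance and covariance computations above reduce to arithmetic on coefficient vectors. A sketch that checks them exactly with `fractions.Fraction`; the square-root factor is carried as its square to keep everything rational, and `helmert_row` and `check_helmert` are hypothetical names:

```python
from fractions import Fraction


def helmert_row(n, i):
    """Unscaled coefficients of Z_i on (U_1, ..., U_n): 0 before position i, 1 at i, -1/(n-i) after."""
    return [Fraction(0)] * (i - 1) + [Fraction(1)] + [Fraction(-1, n - i)] * (n - i)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def check_helmert(n):
    """Return exact Var[Z_i] for i = 1..n-1 and the pairwise dot products of the unscaled rows."""
    rows = [helmert_row(n, i) for i in range(1, n)]
    scales = [Fraction(n - i, n - i + 1) for i in range(1, n)]  # square of the sqrt factor
    # Var[Z_i] = scale_i * |row_i|^2 because the U_k are independent with variance 1.
    variances = [s * dot(r, r) for s, r in zip(scales, rows)]
    # Cov(Z_i, Z_j) is a positive multiple of row_i . row_j, so the dot product decides it.
    dots = [dot(rows[a], rows[b]) for a in range(n - 1) for b in range(a + 1, n - 1)]
    return variances, dots


variances, dots = check_helmert(6)
print(variances, dots)
```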

Now we'll prove that $\sum_{i=1}^n (X_i - \overline{X})^2/\sigma^2 = \sum_{i=1}^{n-1} Z_i^2$. First note that $$\sum_{i=1}^{n}\left(\frac{X_i - \overline{X}}{\sigma}\right)^2 = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma} - \frac{\overline{X} - \mu}{\sigma} \right)^2 = \sum_{i=1}^{n}\left(U_i - \overline{U}\right)^2, $$ where $\overline{U} = \frac{1}{n}\sum_{i=1}^n U_i = (\overline{X} - \mu)/\sigma$. So we need only show that $\sum_{i=1}^{n}\left(U_i - \overline{U}\right)^2 = \sum_{i=1}^{n-1} Z_i^2$.

\begin{align*} \sum_{i=1}^n \left(U_i - \overline{U}\right)^2 &= \sum_{i=1}^n \left(U_i - \sum_{j=1}^n \frac{1}{n} U_j\right)^2 \\ &= \sum_{i=1}^n \left(\frac{n-1}{n}U_i - \frac{1}{n}\sum_{\substack{j=1 \\ j\neq i}}^n U_j\right)^2 \\ &= \frac{1}{n^2} \sum_{i=1}^n \left((n-1)U_i - \sum_{\substack{j=1 \\ j\neq i}}^n U_j\right)^2 \\ &= \frac{1}{n^2} \sum_{i=1}^n \left((n-1)^2U_i^2 - 2(n-1)U_i \sum_{\substack{j=1 \\ j\neq i}}^n U_j + \Big(\sum_{\substack{j=1 \\ j\neq i}}^n U_j\Big)^2 \right) \\ &= \frac{1}{n^2} \sum_{i=1}^n \left((n-1)^2U_i^2 - 2(n-1)U_i \sum_{\substack{j=1 \\ j\neq i}}^n U_j + \sum_{\substack{j=1 \\ j\neq i}}^n U_j^2 + 2\sum_{\substack{j=1 \\ j\neq i}}^{n-1} \sum_{\substack{k=j+1 \\ k\neq i}}^n U_j U_k \right) \end{align*}

What is the coefficient of $U_i^2$ in the sum above? In the $i$th term it's $\left(\frac{n-1}{n}\right)^2$ and in each of the other $n-1$ terms it's $\frac{1}{n^2}$. So in total it's $\left(\frac{n-1}{n}\right)^2 + (n-1)\frac{1}{n^2} = \frac{n-1}{n^2}\left( (n - 1) + 1 \right)= \frac{n-1}{n}.$

What is the coefficient of $U_i U_j$ $(i \neq j)$? In each of the $i$th and $j$th terms it's $-\frac{2(n-1)}{n^2}$, and in each of the other $n-2$ terms it's $\frac{2}{n^2}$. So in total it's $-2\frac{2(n-1)}{n^2} + (n - 2) \frac{2}{n^2} = \frac{2}{n^2}(n - 2 - 2n + 2) = -\frac{2}{n}$.

$$\text{Thus }\sum_{i=1}^n (U_i - \overline{U})^2 = \frac{n-1}{n}\sum_{i=1}^n U_i^2 -\frac{2}{n} \sum_{i=1}^{n-1} \sum_{j=i+1}^n U_i U_j $$
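A quick numerical check of this expansion (Python standard library; the function names and the random test vector are arbitrary choices for illustration):

```python
import random


def centered_ss(us):
    """Left side: sum of (U_i - Ubar)^2."""
    ubar = sum(us) / len(us)
    return sum((u - ubar) ** 2 for u in us)


def expanded_form(us):
    """Right side: (n-1)/n * sum U_i^2 - (2/n) * sum_{i<j} U_i U_j."""
    n = len(us)
    squares = sum(u * u for u in us)
    cross = sum(us[i] * us[j] for i in range(n - 1) for j in range(i + 1, n))
    return (n - 1) / n * squares - 2 / n * cross


rng = random.Random(1)
us = [rng.gauss(0.0, 1.0) for _ in range(7)]
print(centered_ss(us), expanded_form(us))  # equal up to rounding
```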

Let's consider $\sum_{i=1}^{n-1} Z_i^2$ now.

\begin{align*} \sum_{i=1}^{n-1} Z_i^2 &= \sum_{i=1}^{n-1} \left( \sqrt{\frac{n-i}{n-i+1}}\left(U_i - \frac{1}{n-i}\sum_{k=i+1}^n U_k\right)\right)^2 \\ &= \sum_{i=1}^{n-1} \frac{n-i}{n-i+1} \left( U_i^2 - \frac{2}{n-i}U_i\sum_{k=i+1}^nU_k + \frac{1}{(n-i)^2} \left( \sum_{k=i+1}^nU_k \right)^2 \right) \\ &= \sum_{i=1}^{n-1} \frac{n-i}{n-i+1} \left( U_i^2 - \frac{2}{n-i}U_i\sum_{k=i+1}^nU_k + \frac{1}{(n-i)^2} \sum_{k=i+1}^nU_k^2 + \frac{2}{(n-i)^2}\sum_{j=i+1}^{n-1}\sum_{k=j+1}^nU_jU_k \right) \end{align*}

Both $U_1^2$ and $U_1U_j$ appear only in the first term. The coefficient of $U_1^2$ in the sum above is $\frac{n-1}{n}$, and the coefficient of $U_1U_j$ $(j > 1)$ is $\frac{n-1}{n} \cdot \frac{-2}{n-1} = \frac{-2}{n}$.

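Now consider the coefficient of $U_i^2$ $(i > 1)$. In the $i$th term it's $\frac{n-i}{n-i+1}$ (this is $0$ when $i = n$, for which there is no $i$th term). In the $j$th term for $j<i$ it's $\frac{n-j}{n-j+1}\cdot\frac{1}{(n-j)^2} = \frac{1}{(n-j+1)(n-j)}$, and it doesn't appear in the terms after the $i$th.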

\begin{align*} \text{In total it's }\\ & \sum_{j=1}^{i-1}\frac{1}{(n-j+1)(n-j)} + \frac{n-i}{n-i+1} \\ = & \sum_{j=1}^{i-1}\left(\frac{1}{n-j} - \frac{1}{n-j+1}\right) + \frac{n-i}{n-i+1} \\ = & \sum_{j=1}^{i-1}\frac{1}{n-j} - \sum_{j=1}^{i-1}\frac{1}{n-j+1} + \frac{n-i}{n-i+1} \\ = & \sum_{j=1}^{i-1}\frac{1}{n-j} - \sum_{j=0}^{i-2}\frac{1}{n-j} + \frac{n-i}{n-i+1} \\ = & \frac{1}{n-i+1} - \frac{1}{n} + \frac{n-i}{n-i+1} \\ = & \frac{n-1}{n} \end{align*}
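The telescoping sum can be checked exactly over many $(n, i)$ pairs. The sketch below (Python with `fractions.Fraction`; `diag_coefficient` is a hypothetical helper) adds up the per-term contributions exactly as listed above, taking the $i$th-term contribution $\frac{n-i}{n-i+1}$ to be $0$ when $i = n$:

```python
from fractions import Fraction


def diag_coefficient(n, i):
    """Coefficient of U_i^2 in sum_{j=1}^{n-1} Z_j^2, accumulated term by term."""
    # Contributions from the terms j < i: 1 / ((n-j+1)(n-j)).
    total = sum(Fraction(1, (n - j + 1) * (n - j)) for j in range(1, i))
    # Contribution from the i-th term (zero when i = n, where there is no n-th term).
    total += Fraction(n - i, n - i + 1)
    return total


print([diag_coefficient(6, i) for i in range(1, 7)])  # every entry should equal 5/6
```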

Finally, consider the coefficient of $U_sU_t$ where $1 < s < t \le n$. In the $s$th term it's $\frac{n-s}{n-s+1}\frac{-2}{n-s} = \frac{-2}{n-s+1}.$ In the $j$th term where $j<s$, the coefficient is $\frac{n-j}{n-j+1} \frac{2}{(n-j)^2} = \frac{2}{(n-j)(n-j+1)}.$ $U_sU_t$ doesn't appear in terms after the $s$th.

\begin{align*} \text{So in total it's }\\ & \sum_{j=1}^{s-1}\frac{2}{(n-j)(n-j+1)} + \frac{-2}{n-s+1} \\ = & 2\sum_{j=1}^{s-1}\left(\frac{1}{n-j} - \frac{1}{n-j+1}\right) + \frac{-2}{n-s+1} \\ = & 2\left(\frac{1}{n-s+1} - \frac{1}{n}\right) + \frac{-2}{n-s+1} \\ = & - \frac{2}{n} \end{align*}
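Likewise for the cross terms: summing the contributions listed above should give $-2/n$ for every pair $s < t$, whatever $t$ is. A sketch with exact arithmetic (Python with `fractions.Fraction`; `cross_coefficient` is a hypothetical helper). The $s = 1$ case, where the sum over $j < s$ is empty, reproduces the coefficient $-2/n$ of $U_1U_j$ found earlier.

```python
from fractions import Fraction


def cross_coefficient(n, s, t):
    """Coefficient of U_s U_t (s < t) in sum_{j=1}^{n-1} Z_j^2, accumulated term by term."""
    if not 1 <= s < t <= n:
        raise ValueError("need 1 <= s < t <= n")
    # Contributions from the terms j < s: 2 / ((n-j)(n-j+1)).
    total = sum(Fraction(2, (n - j) * (n - j + 1)) for j in range(1, s))
    # Contribution from the s-th term; no later term contains U_s.
    total += Fraction(-2, n - s + 1)
    return total


print(cross_coefficient(7, 3, 5), Fraction(-2, 7))
```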

We have shown $$\sum_{i=1}^{n-1} Z_i^2 = \frac{n-1}{n}\sum_{i=1}^n U_i^2 -\frac{2}{n} \sum_{i=1}^{n-1} \sum_{j=i+1}^n U_i U_j,$$ which is the expression obtained above for $\sum_{i=1}^n (U_i - \overline{U})^2$.

Thus $\sum_{i=1}^n (X_i - \overline{X})^2/\sigma^2 = \sum_{i=1}^{n-1} Z_i^2 \sim \chi^2(n-1)$.
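Finally, an end-to-end numerical check of the identity $\sum_{i=1}^n (X_i - \overline{X})^2/\sigma^2 = \sum_{i=1}^{n-1} Z_i^2$, computing the $Z_i$ directly from their definition. This is a sketch in Python (standard library only); `helmert_z`, the parameters and the seed are illustrative choices, not part of the proof.

```python
import math
import random


def helmert_z(xs, mu, sigma):
    """Z_1, ..., Z_{n-1} built from U_i = (X_i - mu)/sigma, following the definition of Z_i."""
    n = len(xs)
    us = [(x - mu) / sigma for x in xs]
    zs = []
    for i in range(1, n):  # i = 1, ..., n-1 (1-based, as in the text)
        tail_mean = sum(us[i:]) / (n - i)  # (1/(n-i)) * sum_{k=i+1}^n U_k
        zs.append(math.sqrt((n - i) / (n - i + 1)) * (us[i - 1] - tail_mean))
    return zs


rng = random.Random(2)
mu, sigma = 5.0, 2.0
xs = [rng.gauss(mu, sigma) for _ in range(8)]
n = len(xs)
xbar = sum(xs) / n
v = sum((x - xbar) ** 2 for x in xs) / sigma ** 2  # (n-1) S^2 / sigma^2
zs = helmert_z(xs, mu, sigma)
print(v, sum(z * z for z in zs))  # equal up to rounding
```

Replacing `mu` by any other value gives the same `zs` up to rounding, because the coefficients in each $Z_i$ sum to $0$, so $\mu$ cancels.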