Why does $\operatorname{Var}(X) = E[X^2] - (E[X])^2$?
$ \operatorname{Var}(X) = E[X^2] - (E[X])^2 $
I have seen, and understand mathematically, the proof of this. What I want to understand is: intuitively, why is this true? What does this formula tell us? From the formula, we see that if we subtract the square of the expected value of $x$ from the expected value of $x^2$, we get a measure of dispersion in the data (or, in the case of the standard deviation, the square root of this value gives us a measure of dispersion).
So it seems that there is some link between the expected values of $x^2$ and $x$. How do I make sense of this formula? For example, the formula
$$ \sigma^2 = \frac 1n \sum_{i = 1}^n (x_i - \bar{x})^2 $$
makes perfect intuitive sense: it is simply the average of the squared deviations from the mean. What does the other formula tell us?
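As a quick numerical sanity check, here is a minimal Python sketch that computes the population variance both ways: as the average squared deviation from the mean, and as the mean of the squares minus the square of the mean. The data values are arbitrary, chosen only for illustration.

```python
def variance_by_deviations(xs):
    """Average of squared deviations from the mean (the 1/n formula)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def variance_by_moments(xs):
    """Mean of the squares minus the square of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    mean_of_squares = sum(x * x for x in xs) / n
    return mean_of_squares - mean ** 2


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample values
print(variance_by_deviations(data))  # 4.0
print(variance_by_moments(data))     # 4.0
```

Here the mean is 5, the mean of the squares is 29, and $29 - 5^2 = 4$, which matches the average squared deviation $32/8 = 4$.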
Solution 1:
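Some time ago, a professor showed me a right triangle whose legs have lengths $|\mathbb{E}[X]|$ and $\sqrt{\text{Var}[X]}$ and whose hypotenuse has length $\sqrt{\mathbb{E}[X^2]}$.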
The formula you reported can be seen as an application of the Pythagorean theorem:
$$P = \mathbb{E}[X^2] = \text{Var}[X] + \mathbb{E}^2[X].$$
Here, $P = \mathbb{E}[X^2]$ (the second uncentered moment of $X$) is read as the "average power" of $X$. This reading has a physical explanation.
In physics, energy and power are related to the square of some quantity (e.g. $X$ can be velocity for kinetic energy, current for Joule's law, etc.).
Suppose that these quantities are random (i.e. $X$ is a random variable). Then the average power $P$ is the sum of two contributions:
- The square of the expected value of $X$;
- Its variance (i.e. how much it varies from the expected value).
Clearly, if $X$ is not random (it is a constant), then $\text{Var}[X] = 0$ and $\mathbb{E}^2[X] = X^2$, so that:
$$P = X^2,$$
which is a typical physical definition of energy/power (in this case it is exact, not an average). When randomness is present, we must use the whole formula
$$P = \mathbb{E}[X^2] = \text{Var}[X] + \mathbb{E}^2[X]$$
to evaluate the average power of the signal.
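To illustrate the power interpretation, the following Python sketch assumes a hypothetical signal made of a constant (DC) level plus zero-mean Gaussian noise; the level, the noise spread and the sample count are arbitrary choices. The estimated average power $\mathbb{E}[X^2]$ should come out close to the squared DC level plus the noise variance, $3^2 + 2^2 = 13$.

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is reproducible
dc_level = 3.0   # hypothetical constant component, plays the role of E[X]
noise_sd = 2.0   # hypothetical noise standard deviation, sqrt(Var[X])
n = 20_000

# Simulated samples of the noisy signal X = dc_level + noise
samples = [dc_level + random.gauss(0.0, noise_sd) for _ in range(n)]

average_power = sum(x * x for x in samples) / n  # sample estimate of E[X^2]
mean = statistics.fmean(samples)                 # sample estimate of E[X]
variance = statistics.pvariance(samples)         # variance of the samples (divides by n)

print(average_power)         # close to 13
print(mean ** 2 + variance)  # equal to average_power up to rounding
```

Because `pvariance` divides by $n$, the identity holds exactly for the samples themselves; the sampling noise only affects how close both numbers are to the theoretical value 13.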
As a final remark, the average power of $X$ can be seen as the squared length of a vector whose two orthogonal components are the expected value of $X$ and its standard deviation (a measure of its variability).
P.S. A further clarification: the values $P$, $\text{Var}[X]$ and $\mathbb{E}^2[X]$ represent the squares of the sides of the triangle, not their lengths.
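The triangle can be made concrete for a finite sample. In this Python sketch (the observations are made up for illustration), the vector of observations is split into a constant vector filled with the mean and a vector of deviations from the mean. The two pieces are orthogonal because the deviations sum to zero, so by the Pythagorean theorem their squared lengths add; dividing by $n$ turns this into "mean of squares = squared mean + variance".

```python
def dot(u, v):
    """Euclidean dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


xs = [1.0, 2.0, 6.0, 7.0]  # hypothetical observations
n = len(xs)
mean = sum(xs) / n

mean_part = [mean] * n                   # leg along the constant direction
deviation_part = [x - mean for x in xs]  # leg made of deviations from the mean

# The two legs are orthogonal: their dot product is zero.
print(dot(mean_part, deviation_part))  # 0.0

hypotenuse_sq = dot(xs, xs) / n                       # mean of squares, E[X^2]
leg_mean_sq = dot(mean_part, mean_part) / n           # squared mean, E[X]^2
leg_dev_sq = dot(deviation_part, deviation_part) / n  # variance, Var[X]
print(hypotenuse_sq, leg_mean_sq + leg_dev_sq)        # 22.5 22.5
```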
Solution 2:
Easy! Expand by the definition. Variance is the mean squared deviation, i.e., $V(X) = E((X-\mu)^2).$ Now:
$$ (X-\mu)^2 = X^2 - 2X \mu + \mu^2$$
and use the fact that $E(\cdot)$ is a linear operator and that $\mu$ (the mean) is a constant.
The shortcut computes the same quantity, but as the difference between the mean of the squares and the square of the mean.
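Written out step by step, with $\mu = E(X)$, the expansion reads:

$$
\begin{aligned}
V(X) &= E\big[(X-\mu)^2\big] \\
&= E\big[X^2 - 2\mu X + \mu^2\big] \\
&= E(X^2) - 2\mu\,E(X) + \mu^2 && \text{(linearity; } \mu \text{ is a constant)} \\
&= E(X^2) - 2\mu^2 + \mu^2 \\
&= E(X^2) - \big(E(X)\big)^2.
\end{aligned}
$$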
Solution 3:
The other formula tells you exactly the same thing as the one you have given in terms of $x_i$, $\bar{x}$ and $n$. You say you understand that formula, so I assume you also see that the variance is just the average of all the squared deviations.
Now, $\mathbb{E}(X)$ is just the average of all the $x_i$'s, which is to say it is their mean.
Let us now define a deviation using the expectation operator: $$D = X-\mathbb{E}(X).$$ The squared deviation is $$D^2 = (X-\mathbb{E}(X))^2.$$
Now that we have the deviation, let's find the variance. Using the definition of variance mentioned above, you should be able to see that
$$\operatorname{Var}(X) = \mathbb{E}(D^2).$$ Since $\mathbb{E}$ takes an average, the equation above is just the average of the squared deviations.
Substituting $D^2$, we get $$\operatorname{Var}(X) = \mathbb{E}\big[(X-\mathbb{E}(X))^2\big] = \mathbb{E}\big[X^2+\mathbb{E}(X)^2-2X\,\mathbb{E}(X)\big] = \mathbb{E}(X^2)+\mathbb{E}(X)^2-2\mathbb{E}(X)^2 = \mathbb{E}(X^2)-\mathbb{E}(X)^2.$$ Hope this helps.
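To see that the two sides agree exactly (not just approximately), here is a small Python sketch using exact rational arithmetic on a hypothetical discrete distribution, a fair six-sided die. It evaluates $\mathbb{E}(D^2)$ directly and via $\mathbb{E}(X^2)-\mathbb{E}(X)^2$.

```python
from fractions import Fraction

# Hypothetical example: a fair die, mapping value -> probability
pmf = {k: Fraction(1, 6) for k in range(1, 7)}


def expect(f, pmf):
    """Expectation of f(X) for a discrete random variable given by its pmf."""
    return sum(p * f(x) for x, p in pmf.items())


mu = expect(lambda x: x, pmf)                             # E(X)
var_definition = expect(lambda x: (x - mu) ** 2, pmf)     # E(D^2) with D = X - E(X)
var_shortcut = expect(lambda x: x * x, pmf) - mu ** 2     # E(X^2) - E(X)^2

print(mu, var_definition, var_shortcut)  # 7/2 35/12 35/12
```

Using `Fraction` avoids floating-point rounding, so the equality of the two results is exact: $\mathbb{E}(X^2) = 91/6$ and $91/6 - (7/2)^2 = 35/12$.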