Why does $ \operatorname{Var}(X) = E[X^2] - (E[X])^2 $

$ \operatorname{Var}(X) = E[X^2] - (E[X])^2 $

I have seen and understand (mathematically) the proof of this. What I want to understand is: intuitively, why is it true? What does this formula tell us? From the formula, we see that if we subtract the square of the expected value of $X$ from the expected value of $X^2$, we get a measure of dispersion in the data (or, in the case of the standard deviation, the square root of this value gives us a measure of dispersion).

So it seems that there is some linkage between the expected value of $X^2$ and that of $X$. How do I make sense of this formula? For example, the formula

$$ \sigma^2 = \frac 1n \sum_{i = 1}^n (x_i - \bar{x})^2 $$

makes perfect intuitive sense. It simply gives us the average of squares of deviations from the mean. What does the other formula tell us?
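For a concrete sanity check, here is a short Python sketch (the sample data is hypothetical, chosen only for illustration) showing that the average-of-squared-deviations formula and the shortcut $\overline{x^2} - \bar{x}^2$ give the same number on a sample:

```python
import math
import random

# Hypothetical sample data, just to compare the two formulas numerically.
random.seed(0)
xs = [random.gauss(10, 3) for _ in range(100_000)]
n = len(xs)
x_bar = sum(xs) / n

# Definition: average of squared deviations from the mean.
var_deviations = sum((x - x_bar) ** 2 for x in xs) / n

# Shortcut: mean of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in xs) / n - x_bar ** 2

print(math.isclose(var_deviations, var_shortcut, rel_tol=1e-6))  # True
```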


Solution 1:

Some time ago, a professor showed me this right triangle:

*(figure: a right triangle whose hypotenuse squared is $\mathbb{E}[X^2]$ and whose legs squared are $\text{Var}[X]$ and $\mathbb{E}^2[X]$)*

The formula you reported can be seen as an application of the Pythagorean theorem:

$$P = \mathbb{E}[X^2] = \text{Var}[X] + \mathbb{E}^2[X].$$

Here, $P = \mathbb{E}[X^2]$ (the second uncentered moment of $X$) is read as the "average power" of $X$. Indeed, there is a physical explanation.

In physics, energy and power are related to the square of some quantity (e.g., $X$ can be velocity for kinetic energy, or current for Joule's law).

Suppose that these quantities are random (indeed, $X$ is a random variable). Then the average power $P$ is the sum of two contributions:

  1. The square of the expected value of $X$;
  2. Its variance (i.e. how much it varies from the expected value).

It is clear that, if $X$ is not random (i.e., it is a deterministic constant), then $\text{Var}[X] = 0$ and $\mathbb{E}^2[X] = X^2$, so that:

$$P = X^2,$$

which is a typical physical definition of energy/power (in this case it is exact, not an average). When randomness is present, we must use the whole formula

$$P = \mathbb{E}[X^2] = \text{Var}[X] + \mathbb{E}^2[X]$$

to evaluate the average power of the signal.

As a final remark, the average power $\mathbb{E}[X^2]$ plays the role of the squared hypotenuse: it is the sum of two squared "components," the variance (the variability of $X$) and the squared expected value.


P.S. A further clarification: the values $P$, $\text{Var}[X]$ and $\mathbb{E}^2[X]$ represent the squares of the sides of the triangle, not their lengths.
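To make the power interpretation concrete, here is a small numerical sketch (a hypothetical noisy DC signal, chosen only for illustration): the average power of the samples splits into the "DC power" $\mathbb{E}^2[X]$ plus the "AC power" $\text{Var}[X]$.

```python
import math
import random

# Hypothetical signal: a constant DC level plus zero-mean Gaussian noise.
random.seed(1)
dc, sigma = 2.0, 0.5
xs = [dc + random.gauss(0, sigma) for _ in range(200_000)]
n = len(xs)

avg_power = sum(x * x for x in xs) / n            # estimates E[X^2]
mean = sum(xs) / n                                # estimates E[X]
variance = sum((x - mean) ** 2 for x in xs) / n   # estimates Var[X]

# E[X^2] = Var[X] + E^2[X]: total power = AC (fluctuation) power + DC power.
print(math.isclose(avg_power, variance + mean ** 2, rel_tol=1e-9))  # True
```

The identity holds exactly for the sample moments (it is algebra, not an approximation), which is why the tolerance can be so tight.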

Solution 2:

Easy! Expand the definition. The variance is the mean squared deviation, i.e., $V(X) = E((X-\mu)^2)$, where $\mu = E(X)$. Now:

$$ (X-\mu)^2 = X^2 - 2X \mu + \mu^2$$

and use the fact that $E(\cdot)$ is linear and that $\mu$ (the mean) is a constant:

$$E((X-\mu)^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2.$$

The shortcut computes the same thing, but as the difference between the mean of the squares and the square of the mean.
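As a quick exact check of this expansion (a fair six-sided die, my own example), exact fractions show that the two expressions agree:

```python
from fractions import Fraction

# Fair six-sided die: outcomes 1..6, each with probability 1/6.
outcomes = [Fraction(k) for k in range(1, 7)]
p = Fraction(1, 6)

mu = sum(p * x for x in outcomes)                    # E[X] = 7/2
e_x2 = sum(p * x * x for x in outcomes)              # E[X^2] = 91/6
var_def = sum(p * (x - mu) ** 2 for x in outcomes)   # E[(X - mu)^2]
var_shortcut = e_x2 - mu ** 2                        # E[X^2] - (E[X])^2

print(var_def, var_shortcut)  # both equal 35/12
```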

Solution 3:

The other formula tells you exactly the same thing as the one you have given in terms of $x_i$, $\bar{x}$ and $n$. You say you understand that formula, so I assume you also get that the variance is just the average of all the squared deviations.

Now, $\mathbb{E}(X)$ is just the average of all the $x_i$'s, which is to say that it is the mean of the $x_i$'s.

Let us now define a deviation using the expectation operator: $$D = X-\mathbb{E}(X),$$ and the squared deviation is $$D^2 = (X-\mathbb{E}(X))^2.$$

Now that we have the deviation, let's find the variance. Using the above definition of variance, you should be able to see that

$$\text{Variance} = \mathbb{E}(D^2).$$ Since $\mathbb{E}(X)$ is the average value of $X$, the above equation is just the average of the squared deviations.

Substituting the value of $D^2$, we get $$\operatorname{Var}(X) = \mathbb{E}\big[(X-\mathbb{E}(X))^2\big] = \mathbb{E}\big[X^2+\mathbb{E}(X)^2-2X\,\mathbb{E}(X)\big] = \mathbb{E}(X^2)+\mathbb{E}(X)^2-2\mathbb{E}(X)^2 = \mathbb{E}(X^2)-\mathbb{E}(X)^2.$$ Hope this helps.