(Basic) Confusion about usage of covariance formula

What is the intuition behind this expected covariance formula? Why do we not use the first one (first line) and instead use the last one? E[X] and E[Y] are means and are easy to find, so why do we derive the last equation? I do not get the idea.

[Image: derivation of the covariance formula]


Solution 1:

This is not about "expected" covariance, but simply about covariance.

In some contexts it is a very bad idea to use the formula $\operatorname{cov}(X,Y) = \operatorname E(XY) - \operatorname E(X)\operatorname E(Y)$. For example, suppose \begin{align} & \operatorname E(X) = 1\,000\,000, \\[2pt] & \operatorname E(Y) = 2\,000\,000, \\[2pt] & \operatorname{sd}(X) = 0.03, \\[2pt] & \operatorname{sd}(Y) = 0.02, \\[2pt] & \operatorname{corr}(X,Y) = 0.9. \end{align} Then we have $\operatorname{cov}(X,Y) = 0.03\times0.02\times0.9 = 0.00054$ and so $$\operatorname E(XY) = \operatorname{cov}(X,Y) + \operatorname E(X)\operatorname E(Y) = 0.00054 + 2\,000\,000\,000\,000 = 2\,000\,000\,000\,000.00054. $$ So what happens when you try to recover the covariance with this formula? Watch: \begin{align} \operatorname{cov}(X,Y) & = \operatorname E(XY) - \operatorname E(X) \operatorname E(Y) \\[4pt] & = \underbrace{2\,000\,000\,000\,000}_\text{rounded} {} - 2\,000\,000\,000\,000 \\[15pt] & = 0. \quad \text{So all of the desired information was lost in rounding.} \end{align}
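The same loss of precision can be reproduced in ordinary double-precision arithmetic. The following is a minimal sketch, not part of the original answer: the function names `cov_shortcut` and `cov_centered` and the three-point data set are made up for illustration, and the data are chosen so that the exact population covariance is $0.0004$, with means near $1\,000\,000$ and $2\,000\,000$ as in the example above.

```python
def cov_shortcut(xs, ys):
    """Population covariance via E(XY) - E(X)E(Y)."""
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - (sum(xs) / n) * (sum(ys) / n)


def cov_centered(xs, ys):
    """Population covariance via E[(X - E X)(Y - E Y)]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


# Hypothetical data: huge means, tiny spread, perfectly correlated.
# Exact covariance: (0.03*0.02 + 0 + 0.03*0.02) / 3 = 0.0004
xs = [1_000_000 + d for d in (-0.03, 0.0, 0.03)]
ys = [2_000_000 + d for d in (-0.02, 0.0, 0.02)]

print("centered:", cov_centered(xs, ys))
print("shortcut:", cov_shortcut(xs, ys))
```

The centered version subtracts the means before multiplying, so it only ever works with small numbers. The shortcut version subtracts two quantities near $2\times10^{12}$, where neighbouring doubles are about $0.00024$ apart, so a covariance of $0.0004$ cannot survive the subtraction intact.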

But if you have something like $\operatorname E(X)=2$ and $\operatorname{sd}(X)=3$ and $\operatorname E(Y)=4$ and $\operatorname{sd}(Y)=8,$ then sometimes doing the arithmetic is a bit quicker with this formula, so it gets called a shortcut.
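
For reference, the shortcut is just the definition of covariance expanded by linearity of expectation, so the two forms are equal in exact arithmetic and differ only in how convenient or how numerically safe they are: \begin{align} \operatorname{cov}(X,Y) & = \operatorname E\big[(X-\operatorname E(X))(Y-\operatorname E(Y))\big] \\[4pt] & = \operatorname E(XY) - \operatorname E(X)\operatorname E(Y) - \operatorname E(X)\operatorname E(Y) + \operatorname E(X)\operatorname E(Y) \\[4pt] & = \operatorname E(XY) - \operatorname E(X)\operatorname E(Y). \end{align}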