Linearity of expectation - Why does it hold intuitively even when the r.v.s are correlated?

An experiment, say rolling a die, is performed a large number $n$ of times. Let $X$ and $Y$ be two random variables that summarize this experiment.

Intuitively (by the law of large numbers), if I observe the values of $X$ over a large number of trials and take their mean $m_{X}=\frac{1}{n}\sum_{i}{x_{i}}$, then observe the values of $Y$ and take their mean $m_{Y}=\frac{1}{n}\sum_{i}{y_{i}}$, and then add the two column means, the result is very close to $E(X)+E(Y)$.

If we record the values of $X+Y$ in a third column and take their arithmetic mean $m_{X+Y}$, this will be very close to $E(X+Y)$.

Therefore linearity of expectation, $E(X+Y)=E(X)+E(Y)$, emerges as a simple fact of arithmetic: $m_{X+Y}=m_X+m_Y$ exactly, because we're just adding the same numbers in a different order.
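The three-column argument can be sketched as a quick simulation. The choice of variables is a hypothetical one for illustration: $X$ is the face of a fair die and $Y = X^2$, so $Y$ is completely determined by $X$. Using exact fractions shows that the third column's mean equals the sum of the other two exactly, not just approximately.

```python
import random
from fractions import Fraction

def column_means(n, seed=0):
    """Roll a fair die n times and return the exact column means m_X, m_Y, m_{X+Y}.

    Hypothetical choice for illustration: X is the face shown and Y = X**2,
    so Y is completely determined by (hence strongly dependent on) X.
    """
    rng = random.Random(seed)
    xs = [rng.randint(1, 6) for _ in range(n)]
    ys = [x * x for x in xs]
    sums = [x + y for x, y in zip(xs, ys)]   # the third column, X+Y
    # Fractions keep the means exact, so rounding cannot blur the comparison.
    m_x = Fraction(sum(xs), n)
    m_y = Fraction(sum(ys), n)
    m_sum = Fraction(sum(sums), n)
    return m_x, m_y, m_sum

m_x, m_y, m_sum = column_means(100_000)
print(m_sum == m_x + m_y)          # True: same numbers, added in a different order
print(float(m_sum), 3.5 + 91 / 6)  # sample mean vs E(X) + E(Y) = 3.5 + 91/6
```

The dependence between the columns plays no role in the first comparison; only the second one (closeness to the true expectation) relies on the law of large numbers.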

I know linearity of expectation holds even when $X$ and $Y$ are dependent. For example, the binomial and hypergeometric distributions both have expectation $E(X)=np$ (for the hypergeometric, $p=K/N$ when drawing $n$ items from $N$, of which $K$ are successes), even though in the binomial story the $\text{Bern}(p)$ indicator random variables are i.i.d., while in the hypergeometric story they are dependent.
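The hypergeometric example can be checked numerically. This is a sketch under assumed, hypothetical numbers: an urn of $N=20$ items with $K=8$ successes, from which $n=5$ are drawn without replacement, so $p=K/N=0.4$. The simulation tracks both the mean of $X=I_1+\cdots+I_n$ and how often the first two indicators are both 1, which shows that the indicators really are dependent.

```python
import random

def hypergeometric_indicators(N, K, n, rng):
    """Draw n items without replacement from an urn of N items, K of which are successes.

    Returns the indicators I_1, ..., I_n. Each I_j is marginally Bern(K/N), but
    they are dependent because drawn items are not put back.
    """
    urn = [1] * K + [0] * (N - K)
    return rng.sample(urn, n)

def simulate(N, K, n, trials=50_000, seed=0):
    rng = random.Random(seed)
    total = 0       # running sum of X = I_1 + ... + I_n
    both = 0        # number of trials with I_1 = I_2 = 1
    for _ in range(trials):
        ind = hypergeometric_indicators(N, K, n, rng)
        total += sum(ind)
        both += ind[0] * ind[1]
    return total / trials, both / trials

N, K, n = 20, 8, 5            # hypothetical urn: p = K/N = 0.4, so np = 2
p = K / N
mean_X, p_both = simulate(N, K, n)
print(mean_X, n * p)                                # close to np despite the dependence
print(p_both, K * (K - 1) / (N * (N - 1)), p * p)   # P(I_1 = I_2 = 1) is 56/380, not p^2
```

The joint probability $P(I_1=I_2=1)$ differs from $p^2$, yet the mean of the sum is unaffected, because each $E(I_j)=p$ regardless of the others.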

If two random variables are correlated, wouldn't that change the average of their sum compared with the case where they are uncorrelated? Any insight or intuition would be great!


For intuition, suppose the sample space consists of a finite number of equally probable outcomes (this is of course not true for all probability spaces, but many situations can be approximated by something of this form). Then $$ E(X+Y) = \frac{(x_1+y_1)+(x_2+y_2)+\cdots+(x_n+y_n)}n $$ and $$ E(X)+E(Y) = \frac{x_1+x_2+\cdots+x_n}n + \frac{y_1+y_2+\cdots+y_n}n $$ which is obviously the same, whatever the relationship between the $x_i$ and the $y_i$.
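The finite equally-likely case can be written out exactly. As an illustrative assumption, take the six faces of a fair die as the outcomes, let $X$ be the face shown, and let $Y$ be the indicator that the face is even, so $X$ and $Y$ are correlated.

```python
from fractions import Fraction

# A finite sample space of n equally likely outcomes: the six faces of a fair die.
outcomes = [1, 2, 3, 4, 5, 6]

# Two dependent random variables on the same outcomes (illustrative choices).
def X(w):
    return w                         # the face shown

def Y(w):
    return 1 if w % 2 == 0 else 0    # indicator that the face is even

def E(f):
    """Expectation under the uniform distribution: the plain average over outcomes."""
    return Fraction(sum(f(w) for w in outcomes), len(outcomes))

lhs = E(lambda w: X(w) + Y(w))       # ((x_1+y_1) + ... + (x_n+y_n)) / n
rhs = E(X) + E(Y)                    # (x_1+...+x_n)/n + (y_1+...+y_n)/n
print(lhs, rhs)                                 # 4 4
print(E(lambda w: X(w) * Y(w)), E(X) * E(Y))    # 2 7/4, so X and Y are correlated
```

The product term shows the correlation, but it never enters the computation of $E(X+Y)$.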


Say you have $X$ and $Y$ independent, and then you turn up the correlation. Say they're mean zero too, just for simplicity. Then the positive values of $X+Y$ still balance the negative ones on average; it's just that $X$ and $Y$ are now more likely to be positive together or negative together. Thus the mean of $X+Y$ stays zero. However, the correlation does increase the variance, since $X+Y$ tends to be larger in magnitude when $X$ and $Y$ have the same sign more often.
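This "turn up the correlation" picture can be sketched with a simulation. The construction is an assumption chosen for convenience: standard normals $X=Z_1$ and $Y=\rho Z_1+\sqrt{1-\rho^2}\,Z_2$ with $Z_1, Z_2$ independent, which gives $\operatorname{Corr}(X,Y)=\rho$. The mean of $X+Y$ should stay near $0$ while its variance grows like $2+2\rho$.

```python
import math
import random

def simulate(rho, n=100_000, seed=0):
    """Sample (X, Y) standard normal with correlation rho; return mean and variance of X+Y.

    Assumed construction: X = Z1, Y = rho*Z1 + sqrt(1 - rho^2)*Z2,
    with Z1, Z2 independent N(0, 1), so Corr(X, Y) = rho.
    """
    rng = random.Random(seed)
    s = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = z1
        y = rho * z1 + math.sqrt(1 - rho * rho) * z2
        s.append(x + y)
    mean = sum(s) / n
    var = sum((v - mean) ** 2 for v in s) / (n - 1)   # unbiased sample variance
    return mean, var

for rho in (0.0, 0.5, 0.9):
    mean, var = simulate(rho)
    print(f"rho={rho}: mean(X+Y)={mean:.3f}, var(X+Y)={var:.3f}, theory 2+2*rho={2 + 2 * rho}")
```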


If two random variables are correlated, wouldn't that change the average of their sum compared with the case where they are uncorrelated?

Whether RVs are correlated matters only when $\mathsf{E}(XY)$ terms appear: $X$ and $Y$ are uncorrelated exactly when $\mathsf{E}(XY)=\mathsf{E}(X)\mathsf{E}(Y)$. When we consider the expectation of a sum of RVs, or a sum of expectations, no such terms appear. Hence being correlated or not does not change anything.
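For contrast, here is a standard computation showing where such a term does appear. Expanding the variance of a sum, using nothing but linearity of expectation, produces an $\mathsf{E}(XY)$ term, whereas the mean of the sum has none:

$$
\begin{aligned}
\mathsf{E}(X+Y) &= \mathsf{E}(X)+\mathsf{E}(Y),\\
\mathsf{Var}(X+Y) &= \mathsf{E}\big[(X+Y)^2\big]-\big(\mathsf{E}(X)+\mathsf{E}(Y)\big)^2\\
&= \mathsf{E}(X^2)+2\,\mathsf{E}(XY)+\mathsf{E}(Y^2)-\mathsf{E}(X)^2-2\,\mathsf{E}(X)\mathsf{E}(Y)-\mathsf{E}(Y)^2\\
&= \mathsf{Var}(X)+\mathsf{Var}(Y)+2\big(\mathsf{E}(XY)-\mathsf{E}(X)\mathsf{E}(Y)\big).
\end{aligned}
$$

The last bracket is the covariance, which vanishes exactly when $X$ and $Y$ are uncorrelated; this is why correlation changes the spread of $X+Y$ but not its mean.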


Here's my non-rigorous attempt:

We know by the law of total expectation that $$E[X] = E[E[X|Y]]$$

or, in the special case where the events $A_i$ partition the sample space, $$E[X] = \sum_i E[X|A_i]P(A_i)$$

That means that even when $X$ depends on $Y$, $E[X]$ already knows about and has accounted for this dependency! How $X$ and $Y$ affect each other is already built into $E[X]$ and $E[Y]$.
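The partition form of the law of total expectation can be verified exactly on a small example. The variables are hypothetical choices for illustration: two fair dice, $X$ = the first die and $Y$ = the larger of the two, so $E[X\mid Y=y]$ genuinely depends on $y$. Partitioning by $A_y=\{Y=y\}$ still recovers $E[X]$, and $E[X+Y]=E[X]+E[Y]$ holds exactly.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes (a, b) of two fair dice.
omega = list(product(range(1, 7), repeat=2))

# Hypothetical dependent variables: X = first die, Y = the larger of the two dice.
def X(w):
    return w[0]

def Y(w):
    return max(w)

def prob(event):
    """P(A) for an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_E(f, event):
    """E[f | A]: the average of f over the outcomes in A (all equally likely)."""
    pts = [w for w in omega if event(w)]
    return Fraction(sum(f(w) for w in pts), len(pts))

def everything(w):
    return True

E_X = cond_E(X, everything)
E_Y = cond_E(Y, everything)

# Partition the sample space into A_y = {Y = y}, y = 1..6.
parts = [lambda w, y=y: Y(w) == y for y in range(1, 7)]
via_partition = sum(cond_E(X, A) * prob(A) for A in parts)
print(E_X, via_partition)      # 7/2 7/2

# E[X | Y = y] varies with y, yet E[X+Y] = E[X] + E[Y] still holds exactly.
print([str(cond_E(X, A)) for A in parts])
print(cond_E(lambda w: X(w) + Y(w), everything) == E_X + E_Y)   # True
```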
