Proving the Law of the Unconscious Statistician

The proof seems a little too easy. I am wondering if I misunderstood something.

Let $Y = g(X)$. Prove that $$\mathbb E(Y) = \sum_x g(x)f_X(x)$$ provided that the sum converges absolutely.

By definition: $\mathbb E(Y) = \sum_y yf_Y(y)$. Writing $g^{-1}(y) = \{x_1, x_2,\dots\}$, we have $$f_Y(y) = \mathbb P(Y=y) = \sum_i \mathbb P(X=x_i)$$ where $g(x_i) = y$. Hence $$\mathbb E(Y) = \sum_y y \sum_i \mathbb P(X=x_i) = \sum_y \sum_i g(x_i) \mathbb P(X=x_i) = \sum_i g(x_i) \mathbb P(X=x_i) = \sum_x g(x)f_X(x)$$ My understanding is that you need to capture every single $x_i \in g^{-1}(y)$, repeat the process for every $y$, and then add up the terms.


The short answer is no, you did not misunderstand; in fact, your proof and reasoning are correct.

That being said, I would use slightly more explicit set notation to make the proof clearer (to me, at least), although it is not strictly necessary. I prefer this notation because it makes explicit exactly what you are summing over:

We suppose $X$ has a discrete distribution on a countable set $S$, and let $Q \subseteq \mathbb{R}$ denote the range of $g$. Then $Q$ is countable, so $Y$ also has a discrete distribution. It follows that $$E(Y) = \sum_{y \in Q} y \cdot P(Y=y) = \sum_{y \in Q} y \sum_{x \in g^{-1}(y)} f(x) = \sum_{y \in Q} \sum_{x \in g^{-1}(y)}g(x)f(x) = \sum_{x \in S} g(x)f(x)$$ where the last equality holds because the sets $g^{-1}(y)$, $y \in Q$, partition $S$, and absolute convergence justifies the rearrangement.
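A quick numeric sanity check of the discrete identity may help; the distribution below is an illustrative choice (uniform on $S = \{-2,-1,0,1,2\}$ with $g(x) = x^2$, deliberately non-injective so that several $x$'s share an image), not anything from the problem itself:

```python
from fractions import Fraction

# Example distribution: X uniform on S, g(x) = x^2 (not injective).
S = [-2, -1, 0, 1, 2]
f = {x: Fraction(1, 5) for x in S}          # pmf of X
g = lambda x: x * x

# Left-hand side: E(Y) via the distribution of Y = g(X),
# grouping the x's by their common image y = g(x).
Q = set(g(x) for x in S)                     # range of g
pmf_Y = {y: sum(f[x] for x in S if g(x) == y) for y in Q}
E_Y_lhs = sum(y * pmf_Y[y] for y in Q)

# Right-hand side: LOTUS, summing g(x) f(x) directly over S.
E_Y_rhs = sum(g(x) * f[x] for x in S)

print(E_Y_lhs, E_Y_rhs)   # both equal 2
```

Exact rational arithmetic via `fractions.Fraction` avoids any floating-point noise, so the two sides agree exactly.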


This is true in the case that $\mathbb P(X\in C)=1$ for some countable set $C\subset\mathbb R$. More generally, if $X$ is a random variable with $\mathbb E[|X|]<\infty$ and $Y=g(X)$ where $g:\mathbb R\to\mathbb R$ is a measurable function such that $\mathbb E[|g(X)|]<\infty$, we have $$F_Y(y):=\mathbb P(Y\leqslant y) = \mathbb P(g(X)\leqslant y)=\mathbb P(X\in g^{-1}(-\infty,y]) $$ for each $y\in\mathbb R$. The set $g^{-1}(-\infty,y]$ is measurable, being the preimage of a measurable set under a measurable function, so we may define for all Borel sets $A$ $$\widetilde{\mathbb P}(A)=\int_A\ \mathsf d F_X(x). $$ Then $$\widetilde{\mathbb P}(\varnothing) = \int_\varnothing \mathsf dF_X(x)=0,\quad \widetilde{\mathbb P}(\mathbb R)=\int_{\mathbb R}\ \mathsf dF_X(x)=1,$$ and for a disjoint sequence $\{A_n\}$ of Borel sets we have \begin{align} \widetilde{\mathbb P}\left(\bigcup_{n=1}^\infty A_n\right) &= \int_{\bigcup_{n=1}^\infty A_n} \mathsf d F_X(x)\\ &= \int_{\mathbb R}\sum_{n=1}^\infty \mathsf 1_{A_n}(x)\ \mathsf dF_X(x)\\&=\sum_{n=1}^\infty\int_{A_n}\ \mathsf dF_X(x)\\ &= \sum_{n=1}^\infty \widetilde{\mathbb P}(A_n), \end{align} so $\widetilde{\mathbb P}$ is a probability measure on $\mathcal B(\mathbb R)$. Since $$F_Y(y)=\widetilde{\mathbb P}(g^{-1}(-\infty,y])=\int_{g^{-1}(-\infty,y]}\ \mathsf dF_X(x), $$ it follows from the definition of mathematical expectation (and the change-of-variables formula for the pushforward measure) that

$$\mathbb E[Y] = \int_{\mathbb R}\ y\ \mathsf dF_Y(y) = \int_{\mathbb R} g(x)\mathsf d\widetilde{\mathbb P}(x) = \int_{\mathbb R} g(x)\ \mathsf dF_X(x). $$
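A numeric illustration of the change-of-variables identity $\int y\ \mathsf dF_Y(y) = \int g(x)\ \mathsf dF_X(x)$ in a non-discrete setting may be helpful; the specific choices below (X uniform on $(0,1)$, $g(x)=x^2$, so $Y$ has density $1/(2\sqrt y)$ on $(0,1)$) are illustrative, and the integrals are approximated by midpoint Riemann sums:

```python
import math

# Illustrative choice: X ~ Uniform(0, 1), g(x) = x^2.
# Then F_X(x) = x on [0, 1], and Y = X^2 has density
# f_Y(y) = 1 / (2*sqrt(y)) on (0, 1); both integrals equal 1/3.
N = 100_000
h = 1.0 / N
mids = [(k + 0.5) * h for k in range(N)]     # midpoint grid on (0, 1)

# Right-hand side: integral of g(x) dF_X(x) = integral of x^2 dx.
rhs = sum(x * x for x in mids) * h

# Left-hand side: integral of y f_Y(y) dy = integral of y/(2*sqrt(y)) dy.
lhs = sum(y / (2.0 * math.sqrt(y)) for y in mids) * h

print(lhs, rhs)  # both approximately 1/3
```

Both sums converge to $1/3$ as $N\to\infty$, matching $\mathbb E[X^2]$ computed either way.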


I found this pretty confusing myself until I spelled it out in slow motion (using Iverson brackets to simplify the bounds on the summations):

\begin{align} \mathbb EY &= \mathbb Eg(X)\\ &= \sum_z z \cdot \mathbb P(g(X) = z) &&\text{Definition of expected value}\\ &=\sum_z z\sum_w \mathbb P(X=w)[g(w) = z] &&\text{Sum probabilities of basic events s.t. $g(w) = z$}\\ &=\sum_z\sum_w z\cdot \mathbb P(X=w)[g(w)=z] &&\text{Distribute the $z$}\\ &=\sum_w\sum_z z\cdot \mathbb P(X=w)[g(w)=z] &&\text{Swap order of summation (absolute convergence)}\\ &=\sum_w\sum_{z = g(w)}g(w)\cdot \mathbb P(X=w) &&\text{Only the term with $z = g(w)$ is non-zero}\\ &=\sum_w g(w)\cdot \mathbb P(X=w)\sum_{z = g(w)}1 &&\text{The summand depends only on $w$; factor it out}\\ &=\sum_w g(w)\cdot \mathbb P(X=w) &&\text{$z$ takes on one value, so the inner sum is trivially 1} \end{align}
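The slow-motion derivation can be replayed numerically: compute the double sum over $z$ and $w$ from the early lines literally, and compare it with the collapsed single sum over $w$ from the last line. The pmf and $g$ below are an illustrative choice, not part of the original problem:

```python
# Illustrative distribution: X takes values w with the pmf below,
# and g(w) = w^2 so that two values of w share the same z.
pmf = {-1: 0.25, 0: 0.25, 1: 0.25, 2: 0.25}
g = lambda w: w * w
iverson = lambda cond: 1 if cond else 0      # Iverson bracket [cond]

zs = sorted(set(g(w) for w in pmf))          # possible values of g(X)

# Early lines of the derivation: double sum over z and w,
# with the Iverson bracket selecting the w's mapping to z.
double_sum = sum(z * sum(pmf[w] * iverson(g(w) == z) for w in pmf)
                 for z in zs)

# Last line: the collapsed single sum over w.
single_sum = sum(g(w) * pmf[w] for w in pmf)

print(double_sum, single_sum)  # both 1.5
```

The two sums agree, which is exactly the content of swapping the summation order and collapsing the inner sum.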