Reconciling the measure-theoretic definition of expectation with the one from elementary probability

If $X : \Omega \to \mathbb{R}$ is a random variable, it induces a probability measure on $\mathbb{R}$, often denoted by $P(X \in \cdot)$, defined as the map

$$ A \mapsto P(X \in A) $$

for $A \in \mathcal{B}(\mathbb{R})$. This is called the pushforward measure of $P$ by $X$. In this particular case, it coincides with the Lebesgue–Stieltjes measure induced by the CDF $F_X(\cdot) = P(X \leq \cdot)$, and so we may interchangeably write

$$ \int_{\mathbb{R}} f(x) \, \mathrm{d}F_X(x) = \int_{\mathbb{R}} f(x) \, P(X \in \mathrm{d}x). $$

In this setting, we have the following theorem. (See also Theorem 1.6.9 of Durrett, *Probability: Theory and Examples*, 4.1 ed.)

**Theorem (Change of variables).** For any random variable $X : \Omega \to \mathbb{R}$ and for any Borel-measurable $f : \mathbb{R} \to [0, \infty]$, the following identity holds:

$$ \int_{\Omega} f(X(\omega)) \, P(\mathrm{d}\omega) = \int_{\mathbb{R}} f(x) \, P(X \in \mathrm{d}x) $$

Of course, the same conclusion continues to hold when $f$ is any $\mathbb{R}$-valued Borel-measurable function such that $f(X)$ is integrable, i.e., $E|f(X)| < \infty$. This follows easily by decomposing $f = f_+ - f_-$, where $f_+$ (resp. $f_-$) is the positive part (resp. negative part) of $f$, and applying the above theorem to $f_+$ and $f_-$ separately. In particular, taking $f(x) = x$ when $E|X| < \infty$, we get

$$ E[X] = \int_{\mathbb{R}} x \, P(X \in \mathrm{d}x). $$
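On a finite sample space the change-of-variables identity can be verified directly. The following Python sketch (the space $\Omega$, the map $X$, and the test function $f$ are illustrative choices of mine, not from the text) computes the integral of $f(X(\cdot))$ over $\Omega$ and the integral of $f$ against the pushforward measure, and checks that they agree:

```python
from collections import defaultdict

# Finite sample space Omega = {0, ..., 5} with the uniform measure P.
omega = list(range(6))
P = {w: 1.0 / 6.0 for w in omega}

# A random variable X and a nonnegative Borel function f (toy choices).
X = lambda w: (w - 2) ** 2
f = lambda x: x + 1.0

# Left-hand side: integrate f(X(.)) over Omega against P.
lhs = sum(f(X(w)) * P[w] for w in omega)

# Right-hand side: build the pushforward measure P(X in .) on the
# range of X, then integrate f against it.
pushforward = defaultdict(float)
for w in omega:
    pushforward[X(w)] += P[w]
rhs = sum(f(x) * p for x, p in pushforward.items())

assert abs(lhs - rhs) < 1e-12
```

The bookkeeping on the right-hand side, grouping the mass of $P$ by the value of $X$, is exactly what the pushforward measure does in general.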

Here are some special cases:

  • $X$ has a discrete distribution if and only if $P(X \in \cdot) = \sum_{i=1}^{\infty} p_X(x_i) \delta_{x_i}(\cdot)$, where $p_X$ is the PMF of $X$, $(x_i)$ enumerates its support, and $\delta_x$ is the point mass at $x$. In this case, assuming $E|X| < \infty$ (or $X \geq 0$),

    $$ E[X] = \int_{\mathbb{R}} x \, \sum_{i=1}^{\infty} p_X(x_i) \delta_{x_i}(\mathrm{d}x) = \sum_{i=1}^{\infty} \left( \int_{\mathbb{R}} x \, \delta_{x_i}(\mathrm{d}x) \right) p_X(x_i) = \sum_{i=1}^{\infty} x_i p_X(x_i), $$

    where the interchange of sum and integral is justified by Tonelli's theorem (applied separately to the positive and negative parts of $x$).

  • $X$ has an absolutely continuous distribution if and only if $P(X \in \mathrm{d}x) = f_X(x) \, \mathrm{d}x$, where $f_X$ is the PDF of $X$. In this case, assuming $E|X| < \infty$ (or $X \geq 0$),

    $$ E[X] = \int_{\mathbb{R}} x f_X(x) \, \mathrm{d}x $$
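Both special cases can be checked numerically. In the Python sketch below (pure standard library; the Poisson and exponential distributions, their parameters, the truncation points, and the midpoint rule are all my own illustrative choices), the sum $\sum_i x_i \, p_X(x_i)$ and the integral $\int x f_X(x) \, \mathrm{d}x$ are compared against the known means $\lambda$ and $1/\text{rate}$:

```python
import math

# Discrete case: X ~ Poisson(lam), whose mean is exactly lam.
lam = 2.5
pmf = lambda k: math.exp(-lam) * lam ** k / math.factorial(k)
# Truncate the series; the tail beyond k = 60 is negligible here.
discrete_mean = sum(k * pmf(k) for k in range(61))

# Continuous case: X ~ Exponential(rate), whose mean is 1 / rate.
rate = 1.5
pdf = lambda x: rate * math.exp(-rate * x)
# Midpoint rule on [0, 40]; the tail beyond 40 is negligible here.
n, b = 200_000, 40.0
h = b / n
continuous_mean = sum((i + 0.5) * h * pdf((i + 0.5) * h) * h for i in range(n))

assert abs(discrete_mean - lam) < 1e-9
assert abs(continuous_mean - 1.0 / rate) < 1e-6
```

In both cases the abstract integral $\int_{\mathbb{R}} x \, P(X \in \mathrm{d}x)$ reduces to an elementary computation against the PMF or the PDF, which is precisely the reconciliation the theorem provides.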