Definition of mean as an integral over the CDF

The former integral is a Stieltjes integral. See this, for example, in particular the section "Application to probability theory".

In brief, the integral $I_1 = \int {xdF(x)} $ is a generalization of $I_2 = \int {xf(x)dx} $. $I_2$ can be used only when the distribution is absolutely continuous, that is, has a probability density function. $I_1$ can be used for any distribution, even if it is discrete or continuous singular (provided the expectation is well-defined, that is $\int {|x|dF(x)} < \infty $). Note that if $F'(x) = f(x)$, then $\frac{d}{{dx}}F(x) = f(x)$ gives rise to $dF(x) = f(x)dx$.

For a thorough account of this topic, see, for example, this.

EDIT:

An example. Suppose that a random variable $X$ has CDF $F$ such that $F'(x) = f_1 (x)$ for $x < a$, $F'(x) = f_2 (x)$ for $x > a$, and $F(a) - F(a-) = p$, where $f_1$ and $f_2$ are nonnegative continuous functions, and $0 < p \leq 1$. Note that $F$ is everywhere differentiable except at $x=a$ where it has a jump discontinuity of size $p$. In particular, $X$ does not have a PDF $f$, hence you cannot compute ${\rm E}(X)$ as the integral $I_2$. However, ${\rm E}(X)$ can be computed as follows: $$ {\rm E}(X) = \int_{ - \infty }^\infty {xdF(x)} = \int_{ - \infty }^a {xf_1 (x)dx} + a[F(a) - F(a - )] + \int_a^\infty {xf_2 (x)dx} . $$ In case $f_1$ and $f_2$ are identically zero and, hence, $p=1$, this gives $$ {\rm E}(X) = \int_{ - \infty }^\infty {xdF(x)} = a[F(a) - F(a - )] = a, $$ which is obvious since $X$ has the $\delta_a$ distribution (that is, ${\rm P}(X=a)=1$).

Exercise 1. Suppose that $X$ takes finitely many values $x_1,\ldots,x_n$, with probabilities $p_1,\ldots,p_n$, respectively. Conclude that ${\rm E}(X) = \int_{ - \infty }^\infty {xdF(x)} = \sum\nolimits_{k = 1}^n {x_k p_k } $.

Exercise 2. Suppose that with probability $0 < p < 1$ a random variable $X$ takes the value $a$, and with probability $1-p$ it is uniform$[0,1]$. Find the CDF $F$ of $X$, and compute its expectation (note that $X$ is neither discrete nor continuous random variable).

Note: The integral $I_1$ is, of course, a special case of $\int {h(x)dF(x)} $ (say, for $h$ a continuous function). In particular, letting $h=1$, we have $\int {dF(x)} = 1$ (which is a generalization of $\int {f(x)dx} = 1$).

EDIT: Further details.

Note that if $h$ is continuous, then ${\rm E}[h(X)]$, if exists, is given by $$ {\rm E}[h(X)] = \int {h(x)dF(x)}. $$ In particular, if the $n$-th moment exists, it is given by $$ {\rm E}[X^n] = \int {x^n dF(x)}. $$

In principle, the limits of integration range from $-\infty$ to $\infty$. In this context, consider the following important example. Suppose that $X$ is any nonnegative random variable. Then its Laplace transform is $$ {\rm E}[e^{ - sX} ] = \int {e^{ - sx} dF(x)} = \int_{ 0^- }^\infty {e^{ - sx} dF(x)}, \;\; s \geq 0, $$ where $0^-$ can be replaced by $-\varepsilon$ for any $\varepsilon > 0$. While the $n$-th moment of (the nonnegative) $X$ is given, for any $n \geq 1$, by $$ {\rm E}[X^n] = \int_{ 0^- }^\infty {x^n dF(x)} = \int_0^\infty {x^n dF(x)}, $$ it is not true in general that also $$ {\rm E}[e^{ - sX} ] = \int_{0 }^\infty {e^{ - sx} dF(x)}. $$ Indeed, following the definition of the integral, $$ {\rm E}[e^{ - sX} ] = \int_{ 0^- }^\infty {e^{ - sx} dF(x)} = e^{-s0}[F(0)-F(0-)] + \int_{ 0 }^\infty {e^{ - sx} dF(x)}, $$ hence the jump of $F$ at zero should be added (if positive, of course). In the $n$-th moment case, on the other hand, the corresponding term is $0^n[F(0)-F(0-)] = 0$, hence a jump of $F$ at zero does not affect the overall integral.


The cumulative density function $F$ is related to the probability density function $f$ by $dF(x)/dx=f(x)$. The equation you have in terms of $F$ can be re-expressed in terms of $f$ by substituting in $dF(x) = f(x)\,dx$. In fact, for many purposes, you can take this as the definition of the differential term $dF$. However, in more general circumstances where $F$ is not differentiable and the PDF $f$ is not well-defined, the form involving $dF$ still holds (interpreting it as a Riemann-Stieltjes integral). For example, if the distribution is discrete, so that $F$ is piecewise constant, then $dF$ becomes a sum over Dirac distributions.