What does it mean to integrate with respect to the distribution function?

If $f(x)$ is a density function and $F(x)$ is a distribution function of a random variable $X$, then I understand that the expectation of $X$ is often written as:

$$E(X) = \int x f(x) dx$$

where the bounds of integration are implicitly $-\infty$ and $\infty$. The idea of multiplying $x$ by the probability of $x$ and summing makes sense in the discrete case, and it's easy to see how it generalises to the continuous case. However, in Larry Wasserman's book All of Statistics he writes the expectation as follows:

$$E(X) = \int x dF(x)$$

I guess my calculus is a bit rusty, in that I'm not that familiar with the idea of integrating with respect to functions of $x$ rather than just $x$.

  • What does it mean to integrate over the distribution function?
  • Is there an analogous process to repeated summing in the discrete case?
  • Is there a visual analogy?

UPDATE: I just found the following extract from Wasserman's book (p.47):

The notation $\int x d F(x)$ deserves some comment. We use it merely as a convenient unifying notation so that we don't have to write $\sum_x x f(x)$ for discrete random variables and $\int x f(x) dx$ for continuous random variables, but you should be aware that $\int x d F(x)$ has a precise meaning that is discussed in a real analysis course.

Thus, I would be interested in any insights that could be shared about the precise meaning that would be discussed in a real analysis course.
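For what it's worth, here is a quick numerical sketch of the unifying idea (my own illustration, not from the book; the fair coin and the standard exponential are just example choices): the same "sum $x$ against increments of $F$" recipe covers both cases.

```python
import numpy as np

# Discrete case: a fair coin taking values 0 and 1.
values = np.array([0.0, 1.0])
probs = np.array([0.5, 0.5])       # the increments of F at its jump points
print(np.sum(values * probs))      # 0.5, the usual sum_x x f(x)

# Continuous case: standard exponential, F(x) = 1 - exp(-x).
x = np.linspace(0.0, 50.0, 2_000_001)
F = 1.0 - np.exp(-x)
dF = np.diff(F)                    # increments F(x_{k+1}) - F(x_k)
print(np.sum(x[:-1] * dF))         # ~1.0, the usual integral of x f(x) dx
```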


Solution 1:

There are many definitions of the integral, including the Riemann integral, the Riemann-Stieltjes integral (which generalizes the Riemann integral), and the Lebesgue integral (which is more general still). If you're using the Riemann integral, then you can only integrate with respect to a variable (e.g. $x$), and the notation $dF(x)$ isn't defined.

The Riemann-Stieltjes integral generalizes the concept of the Riemann integral and allows for integration with respect to a cumulative distribution function that isn't continuous.

The notation $\int_{a}^{b} g(x)\,dF(x)$ is roughly equivalent to $\int_{a}^{b} g(x) f(x)\,dx$ when $f(x)=F'(x)$. However, if $F(x)$ isn't differentiable everywhere, then $f(x)=F'(x)$ isn't defined at every point, and you simply can't write the integral as $\int_{a}^{b} g(x) f(x)\,dx$.
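Here is a rough numerical check of that equivalence (my own sketch; the standard normal and $g(x)=x^2$ are example choices, so both sums should approximate $E[X^2]=1$):

```python
import numpy as np
from scipy.special import erf

def F(x):
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))          # standard normal CDF

def f(x):
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)   # its density, F'(x)

a, b, n = -10.0, 10.0, 200_000
x = np.linspace(a, b, n + 1)
g = x**2

stieltjes = np.sum(g[:-1] * np.diff(F(x)))           # g(x_k) [F(x_{k+1}) - F(x_k)]
riemann = np.sum(g[:-1] * f(x[:-1]) * (b - a) / n)   # g(x_k) f(x_k) dx
print(stieltjes, riemann)                            # both ~ 1
```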

In probability theory, this situation occurs whenever you have a random variable with a discontinuous cumulative distribution function. For example, suppose $X$ is $0$ with probability $\frac{1}{2}$ and $1$ with probability $\frac{1}{2}$. Then

$$ F(x) = \begin{cases} 0 & x < 0 \\ 1/2 & 0 \leq x < 1 \\ 1 & x \geq 1 \end{cases} $$

Clearly, $F(x)$ doesn't have a derivative at $x=0$ or $x=1$, so there isn't a probability density function $f(x)$ at those points.

Now, suppose that we want to evaluate $E[X^3]$. This can be written, using the Riemann-Stieltjes integral, as

$$E[X^3]=\int_{-\infty}^{\infty} x^3 dF(x).$$

Note that because there isn't a probability density function $f(x)$, we can't write this as

$$E[X^{3}]=\int_{-\infty}^{\infty} x^3 f(x) dx.$$

However, we can use the fact that this random variable is discrete to evaluate the expected value as:

$$E[X^{3}]=(0)^{3}(1/2)+(1)^{3}(1/2)=1/2$$
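Approximating the Riemann-Stieltjes sum numerically against this step $F$ gives the same answer; here is a small sketch (my own, in Python): $F$ jumps by $1/2$ at $x=0$ and $x=1$, so the sum picks up $x^3$ only at the jumps.

```python
import numpy as np

def F(x):
    # The step CDF above: 0 for x < 0, 1/2 for 0 <= x < 1, 1 for x >= 1.
    return np.where(x < 0, 0.0, np.where(x < 1, 0.5, 1.0))

a, b, n = -2.0, 2.0, 1_000_000
x = np.linspace(a, b, n + 1)
dF = np.diff(F(x))               # zero except on the two jump intervals
print(np.sum(x[:-1] ** 3 * dF))  # ~ 0^3 * (1/2) + 1^3 * (1/2) = 0.5
```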

So, the short answer to your question is that you need to study alternative definitions of the integral, including the Riemann and Riemann-Stieltjes integrals.

Solution 2:

Another way to understand integration with respect to a distribution function is via the Lebesgue-Stieltjes measure. Let $F\!:\mathbb R\to\mathbb R$ be a distribution function (i.e. non-decreasing and right-continuous). Then there exists a unique measure $\mu_F$ on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ that satisfies $$ \mu_F((a,b])=F(b)-F(a) $$ for any choice of $a,b\in\mathbb R$ with $a<b$. In fact, there is a one-to-one correspondence between probability measures on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ and non-decreasing, right-continuous functions $F\!:\mathbb R\to\mathbb R$ satisfying $F(x)\to 1$ as $x\to\infty$ and $F(x)\to 0$ as $x\to-\infty$.

Now, the integral $$ \int x\,\mathrm dF(x) $$ can be viewed as simply the integral $$ \int x\,\mu_F(\mathrm dx)\quad\text{or}\quad \int x \,\mathrm d\mu_F(x). $$

Now if $X$ is a random variable having distribution function $F$, then the Lebesgue-Stieltjes measure is nothing but the distribution $P_X$ of $X$: $$ P_X((a,b])=P(X\in (a,b])=P(X\leq b)-P(X\leq a)=F(b)-F(a)=\mu_F((a,b]),\quad a<b, $$ showing that $P_X=\mu_F$. In particular we see that $$ {\rm E}[X]=\int_\Omega X\,\mathrm dP=\int_\mathbb{R}x\,P_X(\mathrm dx)=\int_\mathbb{R}x\,\mu_F(\mathrm dx)=\int_\mathbb{R}x\,\mathrm dF(x). $$
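As a sanity check of these identities, here is a sketch (my own example, taking $X$ standard normal; scipy's erf is used only to build the CDF) comparing the sample-based side with the $F$-based side numerically:

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)   # draws from the distribution P_X

def F(x):
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))   # the normal CDF

# P_X((a,b]) = F(b) - F(a):
a, b = -1.0, 0.5
print(np.mean((a < X) & (X <= b)), F(b) - F(a))  # both ~ 0.533

# E[X] as an integral over Omega (sample average) vs. the integral of x dF:
x = np.linspace(-10.0, 10.0, 1_000_001)
print(np.mean(X), np.sum(x[:-1] * np.diff(F(x))))  # both ~ 0
```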

Solution 3:

The integral is in the sense of Riemann-Stieltjes. Loosely put, it is defined as:

$$\int_a^b g(x)\,dF(x)=\lim_{\|P\|\rightarrow 0} \sum_{k=0}^{n-1} g(x_k)\,[F(x_{k+1})-F(x_k)],$$

where the points $x_k$ partition the interval $[a,b]$ you are integrating over, in that $P:=\{x_0=a,x_1,\ldots, x_{n-1},x_n=b\}$, and the mesh size $\|P\|:=\max_k (x_{k+1}-x_k)$ goes to $0$. The point of this definition is that $F(x_{k+1})-F(x_k)$ is exactly the probability of landing in the interval $(x_{k},x_{k+1}]$. When $F$ is differentiable, you can show that $dF(x)=f(x)\,dx$, so the integral becomes the usual Riemann integral. However, when $F$ is not differentiable, particularly when $F$ has a jump (which is equivalent to your random variable taking a single value with positive probability), you need this generalization of the integral. For example, if $X$ is a constant random variable, say $X=c$, then $F(x)$ jumps from $0$ to $1$ at $x=c$, so $X$ doesn't have a density function in the classical sense but rather a point mass (that is, a Dirac delta functional).
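To make the limit concrete, here is a minimal sketch (my own, in Python) of these partial sums applied to the point-mass example: $F$ jumps from $0$ to $1$ at $x=c$, and the sum converges to $g(c)$ for continuous $g$.

```python
import numpy as np

def stieltjes_sum(g, F, a, b, n):
    """Left-endpoint Riemann-Stieltjes sum over a uniform partition of [a,b]."""
    x = np.linspace(a, b, n + 1)
    return np.sum(g(x[:-1]) * np.diff(F(x)))

c = 0.3
F = lambda x: np.where(x < c, 0.0, 1.0)   # CDF of the constant variable X = c
g = lambda x: x                           # so the integral is E[X] = c

for n in (10, 100, 10_000):
    print(n, stieltjes_sum(g, F, -1.0, 1.0, n))   # approaches 0.3
```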