Can we prove the law of total probability for continuous distributions?

Solution 1:

Think of it like this: Suppose you have a continuous random variable $X$ with pdf $f(x)$. Then $P(A)=E(1_{A})=E[E(1_{A}|X)]=\int E(1_{A}|X=x)f(x)dx=\int P(A|X=x)f(x)dx$.

Solution 2:

Excellent question. The issue here is that you first have to define what $\mathbb{P}(A|X=x)$ means, as you're conditioning on the event $[X=x]$, which has probability zero if $X$ is a continuous random variable. Can we still give $\mathbb{P}(A|X=x)$ a meaning? In the words of Kolmogorov,

"The concept of a conditional probability with regard to an isolated hypothesis whose probability equals 0 is inadmissible."

The problem with conditioning on a single event of probability zero is that it can lead to paradoxes, such as the Borel-Kolmogorov paradox. However, if we don't just have an isolated hypothesis such as $[X=x]$, but a whole partition of hypotheses $\{[X=x] ~|~ x \in \mathbb{R}\}$ with respect to which our notion of conditional probability is supposed to make sense, we can give a meaning to $\mathbb{P}(A|X=x)$ for almost every $x$. Let's look at an important special case.


Continuous random variables in Euclidean space

In many instances where we might want to apply the law of total probability for continuous random variables, we are actually interested in events of the form $A = [(X,Y) \in B]$, where $B$ is a Borel set and $X,Y$ are random variables taking values in $\mathbb{R}^d$ whose joint distribution is absolutely continuous with respect to Lebesgue measure. For simplicity, I will assume here that $X,Y$ take values in $\mathbb{R}$, although the multivariate case is completely analogous. Choose a representative of $f_{X,Y}$, the density of $(X,Y)$, and a representative of $f_X$, the density of $X$. The conditional density of $Y$ given $X$ is then defined as $$ f_{Y|X}(x,y) = \frac{f_{X,Y}(x,y)}{f_{X}(x)}$$ at all points $(x,y)$ where $f_X(x) > 0$. We may then define for $A = [(X,Y) \in B]$ and $B_x := \{ y \in \mathbb{R} : (x,y) \in B\}$

$$\mathbb{P}(A | X = x) := \int_{B_x} f_{Y|X}(x,y)~\mathrm{d}y, $$ at least at all points $x$ where $f_X(x) > 0$. Note that this definition depends on the choice of representatives we made for the densities $f_{X,Y}$ and $f_{X}$, and we should keep this in mind when trying to interpret $\mathbb{P}(A|X=x)$ pointwise. Whichever choice we make, the law of total probability holds, as the following computation shows:

\begin{align*} \mathbb{P}(A) &= \mathbb{E}[1_{B}(X,Y)] = \int_{B} f_{X,Y}(x,y)~\mathrm{d}y~\mathrm{d}x = \int_{-\infty}^{\infty}\int_{B_x} f_{X,Y}(x,y)~\mathrm{d}y~\mathrm{d}x \\ &= \int_{-\infty}^{\infty}f_{X}(x)\int_{B_x} f_{Y|X}(x,y)~\mathrm{d}y~\mathrm{d}x = \int_{-\infty}^{\infty}\mathbb{P}(A|X=x)~ f_X(x)~\mathrm{d}x. \end{align*}
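As a numerical sanity check of the identity just derived, here is a minimal sketch under assumptions that are not part of the argument above: $(X,Y)$ is taken to be bivariate standard normal with correlation `rho`, and $B = \{(x,y) : x + y \le c\}$. In that case $Y$ given $X=x$ is $N(\rho x, 1-\rho^2)$, so $\mathbb{P}(A|X=x)$ has a closed form, and $\mathbb{P}(A)$ is also known directly because $X+Y \sim N(0, 2+2\rho)$. The function names are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_probability(x, rho, c):
    """P(X + Y <= c | X = x), using Y | X = x ~ N(rho * x, 1 - rho^2)."""
    return normal_cdf((c - x - rho * x) / math.sqrt(1.0 - rho ** 2))


def total_probability(rho, c, lo=-10.0, hi=10.0, n=20000):
    """Trapezoidal approximation of the integral of P(A | X = x) f_X(x) dx.

    The range [lo, hi] is an assumed truncation; the normal tails
    outside it contribute a negligible amount.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * conditional_probability(x, rho, c) * normal_pdf(x)
    return total * h


def direct_probability(rho, c):
    """P(X + Y <= c), computed from X + Y ~ N(0, 2 + 2 * rho)."""
    return normal_cdf(c / math.sqrt(2.0 + 2.0 * rho))


if __name__ == "__main__":
    rho, c = 0.3, 1.0
    print(total_probability(rho, c), direct_probability(rho, c))
```

The two printed numbers should agree up to the quadrature error, which is what the law of total probability predicts for this particular choice of $(X,Y)$ and $B$.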

One can check that this construction has the properties we would expect if, for example, $X$ and $Y$ are independent, which should give us some confidence that this notion of conditional probability makes sense.
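To spell out the independence check, here is a short sketch using only the definitions above. If $X$ and $Y$ are independent, then $f_{X,Y}(x,y) = f_X(x) f_Y(y)$ is a valid choice of representative for the joint density, where $f_Y$ denotes a density of $Y$. With that choice, at every $x$ with $f_X(x) > 0$,

$$ f_{Y|X}(x,y) = \frac{f_X(x) f_Y(y)}{f_X(x)} = f_Y(y), \qquad \mathbb{P}(A|X=x) = \int_{B_x} f_Y(y)~\mathrm{d}y = \mathbb{P}(Y \in B_x) = \mathbb{P}\big((x,Y) \in B\big). $$

So conditioning on $[X=x]$ amounts to substituting the value $x$ for $X$ and leaving the distribution of $Y$ unchanged, as one would hope.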


Disintegrations

The more general name for the concept we dealt with in the previous section is disintegration. In complete generality, disintegrations need not exist; however, if the probability space $\Omega$ is a Radon space equipped with its Borel $\sigma$-field, they do. It might seem off-putting that the topology of the probability space now comes into play, but I believe for most purposes it will not be a severe restriction to assume that the probability space is $([0,1],\mathcal{B},\lambda)$, that is, $[0,1]$ equipped with the Euclidean topology, Borel $\sigma$-field and Lebesgue measure. Any random variable $X$ can then be realized as $X(\omega) = F^{-1}(\omega)$, where $F^{-1}$ is the generalized inverse of the cumulative distribution function $F$ of $X$. The disintegration theorem then gives us the existence of a family of probability measures $(\mu_x)_{x \in \mathbb{R}}$, where $\mu_x$ is supported on the event $[X=x]$, and the family $(\mu_x)_{x\in \mathbb{R}}$ is unique up to $\text{law}(X)$-almost everywhere equivalence. Writing $\mu_x$ as $\mathbb{P}(\cdot|X=x)$, for any Borel set $A \in \mathcal{B}$ we then again have

$$\mathbb{P}(A) = \int_{-\infty}^{\infty} \mathbb{P}(A|X=x)~f_X(x)~\mathrm{d}x.$$
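The representation $X(\omega) = F^{-1}(\omega)$ on $([0,1],\mathcal{B},\lambda)$ used above can be illustrated with a small sketch. The exponential distribution is an assumption made purely for the example, since its inverse distribution function is explicit; the function names are hypothetical. The code checks on a grid that the Lebesgue measure of $\{\omega : X(\omega) \le t\}$ matches $F(t)$.

```python
import math


def exponential_quantile(omega, rate=1.0):
    """Inverse F^{-1} of the Exp(rate) distribution function, for omega in [0, 1)."""
    return -math.log1p(-omega) / rate


def exponential_cdf(x, rate=1.0):
    """Distribution function F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x)


def lebesgue_measure_of_event(threshold, rate=1.0, n=100_000):
    """Approximate lambda({omega in [0,1] : X(omega) <= threshold}).

    Uses the midpoints of n equal subintervals of [0, 1] as sample points,
    with X(omega) = F^{-1}(omega).
    """
    hits = sum(
        1
        for i in range(n)
        if exponential_quantile((i + 0.5) / n, rate) <= threshold
    )
    return hits / n


if __name__ == "__main__":
    print(lebesgue_measure_of_event(1.0, rate=2.0), exponential_cdf(1.0, rate=2.0))
```

The two values agree up to the grid resolution, which reflects the fact that $X = F^{-1}$ on $[0,1]$ with Lebesgue measure has distribution function $F$.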


Reference for Kolmogorov quote:

Kolmogoroff, A., Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik und ihrer Grenzgebiete 2, Nr. 3. Berlin: Julius Springer. IV + 62 S. (1933). ZBL59.1152.03.