Throughout this post, let $(\Omega,\mathcal{F},P)$ be a probability space. Let us first define the conditional expectation ${\rm E}[X\mid\mathcal{G}]$ for integrable random variables $X:\Omega\to\mathbb{R}$, i.e. $X\in L^1(P)$, and sub-sigma-algebras $\mathcal{G}\subseteq\mathcal{F}$.

Definition: The conditional expectation ${\rm E}[X\mid\mathcal{G}]$ of $X$ given $\mathcal{G}$ is a random variable $Z$ with the following properties:

(i) $Z$ is integrable, i.e. $Z\in L^1(P)$.

(ii) $Z$ is $(\mathcal{G},\mathcal{B}(\mathbb{R}))$-measurable.

(iii) For any $A\in\mathcal{G}$ we have $$ \int_A Z\,\mathrm dP=\int_A X\,\mathrm dP. $$

Note: It makes sense to talk about the conditional expectation since if $U$ is another random variable satisfying (i)-(iii) then $U=Z$ $P$-a.s.

Definition: If $X\in L^1(P)$ and $Y:\Omega\to\mathbb{R}$ is any random variable, then the conditional expectation of $X$ given $Y$ is defined as $$ {\rm E}[X\mid Y]:={\rm E}[X\mid\sigma(Y)], $$ where $\sigma(Y)=\{Y^{-1}(B)\mid B\in\mathcal{B}(\mathbb{R})\}$ is the sigma-algebra generated by $Y$.
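To make the abstract definition a bit more concrete, here is a minimal numerical sketch (my own illustration, not part of the post's development): when $\sigma(Y)$ is generated by a finite partition $\{Y=y\}$, a version of ${\rm E}[X\mid Y]$ is simply the average of $X$ over each cell, and the defining property (iii) can be checked empirically. The distributions and variable names below are assumptions made only for this example.

```python
import numpy as np

# Hypothetical setup: Y uniform on {0, 1, 2}, X = Y + standard normal noise.
# sigma(Y) is generated by the finite partition {Y=0}, {Y=1}, {Y=2}.
rng = np.random.default_rng(0)
n = 10**6
Y = rng.integers(0, 3, size=n)
X = Y + rng.standard_normal(n)

# A version of E[X | Y] is the cell average of X over each set {Y = y}.
Z = np.zeros(n)
for y in range(3):
    cell = (Y == y)
    Z[cell] = X[cell].mean()

# Check property (iii) with A = {Y = y}: the cell average is roughly y here,
# and the two (empirical) integrals over A agree.
for y in range(3):
    cell = (Y == y)
    print(y, Z[cell][0], np.mean(Z * cell), np.mean(X * cell))
```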

I'm not aware of any definition of $P(Y\in B\mid X\in A)$ other than the obvious one, i.e. $$ P(Y\in B\mid X\in A)=\frac{P(Y\in B,X\in A)}{P(X\in A)}, $$ provided that $P(X\in A)>0$. The only exception is when $A$ consists of a single point, i.e. $A=\{x\}$ for some $x\in\mathbb{R}$. In this case, the object $P(Y\in B\mid X=x)$ is defined in terms of a regular conditional distribution.

Let us first define regular conditional probabilities. Let $X:\Omega\to\mathbb{R}$ be a random variable.

Definition: A regular conditional probability for $P$ given $X$ is a function $$ \mathcal{F}\times \mathbb{R} \ni(A,x)\mapsto P^X(A\mid x) $$ satisfying the following three conditions:

(i) The mapping $A\mapsto P^X(A\mid x)$ is a probability measure on $(\Omega,\mathcal{F})$ for all $x\in \mathbb{R}$.

(ii) The mapping $x\mapsto P^X(A\mid x)$ is $(\mathcal{B}(\mathbb{R}),\mathcal{B}(\mathbb{R}))$-measurable for all $A\in\mathcal{F}$.

(iii) The defining equation holds: For any $A\in\mathcal{F}$ and $B\in\mathcal{B}(\mathbb{R})$ we have $$ \int_B P^X(A\mid x)\,P_X(\mathrm dx)=P(A\cap\{X\in B\}). $$

Note: A mapping satisfying (i) and (ii) is often called a Markov kernel. Furthermore, since $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ is a nice space, the regular conditional probability is unique in the sense that if $\tilde{P}^X(\cdot\mid\cdot)$ is another regular conditional probability of $P$ given $X$, then we have that $P^X(\cdot\mid x)=\tilde{P}^X(\cdot\mid x)$ for $P_X$-a.a. $x$. Here $P_X=P\circ X^{-1}$ is the distribution of $X$.
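For intuition (a simple special case, stated here only as a sanity check): if $X$ takes countably many values, each with $P(X=x)>0$, then one version of the regular conditional probability is the elementary one, $$ P^X(A\mid x)=\frac{P(A\cap\{X=x\})}{P(X=x)}, $$ with $P^X(\cdot\mid x)$ chosen arbitrarily (say equal to $P$) for $x$ outside the range of $X$. Indeed, for $B\in\mathcal{B}(\mathbb{R})$, $$ \int_B P^X(A\mid x)\,P_X(\mathrm dx)=\sum_{x\in B}\frac{P(A\cap\{X=x\})}{P(X=x)}\,P(X=x)=P(A\cap\{X\in B\}), $$ where the sum runs over the values of $X$ lying in $B$, so the defining equation (iii) holds.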

Connection: Let $P^X(\cdot\mid\cdot)$ be a regular conditional probability of $P$ given $X$. Then for any $A\in\mathcal{F}$ we have $$ {\rm E}[1_A\mid X]=\varphi(X)\quad P\text{-a.s.}, $$ where $\varphi(x)=P^X(A\mid x)$. In short we write ${\rm E}[1_A\mid X]=P^X(A\mid X)$.

Now let us introduce another random variable $Y:\Omega\to\mathbb{R}$, and let $P^X(\cdot\mid \cdot)$ still denote a regular conditional probability of $P$ given $X$.

Definition: For $B\in\mathcal{B}(\mathbb{R})$ and $x\in\mathbb{R}$ we define the regular conditional distribution of $Y$ given $X$ by $$ P_{Y\mid X}(B\mid x):=P^X(Y\in B\mid x). $$

Instead of $P_{Y\mid X}(B\mid x)$ one often writes $P(Y\in B\mid X=x)$.

An easy consequence of this definition is that $(B,x)\mapsto P_{Y\mid X}(B\mid x)$ is a Markov kernel and for any $A,B\in\mathcal{B}(\mathbb{R})$ we have $$ \int_A P_{Y\mid X}(B\mid x)\,P_X(\mathrm dx)=P(\{X\in A\}\cap\{Y\in B\}). \tag{1} $$

In fact, a mapping $(B,x)\mapsto \nu(B\mid x)$ is a regular conditional distribution of $Y$ given $X$ if and only if it is a Markov kernel and satisfies $(1)$. For this reason, $(1)$ is often referred to as the defining equation.
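As a concrete illustration of $(1)$ (a numerical sketch under an assumed model, not part of the general theory): if $(X,Y)$ is bivariate normal with standard margins and correlation $\rho$, then $P_{Y\mid X}(\cdot\mid x)=\mathcal{N}(\rho x,1-\rho^2)$, and the defining equation can be checked by Monte Carlo. The parameter values, the sets $A$ and $B$, and the library calls below are all assumptions made for this example.

```python
import numpy as np
from scipy.stats import norm

# Assumed model: (X, Y) bivariate normal, standard margins, correlation rho.
# Then P_{Y|X}(. | x) = N(rho*x, 1 - rho^2).
rng = np.random.default_rng(1)
rho, n = 0.6, 10**6
X = rng.standard_normal(n)
Y = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(n)

A = (0.0 <= X) & (X <= 1.0)   # event {X in A} with A = [0, 1]
B = (Y <= 0.5)                # event {Y in B} with B = (-inf, 0.5]

# Left-hand side of (1): integrate x -> P_{Y|X}(B | x) against P_X (empirically).
lhs = np.mean(norm.cdf((0.5 - rho * X) / np.sqrt(1 - rho**2)) * A)
# Right-hand side of (1): P({X in A} and {Y in B}).
rhs = np.mean(A & B)
print(lhs, rhs)   # the two numbers agree up to Monte Carlo error
```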

Definition: Let $P^X(\cdot\mid\cdot)$ be a regular conditional probability of $P$ given $X$. Furthermore, let $U:\Omega\to\mathbb{R}$ be another random variable that is assumed bounded (to ensure the following expectations exist). Then we define the (regular) conditional mean of $U$ given $X=x$ by $$ {\rm E}[U\mid X=x]:=\int_\Omega U(\omega)\, P^X(\mathrm d\omega\mid x). $$

Let us write $\psi(x):={\rm E}[U\mid X=x]$. Then we have the following:

Connection: The mapping $\mathbb{R}\ni x\mapsto \psi(x)$ is $(\mathcal{B}(\mathbb{R}),\mathcal{B}(\mathbb{R}))$-measurable, and $$ {\rm E}[U\mid X]=\psi(X)\quad P\text{-a.s.} $$
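Here is a small sketch of this connection, under the same assumed bivariate-normal model as above (again my own toy example): taking the bounded variable $U=\sin(Y)$, the conditional mean is $\psi(x)={\rm E}[U\mid X=x]=\sin(\rho x)\,e^{-(1-\rho^2)/2}$ (the mean of $\sin$ under $\mathcal{N}(\rho x,1-\rho^2)$), and one can check numerically that $\int_{\{X\in A\}} U\,\mathrm dP=\int_{\{X\in A\}}\psi(X)\,\mathrm dP$, which is the defining property of ${\rm E}[U\mid X]$.

```python
import numpy as np

# Same assumed model: (X, Y) bivariate normal with correlation rho; U = sin(Y).
rng = np.random.default_rng(2)
rho, n = 0.6, 10**6
X = rng.standard_normal(n)
Y = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(n)
U = np.sin(Y)

def psi(x):
    # E[U | X = x] = mean of sin under N(rho*x, 1 - rho^2), in closed form.
    return np.sin(rho * x) * np.exp(-(1 - rho**2) / 2)

A = (0.0 <= X) & (X <= 1.0)
print(np.mean(U * A), np.mean(psi(X) * A))   # both integrals over {X in [0,1]} agree
```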

The following is an extremely useful rule when calculating with conditional distributions:

Rule: Let $X$ and $Y$ be as above, and let $\xi:\mathbb{R}^2\to\mathbb{R}$ be $(\mathcal{B}(\mathbb{R}^2),\mathcal{B}(\mathbb{R}))$-measurable. Then $$ P(\xi(X,Y)\in D\mid X=x)=P(\xi(x,Y)\in D\mid X=x),\quad D\in\mathcal{B}(\mathbb{R}), $$ holds for $P_X$-a.a. $x$. In words: conditional on $X=x$, we may replace $X$ by $x$.

The following example shows how this rule can be useful. Let $X$ and $Y$ be independent $\mathcal{N}(0,1)$ random variables, and let $U=X+Y$. Then we claim that $U\mid X=x\sim \mathcal{N}(x,1)$ for $P_X$-a.a. $x$. To see this, note that by the rule above, the distributions of $U\mid X=x$ and of $Y+x\mid X=x$ are the same. But since $Y$ is independent of $X$, the distribution of $Y+x\mid X=x$ is just that of $Y+x$. In short: $$ U\mid X=x\sim Y+x\mid X=x\sim Y+x\sim\mathcal{N}(x,1). $$
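A quick simulation check of this example (purely illustrative; conditioning on $\{X=x\}$ is approximated by a small window around $x$, which is only an informal stand-in for the regular conditional distribution):

```python
import numpy as np
from scipy.stats import norm

# X, Y independent N(0,1), U = X + Y; the claim is U | X = x ~ N(x, 1).
rng = np.random.default_rng(3)
n = 2 * 10**6
X = rng.standard_normal(n)
Y = rng.standard_normal(n)
U = X + Y

x, eps = 0.7, 0.02
window = np.abs(X - x) < eps      # crude approximation of the event {X = x}
u = U[window]
print(u.mean(), u.std())          # approximately x and 1
print(np.mean(u <= 1.5), norm.cdf(1.5, loc=x, scale=1.0))   # CDF values roughly agree
```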