Let $(\Omega,\mathcal{F},\mathbb{P})$ be a complete probability space. Let $\mathcal{A}$ be a complete sub-$\sigma$-algebra of $\mathcal{F}$. For the moment assume that $X$ is a random variable with finite variance. Then we have the nice property:

$$\mathbb{E}[X|\mathcal{A}]=\arg \min_Y \mathbb{E}((X-Y)^2).$$

Here we take the minimum over $Y$ which are $\mathcal{A}$-measurable and have finite variance. In the general situation, $X$ only has finite mean, and then this approach does not work: if $X$ has infinite variance, then $\mathbb{E}((X-Y)^2)=\infty$ for every $Y$ with finite variance.

I like this property because it makes conditional expectation fairly concrete, even in a rather general situation: the conditional expectation of $X$ is just the orthogonal projection of $X$ onto an appropriate subspace of the Hilbert space of random variables with finite variance. So I would like to try to extend a weaker version of it to the most general situation.

My idea is as follows. Suppose $X$ has only finite mean, and take a sequence of random variables $X_n$ which converge in mean to $X$, such that each $X_n$ has finite variance. Of course they do not converge in mean square, but I do not require this here.

Now consider the sequence $\mathbb{E}[X_n | \mathcal{A}]$. Does this sequence converge to $\mathbb{E}[X|\mathcal{A}]$? If so, in which senses does it converge (a.s., in mean, etc.)? If not, can I add a hypothesis to get this result (for instance, assume in addition that $X_n \to X$ a.s.)?


Proving convergence of conditional expectations has much in common with proving convergence of expectations: the standard tools are the dominated convergence and monotone convergence theorems as well as Jensen's inequality.

  • (Dominated convergence) If $|X_n| \leq Y$ for some $Y \in L^1$ and $X_n \to X$ almost surely, then $$\lim_{n \to \infty} \mathbb{E}(X_n \mid \mathcal{F}) = \mathbb{E}(X \mid \mathcal{F}) \qquad \text{a.s.}$$ for any sub-$\sigma$-algebra $\mathcal{F}$.
  • (Monotone convergence) Suppose that $X_n \uparrow X$, i.e. $X = \sup_{n \in \mathbb{N}} X_n$ and $X_n \leq X_{n+1}$, and $X_n \geq 0$. Then $$\lim_{n \to \infty} \mathbb{E}(X_n \mid \mathcal{F}) = \mathbb{E}(X \mid \mathcal{F}) \qquad \text{a.s.}$$ for any sub-$\sigma$-algebra $\mathcal{F}$.
  • Suppose that $X_n \to X$ in $L^p$ for some $p \geq 1$. Then $Y_n := \mathbb{E}(X_n \mid \mathcal{F}) \to \mathbb{E}(X \mid \mathcal{F})=:Y$ in $L^p$.

    This statement is a direct consequence of the conditional Jensen inequality: $$|Y_n-Y|^p = \left| \mathbb{E}(X_n-X \mid \mathcal{F}) \right|^p \leq \mathbb{E}(|X_n-X|^p \mid \mathcal{F})$$ and therefore, by the tower property, $$\mathbb{E}(|Y_n-Y|^p) \leq \mathbb{E}(|X_n-X|^p) \to 0 \qquad \text{as} \, \, n \to \infty.$$
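
Applied to the question, the last item with $p=1$ already settles convergence in mean. As a short worked remark (a sketch relying only on standard facts, with the "typewriter" sequence below chosen purely for illustration): if $X_n \to X$ in $L^1$, then

$$\mathbb{E}\big(\left|\mathbb{E}(X_n \mid \mathcal{F}) - \mathbb{E}(X \mid \mathcal{F})\right|\big) \leq \mathbb{E}(|X_n - X|) \to 0,$$

so the conditional expectations converge in $L^1$, hence in probability, and a subsequence converges almost surely. Almost sure convergence of the whole sequence can fail without a further hypothesis such as domination. For instance, on $[0,1]$ with Lebesgue measure take, for $n = 2^m + k$ with $0 \leq k < 2^m$,

$$X_n := 1_{[k 2^{-m},\,(k+1) 2^{-m}]}, \qquad \mathbb{E}(|X_n|) = 2^{-m} \to 0,$$

and condition on the full $\sigma$-algebra, so that the conditional expectation of $X_n$ is $X_n$ itself. Each $X_n$ is bounded, hence has finite variance, and $X_n \to 0$ in $L^1$, yet $X_n(\omega) = 1$ for infinitely many $n$ at every $\omega \in [0,1]$, so $X_n(\omega)$ does not converge for any $\omega$.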


It is possible to actually define conditional expectation in this way. Explicitly, let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $\mathcal{F}_0\subset\mathcal{F}$ a sub-$\sigma$-field. For a closed subspace $M$ of $L^2(\mathbb{P})$, let $P_M$ be the orthogonal projection onto $M$. It is not hard to see that $\mathcal{H}:=L^2(\mathbb{P}|_{\mathcal{F}_0})$ can be embedded as a closed subspace of $L^2(\mathbb{P})$. Hence for $X\in L^2(\mathbb{P})$ we may define $$\mathbb{E}[X|\mathcal{F}_0]:=P_{\mathcal{H}}(X).$$ Since $\mathbb{P}$ is a finite measure, $L^2(\mathbb{P})$ is a dense subspace of $L^1(\mathbb{P})$. Moreover, one may show $\|P_\mathcal{H}(X)\|_1\le\|X\|_1$ for all $X\in L^2(\mathbb{P})$. Hence the linear map $P_\mathcal{H}$ extends uniquely to a continuous linear map $X\mapsto\mathbb{E}[X|\mathcal{F}_0]$ on $L^1(\mathbb{P})$ satisfying $\|\mathbb{E}[X|\mathcal{F}_0]\|_1\le\|X\|_1$ for all $X\in L^1(\mathbb{P})$. It is not hard to show this definition coincides with the usual one (indeed, the usual definition uses the Radon-Nikodym theorem, which can be proved by more or less this same procedure in greater generality).
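
The bound $\|P_\mathcal{H}(X)\|_1\le\|X\|_1$ used above can be verified in a couple of lines. A sketch, using only the orthogonality of $X-P_\mathcal{H}(X)$ to $\mathcal{H}$ (the names $Y$ and $Z$ are introduced here for illustration): put $Y:=P_\mathcal{H}(X)$ and $Z:=\operatorname{sgn}(Y)$. Then $Z$ is bounded and $\mathcal{F}_0$-measurable, so $Z\in\mathcal{H}$, and

$$\|Y\|_1=\mathbb{E}[YZ]=\mathbb{E}[XZ]-\mathbb{E}[(X-Y)Z]=\mathbb{E}[XZ]\le\mathbb{E}[|X|\,|Z|]\le\|X\|_1.$$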

In essence, if you take any sequence $(X_n)$ in $L^2(\mathbb{P})$ converging in $L^1$ to $X\in L^1(\mathbb{P})$ (such a sequence exists by density), then $\mathbb{E}[X|\mathcal{F}_0]=\lim_{n\to\infty}\mathbb{E}[X_n|\mathcal{F}_0]$, where the limit is in $L^1$ and $\mathbb{E}[X_n|\mathcal{F}_0]$ is simply the orthogonal projection of $X_n$ onto $L^2(\mathbb{P}|_{\mathcal{F}_0})$.
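
On a finite sample space the whole construction is explicit, which makes for a quick sanity check. The following is a minimal Python sketch, not a general implementation: it assumes a toy space of eight equally likely outcomes, a sub-$\sigma$-field generated by a three-cell partition, and truncations $X_n=\max(-n,\min(n,X))$ as the approximating sequence. All names (`cond_exp`, `l1_dist`, `truncate`) and all numbers are made up for illustration. The projection onto $L^2(\mathbb{P}|_{\mathcal{F}_0})$ is then the cell-wise weighted average, and the loop compares $\|X_n-X\|_1$ with $\|\mathbb{E}[X_n|\mathcal{F}_0]-\mathbb{E}[X|\mathcal{F}_0]\|_1$.

```python
from fractions import Fraction


def cond_exp(x, p, labels):
    """Conditional expectation of x given the partition encoded by labels.

    x[i] is the value at outcome i, p[i] its probability, and labels[i] the
    partition cell containing outcome i. The result is constant on each cell
    and equals the p-weighted average of x over that cell.
    """
    mass, total = {}, {}
    for xi, pi, c in zip(x, p, labels):
        mass[c] = mass.get(c, 0) + pi
        total[c] = total.get(c, 0) + pi * xi
    return [total[c] / mass[c] for c in labels]


def l1_dist(x, y, p):
    """L^1 distance E|x - y| under the probabilities p."""
    return sum(pi * abs(xi - yi) for xi, yi, pi in zip(x, y, p))


def truncate(x, n):
    """Clip every value of x to the interval [-n, n]."""
    return [max(-n, min(n, xi)) for xi in x]


# Hypothetical toy data: 8 equally likely outcomes, 3 partition cells.
p = [Fraction(1, 8)] * 8
labels = [0, 0, 0, 1, 1, 1, 2, 2]
x = [Fraction(v) for v in (1, -2, 40, 3, -70, 5, 0, 9)]

target = cond_exp(x, p, labels)  # E[X | F_0]

gaps = []
for n in (1, 5, 50, 100):
    xn = truncate(x, n)
    gap_x = l1_dist(xn, x, p)                             # ||X_n - X||_1
    gap_ce = l1_dist(cond_exp(xn, p, labels), target, p)  # distance of the projections
    gaps.append((n, gap_x, gap_ce))
    print(n, float(gap_x), float(gap_ce))
```

On a finite space every random variable already has finite variance, so this only illustrates the contraction estimate behind the extension, not the passage from $L^2$ to $L^1$ itself. Exact rational arithmetic (`Fraction`) is used so that the inequality can be checked without rounding error.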