Intuition behind conditional expectation when sigma algebra isn't generated by a partition

I'm struggling with the concept of conditional expectation, when the sigma algebra on which it is conditioned isn't generated by a partition.

Suppose $(\Omega,\mathcal{F},P)$ is a probability space such that $\mathcal{F}$ is generated by a countable partition $\{\Lambda_n\}$ of $\Omega$.

Then we know that:

$$E[X\mid\mathcal{F}](\omega) = E_1[X]\,I(\omega \in \Lambda_1) + E_2[X]\,I(\omega \in \Lambda_2) + E_3[X]\,I(\omega \in \Lambda_3) + \cdots$$

where $E_i[\,\cdot\,]$ is the expectation computed under the conditional probability $P(\,\cdot \mid \Lambda_i)$.

Hence when $\omega$ is in $\Lambda_i$, the conditional expectation gives the expectation of the random variable $X$ given that the observed event is $\Lambda_i$, and so uses the conditional probability $P(\,\cdot \mid \Lambda_i)$ rather than the original one. However, this interpretation is only valid as long as the conditioning $\sigma$-algebra is generated by a partition. Is there a similar interpretation in the general case?
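For a concrete instance of this formula, take $X$ to be the outcome of a fair die and the partition $\Lambda_1 = \{2,4,6\}$, $\Lambda_2 = \{1,3,5\}$; then

$$E[X\mid\mathcal{F}](\omega) = \tfrac{2+4+6}{3}\,I(\omega\in\Lambda_1) + \tfrac{1+3+5}{3}\,I(\omega\in\Lambda_2) = 4\,I(\omega\in\Lambda_1) + 3\,I(\omega\in\Lambda_2).$$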

i.e what will it physically represent?

Any help will be greatly appreciated!

Thanks!

Best,
Adwait


Solution 1:

This got pretty long but it's the way I think about it.

A $\sigma$-algebra represents information. Formally it's a set of events, but you can think of it as a set of questions you know the answers to.

Conditional expectation is a way of making sense of the idea that if you know some information (represented by a $\sigma$-algebra) you get a new probability distribution conditioned on that information.

So if my $\sigma$-algebra represents a list of questions, I want to associate a probability distribution with every possible set of answers. Doing this rigorously presents all sorts of problems, and conditional expectation is a formal tool to get around them. To lay out the basic idea I'm going to look at the finite case in more detail.

Suppose you have a probability space $(\Omega, \mathcal F, \mathbb P)$. I can generate a finite $\sigma$-algebra $\mathcal G$ from a finite set of events $(E_1, \dots, E_n)$.

We can interpret this another way: suppose I've chosen an element $\omega\in\Omega$ and you're trying to guess what it is. To make it easier I'm going to let you ask $n$ questions. In this version of the game you have to choose all $n$ questions before you start asking them.

You've chosen the questions "is $\omega$ in $E_1$?" ... "is $\omega$ in $E_n$?".

When you've asked all your questions you have a conditional probability distribution on $\Omega$. There are $2^n$ different sets of answers, so there are up to $2^n$ different probability distributions to consider (some combinations of answers may occur with probability $0$).

So we could have a function from the set of answer sequences to the set of probability distributions on $\Omega$. But as some combinations of answers may not yield a well defined conditional probability, it's better to think of this as a function from $\Omega$ to the set of probability distributions, defined $\mathbb P$-almost everywhere. That is, every $\omega$ gives me a set of answers, so I associate with $\omega$ the probability distribution conditioned on those answers. As there are only finitely many answer sequences, with probability one I get a sequence of answers that occurs with positive probability, and hence a well defined conditional distribution.
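Here is a small sketch of this in code, with an arbitrary toy choice of $\Omega$, $\mathbb P$ and two events (illustrative only): each $\omega$ is mapped to its answer sequence, and the conditional distribution given those answers is the normalised restriction of $\mathbb P$ to the corresponding atom.

```python
# Finite toy model: Omega = {0,...,7} with the uniform distribution P.
Omega = range(8)
P = {w: 1 / 8 for w in Omega}

# Two chosen events ("questions"): is omega in E1?  Is omega in E2?
E1 = {0, 1, 2, 3}
E2 = {0, 2, 4, 6}
events = [E1, E2]

def answers(w):
    """The answer sequence for omega: a tuple of booleans."""
    return tuple(w in E for E in events)

def conditional_distribution(w):
    """P conditioned on the answers for w: the restriction of P to the
    atom of the generated sigma-algebra containing w, renormalised.
    Well defined whenever that atom has positive probability."""
    atom = [v for v in Omega if answers(v) == answers(w)]
    total = sum(P[v] for v in atom)
    return {v: P[v] / total for v in atom}

print(answers(2))                   # (True, True)
print(conditional_distribution(2))  # {0: 0.5, 2: 0.5}
```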

Now let $\ell_1$ be the set of integrable $\mathcal F$-measurable functions. We can always define a probability distribution in terms of its expectation operator, which is a linear functional $\ell_1\to[-\infty,\infty]$.

So if I want to define a useful mathematical object that associates a probability distribution with (almost) every $\omega$ I can define a conditional expectation operator

$$\mathbb E(\circ |\mathcal G)(\circ): \ell_1\times\Omega\to[-\infty,\infty].$$

I can think of this in two ways, either as assigning an expectation operator to every $\omega\in\Omega$ or associating a random variable $\mathbb E(f|\mathcal G):\Omega\to[-\infty,\infty]$ with every $f\in\ell_1$.

As I've only defined my conditional distributions for almost every $\omega$, it's better to use the second idea and think of conditional expectation as a map $\ell_1\to\ell_1$, because random variables only need to be defined almost everywhere. So to get around the almost-everywhere problem we say a well defined function $\ell_1\times\Omega\to[-\infty,\infty]$ is a conditional expectation if it gives the right random variables $\mathbb E(f|\mathcal G):\Omega\to[-\infty,\infty]$.
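Continuing the toy sketch above (again illustrative only): in the finite case the map $f \mapsto \mathbb E(f|\mathcal G)$ just averages $f$ under the conditional distribution attached to each $\omega$.

```python
def cond_exp(f, w):
    """E(f | G)(w): the expectation of f under the conditional
    distribution attached to w."""
    dist = conditional_distribution(w)
    return sum(f(v) * p for v, p in dist.items())

f = lambda w: w ** 2

# E(f | G) is itself a random variable: constant on each atom of G.
print([cond_exp(f, w) for w in Omega])
# [2.0, 5.0, 2.0, 5.0, 26.0, 37.0, 26.0, 37.0]
```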

So we have a choice of different versions of conditional expectation, and we need a test to check whether a given operator gives the right random variables.

In this case we need to find necessary and sufficient conditions for a conditional expectation to agree with the classical case almost everywhere.

Notice three things. Firstly, for every $f\in\ell_1$ the conditional expectation $\mathbb E(f|\mathcal G):\Omega\to[-\infty,\infty]$ must be $\mathcal G$-measurable, because $\mathbb E(\circ|\mathcal G)(\omega)$ is an expectation operator associated with a conditional probability distribution which depends only on the answers to the $n$ questions.

Secondly you can check that for every function $f$ we must have $\mathbb E\left(\mathbb E(f|\mathcal G)\right) = \mathbb E(f)$.

Thirdly, if $g$ is a $\mathcal G$-measurable function then $\mathbb E(fg|\mathcal G)(\omega) = g(\omega)\mathbb E(f|\mathcal G)(\omega)$, because $g$ is almost surely constant on each of the (finitely many) atoms of $\mathcal G$, and hence constant under each of the expectation operators $\mathbb E(\circ|\mathcal G)(\omega)$.
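On the toy example from before, all three conditions can be checked numerically (a sanity check, not a proof):

```python
# Tower property: E(E(f|G)) = E(f).
lhs = sum(cond_exp(f, w) * P[w] for w in Omega)
rhs = sum(f(w) * P[w] for w in Omega)
assert abs(lhs - rhs) < 1e-12

# Pulling out a G-measurable factor: E(fg|G)(w) = g(w) E(f|G)(w).
g = lambda w: 1.0 if answers(w) == (True, True) else 0.0  # indicator of an atom, hence G-measurable
fg = lambda w: f(w) * g(w)
assert all(abs(cond_exp(fg, w) - g(w) * cond_exp(f, w)) < 1e-12 for w in Omega)
```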

You should be able to convince yourself that for finite $\mathcal G$ the classical definition of conditional expectation is the only function that satisfies these three conditions.

As conditional expectation is only defined almost everywhere, a pointwise condition like the third doesn't make sense on its own. But as we have completely free choice of $\mathcal G$-measurable $g$, we can combine the last two conditions to get

$$\mathbb E\left( \mathbb E(fg|\mathcal G)\right) = \mathbb E\left(\mathbb E(f|\mathcal G)g\right).$$

Again convince yourself that anything satisfying this must agree with the classical conditional expectation almost everywhere.
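And the combined condition passes the same numerical check on the toy example:

```python
# The combined condition: E(E(fg|G)) = E(E(f|G) g).
lhs = sum(cond_exp(fg, w) * P[w] for w in Omega)
rhs = sum(cond_exp(f, w) * g(w) * P[w] for w in Omega)
assert abs(lhs - rhs) < 1e-12
```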

So for a finite $\sigma$-algebra the classical conditional expectation can be described as the only $\mathcal G$-measurable function that satisfies the condition above. For the finite case this is all a bit unnecessary, but it works. If I define a conditional expectation operator $\mathbb E(\circ |\mathcal G)(\circ): \ell_1\times\Omega\to[-\infty,\infty]$ this will give me an expectation operator and hence a probability distribution for almost every $\omega$. Furthermore that conditional probability distribution will agree with the classical one for almost every choice of $\omega$.

Suppose instead of $n$ questions I allowed you to ask a countably infinite list of questions. Now we want to do the same thing, but the set of possible answer sequences is uncountable. So it's quite possible that every set of answers has probability $0$, in which case I can't use the normal definition of conditional expectation.
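A standard example of this phenomenon: take $\Omega = [0,1]$ with Lebesgue measure and let the $n$-th question be "is the $n$-th binary digit of $\omega$ a $1$?". The complete answer sequence determines $\omega$ exactly, so every answer sequence occurs with probability $0$.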

But what I want to achieve is the same thing. For every set of answers I want a conditional distribution on my probability space given those answers.

The ideas above still work with infinite $\sigma$-algebras, but you need to mess about with Radon–Nikodym derivatives to prove it, and I'll assume you're familiar with those.
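For the record, the standard construction goes like this: for integrable $f \ge 0$, define a measure on $\mathcal G$ by

$$\nu(A) = \int_A f \,\mathrm d\mathbb P, \qquad A \in \mathcal G.$$

Then $\nu$ is absolutely continuous with respect to $\mathbb P$ restricted to $\mathcal G$, so the Radon–Nikodym theorem gives a $\mathcal G$-measurable density, and that density is (a version of) $\mathbb E(f|\mathcal G)$; general $f$ is handled by splitting into positive and negative parts.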

But it turns out there always exists a conditional expectation operator satisfying the two conditions ($\mathcal G$-measurability and the identity above). So, although formally we describe conditional expectation as a random variable associated with each $\mathcal F$-measurable function, anything that satisfies the conditions gives me a probability distribution for almost every $\omega$. I can interpret that distribution as the conditional distribution given that I "know" $\mathcal G$.