Conditional expectation with respect to a $\sigma$-algebra
Could someone explain what it is that we are intuitively trying to achieve with the definition? Having read it, I could do the problems in that section of my book, but I still have no intuitive idea of what the definition is trying to capture. When given just a single event $E$, I understand that the conditional expectation should be the integral with respect to the measure restricted to that event,
$$\mu_E(A) := \mu(A\cap E)/\mu(E).$$
It's also intuitively clear what information a single event carries, i.e. "the outcome was one of those in the event set".
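For instance, for a fair die with $E = \{2,4,6\}$, this gives $\mu_E(\{2\}) = \frac{1/6}{1/2} = \frac{1}{3}$, as expected.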
Can someone explain to me the following:
1. What do we even mean by the information carried by a $\sigma$-algebra? In other words, I can't even understand what we would like this to represent.
2. Why do we want the conditional expectation to be a random variable? I assume this might follow naturally from (1) if I understood what we're trying to accomplish.
I can try to explain my understanding of what conditional expectation tries to accomplish (let's say we work on $(\Omega, \mathscr A, \mathbf P)$).
(1) The "information" carried by a $\sigma$-algebra $\mathscr F \subseteq \mathscr A$ is (like the information carried by an event) the ability to say, for a random outcome $\omega \in \Omega$, to which sets $A \in \mathscr F$ it belongs. The restricted measure $\def\P{\mathbf P}\P_E := \P(-\cap E)/\P(E)$ weighs only the outcomes in $E$ and is $0$ on $\Omega \setminus E$. So we have the "information" that $\omega \in E$ (and are not interested in the rest).
For (2), let's first look at the simple example $\mathscr F = \sigma(\{E\}) = \{\emptyset, E, \Omega\setminus E, \Omega\}$. This is the $\sigma$-algebra which corresponds to a single event: if we have this "information", then for our outcome we know whether it belongs to $E$ or to $\Omega \setminus E$. So the conditional expectation should behave differently on $E$ and on $\Omega \setminus E$, namely being the expectation with respect to $\P_E$ or $\P_{\Omega \setminus E}$. Hence for a random variable $X$ we have $$ \def\E{\mathbf E}\E(X\mid \mathscr F)(\omega) = \begin{cases} \E_{\P_E}(X) & \omega \in E\\ \E_{\P_{\Omega\setminus E}}(X) & \omega \in \Omega \setminus E \end{cases} $$ Recall that $$ \E_{\P_E}(X) = \frac 1{\P(E)}\int_E X\, d\P. $$

As the next step, let's think of some more "information". Above we partitioned $\Omega$ into two sets; let's now look at a countable partition $\Omega = \biguplus_{i=1}^\infty E_i$ and $\mathscr F = \sigma\{E_i : i \ge 1\}$. If we have the information carried by $\mathscr F$, how can we estimate $X(\omega)$ for a given $\omega \in \Omega$? As we "know" to which $E_i$ our $\omega$ belongs (that's the information carried by $\mathscr F$), the best we can do is $$ \E(X \mid \mathscr F)(\omega) = \frac 1{\P(E_i)}\int_{E_i} X \, d\P, \qquad \omega \in E_i $$ (note that this is an $\mathscr F$-measurable function). I like to think that in general an $\mathscr F$-measurable function may only "use" information carried by $\mathscr F$, so we want $\E(X\mid \mathscr F)$ to be $\mathscr F$-measurable in general. And since we have to give different answers on different elements of $\mathscr F$ (even on different families of elements of $\mathscr F$ for each $\omega$ in general), the conditional expectation will be a random variable.
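Here is a small numerical sketch of this partition formula, if it helps (the specific setup, $\Omega = [0,1)$ with the uniform measure, $X(\omega) = \omega^2$ and four equal blocks, is just an illustration, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(0)

# Omega = [0, 1) with the uniform measure; X(omega) = omega^2.
# Partition into four blocks E_i = [i/4, (i+1)/4); F = sigma(E_0, ..., E_3).
omega = rng.random(1_000_000)
X = omega**2
block = (omega * 4).astype(int)  # index i of the block containing omega

# E(X | F)(omega) = (1 / P(E_i)) * (integral of X over E_i), for omega in E_i;
# each block integral is estimated here by the sample mean over that block.
block_means = np.array([X[block == i].mean() for i in range(4)])
cond_exp = block_means[block]    # E(X | F) as a function of omega

# The block averages match the exact values 4 * int_{i/4}^{(i+1)/4} t^2 dt
# = ((i+1)^3 - i^3) / 48, and E(E(X | F)) = E(X) (tower property).
print(block_means)                # ~ [0.0208, 0.1458, 0.3958, 0.7708]
print(cond_exp.mean(), X.mean())  # both ~ 1/3
```

Note how `cond_exp` is constant on each block: it only "uses" the information of which $E_i$ the outcome landed in.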
- One can think of the "information carried by a $\sigma$-algebra" as a measure of "roughness" of the events we are considering.
For example, think of the probability space modelling the outcome of rolling a die: $$X = \{1,2,3,4,5,6\}.$$ The standard $\sigma$-algebra we could use here is just the power set of $X$; that is, we can consider any subset of $X$ as an event. However, if we are interested only in the parity of the outcome, then a smaller $\sigma$-algebra suffices: $$\mathcal{F} = \{\emptyset, \{1,3,5\}, \{2,4,6\}, X\}.$$ If we restrict ourselves to this $\sigma$-algebra, we can still measure the probability of any event which we can express in terms of parity, e.g. "the outcome was an even number". On the other hand, if we wanted to consider a question such as "was the outcome a prime number?", we would have to refine our $\sigma$-algebra in a way that makes the subset $\{2,3,5\}$ measurable.
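To see this in miniature, here is a small sketch (the enumeration is my own illustration; for a partition of a finite space, the generated $\sigma$-algebra is exactly the collection of all unions of partition blocks):

```python
from itertools import combinations

# X = {1,...,6}; the parity sigma-algebra consists of all unions of the
# two partition blocks {1,3,5} and {2,4,6} (including the empty union).
blocks = [frozenset({1, 3, 5}), frozenset({2, 4, 6})]
F = {frozenset().union(*combo)
     for r in range(len(blocks) + 1)
     for combo in combinations(blocks, r)}

for A in sorted(F, key=len):
    print(sorted(A))  # the empty set, the two blocks, then all of X

# "The outcome was even" is expressible in F; "the outcome was prime" is not:
print(frozenset({2, 4, 6}) in F)  # True
print(frozenset({2, 3, 5}) in F)  # False
```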
- As you said, the intuition behind the concept of conditional expectation can be drawn from (1). If $X$ is a random variable, we may (informally) think of $\mathbb{E}[X|\mathcal{F}]$ as the random variable which describes (imitates) $X$ as closely as possible, while having only events from $\mathcal{F}$ in its "output".
Note: The "closeness" mentioned in the previous point can actually be formalized in the following way: if we restrict our attention to the (Hilbert) space $L^2$ of square-integrable random variables, then $\mathbb{E}[X|\mathcal{F}]$ is the orthogonal projection of $X$ onto the subspace consisting of $\mathcal{F}$-measurable functions.
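Here is a minimal numerical sketch of that projection, reusing the die example from the previous point (the choice of $X$ as the face value is my own; with the uniform measure the $L^2$ inner product is $\frac{1}{6}$ times the Euclidean one, so ordinary least squares computes the projection):

```python
import numpy as np

# Die example: X(omega) = face value, P uniform on {1,...,6}.
faces = np.arange(1, 7)
X = faces.astype(float)

# F-measurable functions for the parity sigma-algebra are exactly those
# constant on {1,3,5} and on {2,4,6}: the span of the two indicators.
basis = np.column_stack([(faces % 2 == 1), (faces % 2 == 0)]).astype(float)

# With uniform P the L^2(P) inner product is 1/6 times the Euclidean one,
# so ordinary least squares yields the orthogonal projection of X onto
# the subspace of F-measurable functions.
coeffs, *_ = np.linalg.lstsq(basis, X, rcond=None)
print(coeffs)           # [3. 4.]  ->  E[X|F] = 3 on odd faces, 4 on even faces
print(basis @ coeffs)   # E[X|F] as a function of omega
```

The coefficients are just the averages of $X$ over the two partition blocks, which recovers the partition formula from the first answer.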