How to explain connections between bounded Radon-Nikodym derivatives, convex combinations of probabilities, and conditional probability

I have been thinking a lot about conditional probabilities recently and have noticed what to my mind are some surprising connections. I will sketch a few results and I'd just like someone to provide some deeper insight into why they hold.

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $E \in \mathcal{F}$ with $P(E) > 0$. Let $P_E = P(\cdot \mid E)$ be the conditional probability given $E$. It's easy to verify that $P_E \ll P$ and that the Radon-Nikodym derivative $dP_E/dP$ is essentially bounded. Indeed, $\mathbf{1}_E/P(E)$ is a version of $dP_E/dP$, because for all $A \in \mathcal{F}$ $$\int_A \frac{\mathbf{1}_E}{P(E)}\,dP = \frac{P(A \cap E)}{P(E)} = P_E(A).$$
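(As a quick sanity check, here is the identity verified numerically on a small finite space in Python; the space, the weights, and the event are arbitrary choices for illustration only.)

```python
from itertools import chain, combinations

omega = [0, 1, 2, 3]
P = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # hypothetical probabilities
E = {1, 2}                              # conditioning event, P(E) > 0

pE = sum(P[w] for w in E)
f = {w: (1.0 if w in E else 0.0) / pE for w in omega}  # candidate dP_E/dP

# Check int_A (1_E / P(E)) dP == P(A | E) for every A in the power set.
for A in chain.from_iterable(combinations(omega, r) for r in range(5)):
    A = set(A)
    lhs = sum(f[w] * P[w] for w in A)        # int_A (1_E / P(E)) dP
    rhs = sum(P[w] for w in A & E) / pE      # P(A & E) / P(E)
    assert abs(lhs - rhs) < 1e-12
```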

Somewhat surprisingly (to me), something like a converse holds as well. The following result is due to Diaconis and Zabell (1982, Theorem 2.1). Assume $\Omega$ is countable and $\mathcal{F} = 2^\Omega$ (the result holds in general; this is just for ease of exposition).

Theorem. Suppose $Q$ is a probability measure on $(\Omega, \mathcal{F})$ such that for some $\beta \geq 1$ and all $\omega$ $$Q(\omega) \leq \beta P(\omega).$$ Then there exist a probability space $(\Omega', \mathcal{F}', P')$, a family $\{ E_\omega : \omega \in \Omega \}$ of events in $\mathcal{F}'$ such that $P'(E_\omega) = P(\omega)$ for all $\omega$, and an event $E \in \mathcal{F}'$ such that $P'(E) > 0$ and $P'(E_\omega \mid E) = Q(\omega)$ for all $\omega$. In other words, $(\Omega, \mathcal{F}, P)$ embeds into a richer space in which $P$ is a marginal probability and $Q$ is a conditional probability. We say that $Q$ can be obtained by conditioning $P$.

Proof Sketch. It follows from the supposition that $P = (1/\beta)Q + (1 - 1/\beta)R$ for some probability measure $R$ on $(\Omega, \mathcal{F})$ (if $\beta = 1$ then $Q = P$ and the claim is trivial, so assume $\beta > 1$). Let $\Omega' := \Omega \times \{a,b\}$ with $\mathcal{F}'$ its power set, $E_\omega := \{(\omega, a), (\omega, b)\}$, and $E := \{(\omega, a) : \omega \in \Omega\}$. Let $P'(\omega, a) = (1/\beta)Q(\omega)$ and $P'(\omega, b) = (1 - 1/\beta)R(\omega)$; it's easy to verify that $P'(E_\omega) = P(\omega)$, $P'(E) = 1/\beta > 0$, and $P'(E_\omega \mid E) = Q(\omega)$. QED
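For what it's worth, here is the construction spelled out in Python for a small finite $\Omega$; the particular $P$ and $Q$ are made up, and the code picks the smallest admissible $\beta$, namely $\max_\omega Q(\omega)/P(\omega)$:

```python
# Hypothetical P and Q on Omega = {0, 1, 2} with Q <= beta * P pointwise.
P = {0: 0.5, 1: 0.3, 2: 0.2}
Q = {0: 0.8, 1: 0.1, 2: 0.1}

beta = max(Q[w] / P[w] for w in P)   # smallest valid beta; here 1.6
R = {w: (P[w] - Q[w] / beta) / (1 - 1 / beta) for w in P}  # assumes beta > 1

# Omega' = Omega x {a, b}; P' puts (1/beta)Q on the 'a' copy
# and (1 - 1/beta)R on the 'b' copy.
Pp = {(w, 'a'): Q[w] / beta for w in P}
Pp.update({(w, 'b'): (1 - 1 / beta) * R[w] for w in P})

for w in P:
    E_w = [(w, 'a'), (w, 'b')]
    assert abs(sum(Pp[x] for x in E_w) - P[w]) < 1e-12   # P'(E_w) = P(w)

PE = sum(Pp[(w, 'a')] for w in P)                        # P'(E) = 1/beta
for w in P:
    assert abs(Pp[(w, 'a')] / PE - Q[w]) < 1e-12         # P'(E_w | E) = Q(w)
```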

Moreover, it's easy to check that $Q \leq \beta P$ is equivalent to $P = \alpha Q + (1 - \alpha)R$ for some probability $R$, with $\alpha = 1/\beta$: given the bound, $R := (P - \alpha Q)/(1 - \alpha)$ is nonnegative and hence a probability, and conversely the decomposition gives $Q \leq (1/\alpha)P$. We therefore have

Corollary. The following are equivalent: (1) $Q \leq \beta P$ for some $\beta \geq 1$; (2) $P = \alpha Q + (1 - \alpha)R$ for some $\alpha \in (0,1]$ and some probability $R$; (3) $Q$ can be obtained by conditioning $P$.

The Corollary seems to relate three different properties that probabilities can have. (1) A measure-theoretic property: having a bounded density. (2) A vectorial property: being a convex combination of other probabilities. (3) A probabilistic property: being a probability that arises by conditioning.

Is there anything deep that can be said about the above results? Why are (1), (2), and (3) equivalent? Can (1), (2), and (3) be subsumed nicely into some more general theory? Is there anything more precise that can be said to relate the three "points of view" (measure-theoretic, vectorial, probabilistic) that (1), (2), and (3) exhibit?

I have recently begun reading about disintegrations and suspect that the material above is related to that topic somehow, but I cannot say anything precise at the moment.


Solution 1:

I think it would be helpful to start with notation. To avoid pathological cases, let's assume our measurable space $(\Omega,\mathcal F)$ is a Polish space with its Borel $\sigma$-field. Write $\mathcal M(\Omega)$ for the space of finite signed Radon measures and $\mathcal M_1(\Omega)\subset\mathcal M(\Omega)$ for the set of probability measures.

Firstly, I don't think I would necessarily agree with your statement in (2), namely that this is a result about the convexity of $\mathcal M_1(\Omega)$. Indeed, for any subset $\mathcal A\subset\mathcal M(\Omega)$, any $P,Q\in\mathcal A$, and any $\alpha\in(0,1)$ we may write $P=\alpha Q+(1-\alpha)R$ for some $R\in\mathcal M(\Omega)$. Convexity of $\mathcal A$ does not imply $R\in\mathcal A$; we need something stronger, such as $P,Q\in\mathcal A\Rightarrow\ell(P,Q)\subset\mathcal A$, where $\ell(P,Q)$ is the unique line containing $P$ and $Q$ (for $P\neq Q$). Of course, $\mathcal M_1(\Omega)$ fails this condition.

So what does this condition actually mean for $P,Q\in\mathcal M_1(\Omega)$? Well, obviously we can write $$R=\frac{P-\alpha Q}{1-\alpha}.$$ The $1-\alpha$ is just a positive scaling that we can ignore; it is simply the unique multiplicative constant such that $R(\Omega)=1$. Clearly $R\in\mathcal M_1(\Omega)$ if and only if $P\ge\alpha Q$, so in this light the equivalence of (1) and (2) is completely trivial. I would thus argue that they are not really trying to say different things; (2) is a convenient representation for the theorem at hand.
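To make this concrete, here is a tiny two-point example (with made-up measures): $R$ always exists as a signed measure with $R(\Omega)=1$, but nonnegativity fails exactly when $P\ge\alpha Q$ fails.

```python
P = {0: 0.5, 1: 0.5}
Q = {0: 0.9, 1: 0.1}

for alpha in (0.5, 0.8):
    R = {w: (P[w] - alpha * Q[w]) / (1 - alpha) for w in P}
    print(alpha, R, all(r >= 0 for r in R.values()))
# alpha = 0.5: R = {0: 0.1, 1: 0.9}  -> a probability (P >= 0.5*Q holds)
# alpha = 0.8: R = {0: -1.1, 1: 2.1} -> signed only (P(0) < 0.8*Q(0))
```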

I think the deepest result here is the one that you have used but not mentioned. Specifically, there exists a constant $\beta$ such that $Q(A)\le\beta P(A)$ for all $A\in\mathcal F$ if and only if there exists $f\in L^\infty(P)$ such that $$Q(A)=\int_Af\,dP.$$ This is remarkable! The latter condition clearly implies the former (just take $\beta=\|f\|_\infty$) but a priori we have no reason to expect the forward direction to be true. Of course, it follows by a routine application of the Radon-Nikodym theorem, but this in itself is an extremely powerful theorem.
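On a countable space you can see this equivalence with bare hands: the pointwise ratio $Q(\omega)/P(\omega)$ is a version of $dQ/dP$, and its supremum is the optimal $\beta$. A small sketch with made-up $P$ and $Q$:

```python
P = {0: 0.5, 1: 0.3, 2: 0.2}
Q = {0: 0.8, 1: 0.1, 2: 0.1}

f = {w: Q[w] / P[w] for w in P}   # a version of dQ/dP (pointwise ratio)
beta = max(f.values())             # ||f||_inf, the optimal bound

# Q(A) = int_A f dP for every A, and Q <= beta * P with beta = ||f||_inf:
for A in [{0}, {1, 2}, {0, 1, 2}]:
    assert abs(sum(Q[w] for w in A) - sum(f[w] * P[w] for w in A)) < 1e-12
    assert sum(Q[w] for w in A) <= beta * sum(P[w] for w in A) + 1e-12
```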

As far as (1) and (3) go, it shouldn't be surprising that a measure-theoretic view of probability leads to results on conditional probability, especially when absolute continuity is involved, since the existence of abstract conditional expectation is a fairly straightforward consequence of the Radon-Nikodym theorem. Conditioning on a set of positive measure is the easy part; to see the full power of this theory, look further into disintegrations, or else the related topic of regular conditional probabilities. It is here that the assumption that $\Omega$ is a Polish space comes into play, as once our space is rich (and regular) enough, we are guaranteed to be able to do all of the things that our intuitive version of probability tells us we should be able to.