A Measure-Theoretic Formulation of Bayes' Theorem
I am trying to find a measure-theoretic formulation of Bayes' theorem. When used in statistical inference, Bayes' theorem is usually stated as:
$$p\left(\theta|x\right) = \frac{p\left(x|\theta\right) \cdot p\left(\theta\right)}{p\left(x\right)}$$
where:
- $p\left(\theta|x\right)$: the posterior density of the parameter.
- $p\left(x|\theta\right)$: the statistical model (or likelihood).
- $p\left(\theta\right)$: the prior density of the parameter.
- $p\left(x\right)$: the evidence.
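On a discrete parameter space, the density form above can be implemented term by term. The following Python sketch uses made-up prior values and a binomial likelihood purely for illustration:

```python
import numpy as np
from math import comb

thetas = np.array([0.2, 0.5, 0.8])   # candidate parameter values
prior = np.array([0.3, 0.4, 0.3])    # p(theta), sums to 1

# Likelihood of observing x = 7 successes in n = 10 Bernoulli(theta) trials
n, x = 10, 7
likelihood = np.array([comb(n, x) * t**x * (1 - t)**(n - x) for t in thetas])

evidence = np.sum(likelihood * prior)      # p(x) = sum_theta p(x|theta) p(theta)
posterior = likelihood * prior / evidence  # p(theta|x), by Bayes' theorem

print(posterior)  # a proper distribution: the entries sum to 1
```

With 7 successes in 10 trials, the posterior concentrates on $\theta = 0.8$, as expected.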
Now, how would we define Bayes' theorem in a measure-theoretic way?
So, I started by defining a probability space:
$$\left(\Theta, \mathcal{F}_\Theta, \mathbb{P}_\Theta\right)$$
such that $\theta \in \Theta$.
I then defined another probability space:
$$\left(X, \mathcal{F}_X, \mathbb{P}_X\right)$$
such that $x \in X$.
From here on I don't know what to do. The joint probability space would be:
$$\left(\Theta \times X, \mathcal{F}_\Theta \otimes \mathcal{F}_X, ?\right)$$
but I don't know what the measure should be.
Bayes' theorem should then be written as follows:
$$? = \frac{? \cdot \mathbb{P}_\Theta}{\mathbb{P}_X}$$
where:
$$\mathbb{P}_X = \int_{\theta \in \Theta} ? \,\mathrm{d}\mathbb{P}_\Theta$$
but, as you can see, I don't know what the other measures are or in which probability spaces they reside.
I stumbled upon this thread, but it was of little help, and I don't know how the following measure-theoretic generalization of Bayes' rule was reached:
$$P_{\Theta|y}(A) = \int_{x \in A} \frac{\mathrm{d}P_{\Omega|x}}{\mathrm{d}P_\Omega}(y)\,\mathrm{d}P_\Theta$$
I'm self-learning measure-theoretic probability and lack guidance, so excuse my ignorance.
Solution 1:
I'm not 100% convinced by the expression in the linked thread. The notion of conditional probability is not itself so easy to express in a measure-theoretic way. I will try to restrict my answer to the more basic formulations.
The first thing to note is that everything is defined on the same probability space $(\Omega, \mathcal{F}, \mathbf{P})$. The quantities $\theta$ and $X$ are then random variables taking values in some spaces $\Theta, \mathcal{X}$ respectively. In particular, the random variable $(\theta, X)$ has a joint distribution, which is the object of Bayesian statistical inquiry.
Now, the statistical set-up is that $\theta$ has a marginal distribution $\pi$ (which is called the prior), and the statistical model is a family of conditional distributions, which specify the distribution of $X$ based on the parameter values. In particular, we have what's called a transition kernel $\nu \colon \Theta \times \mathcal{F}_{\mathcal{X}} \rightarrow [0, 1]$, which encodes the conditional likelihood through $$ \mathbf{P}(X \in A|\theta) = \nu(\theta, A). $$
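As a concrete sketch of such a kernel, here is a Bernoulli model written in Python; the function name `nu` and the model are illustrative choices, not part of the general definition:

```python
def nu(theta, A):
    """Transition kernel nu(theta, A) = P(X in A | theta) for a
    Bernoulli(theta) model on the sample space {0, 1}."""
    pmf = {0: 1 - theta, 1: theta}   # nu(theta, .) as a measure on {0, 1}
    return sum(pmf[x] for x in A)

# nu(theta, .) is a probability measure: full mass on the whole space
print(nu(0.3, {0, 1}))  # 1.0
# nu(., A) is a (trivially measurable) function of theta
print(nu(0.3, {1}))     # P(X = 1 | theta = 0.3) = 0.3
```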
For measurability reasons, we demand that each $\nu(\theta, \cdot)$ is a probability measure and that each $\nu(\cdot, A)$ is a measurable function of $\theta$.
To make sense of Bayes' rule, we also demand that each $\nu(\theta, \cdot)$ is absolutely continuous with respect to a common $\sigma$-finite carrying measure $\mu$ (this is not much of a condition, since in 99.9% of cases this is the Lebesgue measure on $\mathbf{R}^n$, the counting measure on $\mathbf{Z}$, or some combination of these). This condition allows us to define the conditional likelihood of our model: $$ f(x|\theta) := \frac{\mathrm{d} \nu(\theta, \cdot)}{\mathrm{d} \mu}(x), $$ which is measurable as a function of $x$ by the Radon-Nikodym theorem and as a function of $\theta$ by our regularity condition on $\nu$.
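For instance, taking $\mu$ to be the counting measure on $\{0, 1\}$ and $\nu(\theta, \cdot)$ the Bernoulli$(\theta)$ law, the Radon-Nikodym derivative $f(x|\theta)$ is just the familiar pmf. A minimal Python sketch (the names are illustrative):

```python
def f(x, theta):
    """Density of Bernoulli(theta) w.r.t. counting measure mu on {0, 1}:
    f(x|theta) = theta**x * (1 - theta)**(1 - x)."""
    return theta**x * (1 - theta)**(1 - x)

# Integrating f(.|theta) against mu (a sum, for counting measure)
# recovers the total mass of nu(theta, .):
print(sum(f(x, 0.3) for x in (0, 1)))  # 1.0
```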
With all this set-up, the measure-theoretic formulation of Bayes' theorem is that for each $x \in \mathcal{X}$, the conditional distribution of $\theta$ given $X = x$ is the probability measure $$ \pi(\mathrm{d} \theta | x) \propto f(x | \theta) \pi(\mathrm{d} \theta). $$
More fully (and including the constant of proportionality), this is the statement: $$ \mathbf{P}(\theta \in \Gamma | X = x) = \frac{\int_{\Gamma} f(x | \theta) \pi(\mathrm{d} \theta)}{\int_{\Theta} f(x | \theta) \pi(\mathrm{d} \theta)}. $$
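This ratio of integrals can be evaluated numerically. The sketch below makes illustrative choices not fixed by the answer: $\pi$ is Uniform$(0,1)$ (so $\pi(\mathrm{d}\theta) = \mathrm{d}\theta$), the model is $n$ Bernoulli$(\theta)$ trials with $x$ successes, and $\Gamma = [0.5, 1]$:

```python
import math

n_trials, x_obs = 10, 7  # observed data: 7 successes in 10 trials

def likelihood(theta):
    """Conditional likelihood f(x_obs | theta) for the binomial model."""
    return math.comb(n_trials, x_obs) * theta**x_obs * (1 - theta)**(n_trials - x_obs)

def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

# P(theta in Gamma | X = x) = int_Gamma f d pi / int_Theta f d pi
numer = integrate(likelihood, 0.5, 1.0)  # Gamma = [0.5, 1]
denom = integrate(likelihood, 0.0, 1.0)  # the normalizing constant (evidence)
print(numer / denom)  # ~0.887 (the posterior here is Beta(8, 4))
```

With a flat prior the posterior is Beta$(x+1, n-x+1)$, and the quadrature agrees with the exact Beta$(8,4)$ tail probability.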