Do we really get extra freedom when conditioning on events of probability zero?
Just to make things clear, I'm not claiming here that I broke probability theory. It's just that I got myself into a bad state where I started questioning whether life is even worth it.
So here is a problem:
Problem: Let $(B_t)_{t\geq 0}$ be a standard BM and let us condition on $\{B_1=0\}$. Let $A\in \mathcal F_1$ (where $\mathcal F_t$ is the canonical filtration of $(B_t)_{t\geq 0}$). For example, we could take $A=\{B_t\leq 1 \text{ for all } t\in [0,1]\}$. Find $\mathbb P(A\mid B_1=0)$.
I saw this problem in the book "A First Course in Stochastic Processes" by Karlin and Taylor (Exercise 6, p. 386).
My solution to the problem: I can give the simple answer "it is zero", i.e. $$\mathbb P(A\mid B_1=0)=0$$ (or, for that matter, any number between 0 and 1). On the other hand, I can of course do some calculations and give an answer that is more widely accepted.
So now my question is:
My question: on what basis can anyone tell me that my first answer, where I claim it is zero, is wrong?
My own thoughts:
- We want to find a "nice" function $g$ such that $g(B_1)=\mathbb P(A\mid B_1)$ a.s.; the answer is then $g(0)$. The problem is that $\mathbb P(A\mid B_1)$ is only unique up to null sets, so we can find another $h$ with $g(0)\neq h(0)$ and still $h(B_1)=\mathbb P(A\mid B_1)$ a.s.
- Apparently that is not strong enough to give a unique answer to the original problem. So let's ask for something stronger: a regular conditional probability $g(x,A)$ such that $g(B_1,A)=\mathbb P(A\mid B_1)$ a.s. But here too, nothing stops me from defining a new function $h(x,A)$ equal to $g(x,A)$ except at $x=0$, where I choose whatever probability measure I want. And yes, this new $h$ is also a regular conditional probability.
- Are limits the only way to get a unique answer? I mean: condition on something like $U_{\varepsilon}:=\{B_1\in (-\varepsilon,\varepsilon)\}$, take the limit of $\mathbb P(A\mid U_\varepsilon)$ as $\varepsilon\to 0^+$, and use that as the definition. I hate to say it, but if this is the case, does it work for every type of process?
- Something like Doob's $h$-transform maybe? I still have the feeling that this won't make it unique either.
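For the example event $A=\{B_t\leq 1 \text{ for all } t\in[0,1]\}$, the limit from the third point can be computed explicitly. The Python sketch below assumes the reflection principle, which gives $\mathbb P(\max_{t\le 1}B_t\ge 1,\ |B_1|<\varepsilon)=\Phi(2+\varepsilon)-\Phi(2-\varepsilon)$ for $0<\varepsilon<1$, with $\Phi$ the standard normal CDF; the function names `Phi` and `prob_A_given_U` are just illustrative.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF, written via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_A_given_U(eps: float) -> float:
    """P(max_{t<=1} B_t <= 1 | |B_1| < eps), valid for 0 < eps < 1.

    Assumes the reflection principle: for y <= 1,
    P(max B >= 1, B_1 <= y) = P(B_1 >= 2 - y), hence
    P(max B >= 1, |B_1| < eps) = Phi(2 + eps) - Phi(2 - eps).
    """
    p_U = Phi(eps) - Phi(-eps)                       # P(U_eps)
    p_not_A_and_U = Phi(2.0 + eps) - Phi(2.0 - eps)  # P(A^c and U_eps)
    return 1.0 - p_not_A_and_U / p_U


for eps in [0.5, 0.1, 0.01, 0.001]:
    print(f"eps={eps:<6} P(A | U_eps) = {prob_A_given_U(eps):.6f}")
print(f"1 - e^(-2)        = {1.0 - math.exp(-2.0):.6f}")
```

The printed values approach $1-e^{-2}\approx 0.8647$ as $\varepsilon$ shrinks. This only shows that the limit exists for this particular $A$ and this particular family $U_\varepsilon$; it says nothing about whether the recipe works for every process or every choice of shrinking neighbourhoods.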
I honestly feel quite foolish. I've seen this many times and never made a big deal of it, but while solving a related problem I started wondering who decided that every other answer is actually wrong, and I could not prove it. I also know that probabilists' work was not for nothing, so I'm sure there is a way to make $\mathbb P(A\mid B_1=0)$ precise enough that the problem has exactly one correct answer.
I've explained some of the difficulties in conditioning on events of probability zero in my answer here. In essence, to pin down a notion of conditional probability given an event of probability zero, some additional input is required, such as a symmetry principle or a partition of hypotheses with respect to which the conditioning is supposed to make sense. In your example, one might require that the law of total probability hold for the choices you make for $\mathbb{P}(A\mid B_1=x)$ with $x \in \mathbb{R}$, which leads to the concept of disintegrations. However, that pins down $x \mapsto \mathbb{P}(A\mid B_1=x)$ only for almost every $x$, and you are still free to define it however you like at $x = 0$.
To rule out pathological choices, one usually requires that $x \mapsto \mathbb{P}(A\mid B_1=x)$ be measurable for every $\mathcal{F}$-measurable event $A$ and that $A \mapsto \mathbb{P}(A\mid B_1=x)$ be a probability measure for every $x \in \mathbb{R}$. This leads precisely to the definition of a regular conditional distribution, which is again unique only up to a null set of $x$ values. When asked to evaluate a regular conditional distribution at a point, the common convention is to take the value at that point of a continuous version, if one exists.
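A quick numerical illustration of that last point, as a sketch: take the continuous version $g(x)=\max\{0,\,1-e^{-2(1-x)}\}$ of $x\mapsto\mathbb P(A\mid B_1=x)$ for the example event $A=\{B_t\leq 1 \text{ for all } t\in[0,1]\}$ (it follows from the reflection principle), and a hypothetical modification `h` that agrees with `g` except at $x=0$. On simulated values of $B_1$ the two never differ, so nothing that only sees $g(B_1)$ almost surely can tell them apart. The names `g`, `h` and the seed are made up for this illustration.

```python
import math
import random


def g(x: float) -> float:
    """Continuous version of x -> P(max_{t<=1} B_t <= 1 | B_1 = x)."""
    return max(0.0, 1.0 - math.exp(-2.0 * (1.0 - x)))


def h(x: float) -> float:
    """Hypothetical modification: equal to g except at the single point x = 0."""
    return 0.0 if x == 0.0 else g(x)  # exact float comparison is intentional


rng = random.Random(12345)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]  # draws of B_1 ~ N(0, 1)

# g(B_1) and h(B_1) agree on every draw, since {B_1 = 0} has probability zero.
agree = all(g(b) == h(b) for b in samples)

# Hence both give the same Monte Carlo estimate of E[g(B_1); |B_1| < 1/2],
# which by the disintegration property equals P(A and |B_1| < 1/2).
est_g = sum(g(b) for b in samples if abs(b) < 0.5) / len(samples)
est_h = sum(h(b) for b in samples if abs(b) < 0.5) / len(samples)

print(agree, est_g == est_h, g(0.0), h(0.0))
```

The two functions give identical answers to every question about $B_1$ that is insensitive to null sets, yet disagree at the one point the exercise asks about.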
This is similar to the following question: what is the value of the density of a standard normal variable at $x = 0$? Since densities are only unique up to almost-everywhere equality, this is actually an ill-posed question, and any value in $[0,\infty)$ is a technically 'correct' answer. However, I believe that most people would interpret this question as 'what is the value at $x=0$ of the continuous representative of the density of a standard normal variable?', which is a well-posed question with the unique answer $1/\sqrt{2\pi}$. In your question, the same approach yields a unique answer, so perhaps the authors really meant to ask 'what is the value of the continuous representative of $x \mapsto \mathbb{P}(A\mid B_1=x)$ at $x=0$?'.
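For the example event $A=\{B_t\leq 1 \text{ for all } t\in[0,1]\}$, here is a sketch of how the continuous representative can be found, assuming the standard reflection-principle formula for the joint law of $M_1:=\max_{t\le 1}B_t$ and $B_1$, with $\varphi$ the standard normal density:
$$\mathbb P(M_1\ge a,\ B_1\le x)=\mathbb P(B_1\ge 2a-x),\qquad a\ge 0,\ x\le a.$$
Differentiating in $x$ and dividing by the density $\varphi(x)$ of $B_1$ gives
$$\mathbb P(M_1\ge a\mid B_1=x)=\frac{\varphi(2a-x)}{\varphi(x)}=\exp\Big(\frac{x^2-(2a-x)^2}{2}\Big)=e^{-2a(a-x)},\qquad x\le a.$$
Taking $a=1$, and using that $M_1\ge B_1>1$ when $x>1$,
$$g(x):=\mathbb P(A\mid B_1=x)=\max\{0,\,1-e^{-2(1-x)}\},$$
which is continuous in $x$. Two continuous functions that agree almost everywhere agree everywhere, so this version is the unique continuous one, and the corresponding answer would be $g(0)=1-e^{-2}\approx 0.8647$.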
Regarding defining conditional distributions through limits, as you propose: this is a natural idea, but note that it leads to an irregular conditional distribution, i.e. a set function that fails to be countably additive. To see this, define $\mu(A) := \lim_{\epsilon \to 0} \mathbb{P}(A\mid U_{\epsilon})$. For any $\delta > 0$ we have $U_\epsilon\subseteq U_\delta$ as soon as $\epsilon<\delta$, so $\mu(U_{\delta}) = 1$; on the other hand $\mathbb{P}(\{B_1=0\}\mid U_\epsilon)=0$ for every $\epsilon>0$, so $\mu(\{B_1 = 0\}) = 0$. Since the sets $U_\delta$ decrease as $\delta\downarrow 0$ and $\bigcap_{\delta > 0} U_{\delta} = \{B_1 =0\}$, this contradicts continuity from above, hence $\mu$ is not a probability measure.