Given a coin's bias, how can two flips be conditionally independent?

Solution 1:

You are correct that knowing it is biased doesn’t make it unbiased — but it does make Bob’s outcome irrelevant to Alice.

The example is confusing because it doesn't clearly separate epistemic and aleatory probability: the coin flips are biased and independent regardless of our state of knowledge.

What is implicit in the example is that the degree of bias has a prior distribution that gets updated with new information (Bob's flip). This update does not affect the coin itself, only our assessment of its true degree of bias. Once you know the degree of bias, all epistemic uncertainty is gone and you are left with the usual aleatory probability of the coin-flipping process, which yields independent outcomes (note: not equal outcomes!).

Here’s an example to make this less philosophical:

Let’s assume $p:=P(H) = 0.999$ but Alice and Bob don’t know that. Instead they have a prior belief that $p$ is equally likely to be anywhere in the range $[0,1]$.

If Bob gets H, that will concentrate the posterior probability in the $(0.5, 1]$ range, raising our estimate of Alice's probability of heads.
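To make this update concrete, here is a minimal numerical sketch (the grid discretization and its resolution are my own choices, not part of the original example). Starting from the uniform prior on $p$, observing one heads from Bob shifts the predictive probability of Alice flipping heads from $1/2$ to $2/3$:

```python
import numpy as np

# Discretize the uniform prior over the bias p on [0, 1].
p = np.linspace(0, 1, 100_001)
prior = np.ones_like(p) / len(p)

# Prior predictive probability that Alice flips heads: E[p] = 0.5.
prior_heads = np.sum(prior * p)

# Bob flips heads: multiply the prior by the likelihood p (Bayes' rule).
posterior = prior * p
posterior /= posterior.sum()

# Posterior predictive probability that Alice flips heads: E[p | Bob=H] = 2/3.
posterior_heads = np.sum(posterior * p)

print(prior_heads, posterior_heads)
```

The mass below $p = 0.5$ is down-weighted and the mass above is up-weighted, which is exactly the "concentration in $(0.5, 1]$" described above.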

Solution 2:

It's much simpler to think of this through causality.

In the example that you linked to, Age affects both Foot Size and Literacy Score. In this coin example, the Bias of the coin affects Toss $A$ and Toss $B$.

(Diagram: Bias → Toss $A$, Bias → Toss $B$.)

The rule here is that, after you've drawn out the graph, two events are conditionally independent if you can't traverse from one node to the other without going through a "blocked" node, where a "blocked" node is one whose outcome is already known (i.e. has been conditioned on).
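This rule can be checked by brute-force enumeration of the joint distribution for the graph Bias → Toss $A$, Bias → Toss $B$. A sketch, using hypothetical numbers (two equally likely biases, 0.8 and 0.2, which are my own choices for illustration):

```python
# Tiny Bayesian network: Bias -> Toss A, Bias -> Toss B.
# Hypothetical prior: two equally likely biases.
p_bias = {0.8: 0.5, 0.2: 0.5}

def joint(bias, a, b):
    """P(Bias=bias, A=a, B=b); a and b are True for heads."""
    pa = bias if a else 1 - bias
    pb = bias if b else 1 - bias
    return p_bias[bias] * pa * pb

# Marginals and conditionals by enumeration over all outcomes.
p_a = sum(joint(z, True, b) for z in p_bias for b in (True, False))
p_b = sum(joint(z, a, True) for z in p_bias for a in (True, False))
p_ab = sum(joint(z, True, True) for z in p_bias)

# Bias unknown: the path A <- Bias -> B is open, so B informs A.
p_a_given_b = p_ab / p_b

# Bias known (condition on Bias=0.8): the path is blocked.
p_a_given_b_bias = joint(0.8, True, True) / sum(
    joint(0.8, a, True) for a in (True, False)
)

print(p_a, p_a_given_b, p_a_given_b_bias)
```

With the bias unobserved, $p(A=H \mid B=H) = 0.68 \neq 0.5 = p(A=H)$; once the bias node is observed, conditioning on $B$ changes nothing, which is the blocked-path rule in action.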

Understanding Bayesian Networks will help: http://www.cse.unsw.edu.au/~cs9417ml/Bayes/Pages/Bayesian_Networks_Definition.html

Solution 3:

Bey's point about epistemic and aleatory probability is excellent, and highlights why the explanation I gave is unclear, particularly if you aren't used to treating probabilities this way.

Though, first, I am not sure if you might still be confusing the concepts of bias and independence. Being conditionally independent does not remove the bias from the coin. Intuitively it simply says that the two events do not directly affect the likelihood of each other, but are linked by some other factor. If this bit is unclear then I'd encourage you to read some other explanations.

But perhaps I misunderstood your question, and you were asking "how are they only conditionally independent – i.e. aren't they just independent"? If so, good question. I think a frequentist perspective would be even more likely to think this way, whereas I was certainly answering from a Bayesian one before, and I hadn't fully considered the importance of the unknown bias in the coin. I'll try my best to elaborate.

(Note, I welcome any edits from the more adept statisticians here – I am merely a computer scientist surprised not to see any notation when I originally answered that other question five-odd years ago.)

Given a coin with probability of heads $p(H) = \theta$, which Alice and Bob both tossed some large number of times, then clearly $p(A = H) = p(B = H) = \theta$. Whether or not the coin is biased, they are not "just" conditionally independent; they are independent, because

$$p(A=H|B=H) = p(A=H)$$

(They are both equal to $\theta$.)

The frequentist interpretation might struggle somewhat to denote the event $Z$ as "the coin is biased" here, because we know it is biased and we will not observe any outcomes where the coin is not biased. But still, if you simply count the occurrences, you will also see that

$$p(A=H|B=H, Z) = p(A=H|Z)$$

because these probabilities are also equal to $\theta$. So the events $A$ and $B$ are also conditionally independent given $Z$ (as well as simply being unconditionally independent).

We can check this empirically. I ran a simulation with $n=10,000$ coin flips and $\theta=0.8$. Alice flipped 7939 heads, and Bob flipped 7979. Of Bob's 7979 heads, Alice got 6302 heads on the same flip. So

$$p(A=H) = 7939/10000 = 0.794$$ $$p(A=H|B=H) = 6302/7979 = 0.790$$

Comfortably close, as we expect.
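The exact counts above came from the answer's own run; a sketch like the following reproduces the setup (the seed is my own choice, so any particular run's counts will differ from those quoted, but both estimates land near $\theta = 0.8$):

```python
import random

random.seed(0)  # arbitrary seed; counts will differ from those quoted above
n, theta = 10_000, 0.8

# Alice and Bob each flip the same theta-biased coin n times, independently.
alice = [random.random() < theta for _ in range(n)]
bob = [random.random() < theta for _ in range(n)]

p_a = sum(alice) / n

# Restrict to the trials where Bob flipped heads.
heads_when_bob_heads = sum(a for a, b in zip(alice, bob) if b)
p_a_given_b = heads_when_bob_heads / sum(bob)

print(p_a, p_a_given_b)  # both close to 0.8
```

Since $\theta$ is fixed and known, conditioning on Bob's outcome moves nothing: both empirical frequencies estimate the same $\theta$.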

However, suppose we do not know if the coin is biased or not. Suppose on each trial, Bob chooses a coin randomly between two coins, one biased to get heads 80% of the time, the other biased to get heads 20% of the time. Call $Z$ the event where Bob chooses the coin biased towards heads, and we do not know $p(Z)$.

Now you hopefully agree that Alice and Bob's flips are not independent. We do not know which coin Bob picked, but if he flipped heads, we should expect Alice to be more likely to flip heads than if we simply don't know either way, i.e.

$$p(A=H | B=H) > p(A=H)$$

(Crucially they are not equal, so not independent.)

But if we condition on the fact that we know $Z$ is true, i.e. the coin is biased towards heads, then we get the exact same situation we got before where we fixed the probability at $0.8$.

Again I ran a simulation with 10,000 flips (I didn't write down the exact counts but just the probabilities):

$$p(A=H) = 0.4954$$ $$p(A=H | B=H) = 0.6807$$ $$p(A=H|Z) = 0.7978$$ $$p(A=H|B=H, Z) = 0.8016$$

(Of course this required me to choose a value for $p(Z)$ – I chose 0.5 – and so you could also calculate these analytically.)
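Here is a sketch of that analytic calculation, using the same two-coin setup and the same $p(Z) = 0.5$ as the simulation (law of total probability for the marginal, and Bayes-style counting for the conditional):

```python
# Two-coin setup from the text: Z = "Bob picked the heads-biased coin".
p_z = 0.5
theta_hi, theta_lo = 0.8, 0.2

# Law of total probability: marginal chance Alice flips heads.
p_a = p_z * theta_hi + (1 - p_z) * theta_lo            # 0.5

# Both flip heads: within each trial both tosses share the same coin.
p_a_and_b = p_z * theta_hi**2 + (1 - p_z) * theta_lo**2  # 0.34
p_b = p_a                                                # by symmetry

p_a_given_b = p_a_and_b / p_b                            # 0.68

print(p_a, p_a_given_b)
```

These analytic values, 0.5 and 0.68, line up with the simulated 0.4954 and 0.6807 above, while $p(A=H \mid Z)$ and $p(A=H \mid B=H, Z)$ are both exactly $\theta_{hi} = 0.8$.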

Since $$p(A=H|B=H, Z) = p(A=H|Z)$$ we say that $A$ and $B$ are conditionally independent given $Z$.

For me the definitions come first, and I try to derive the intuition from that. This is why I answered that other post in the first place, because the definitions were absent. Still, everyone learns differently, so if this didn't help then I hope one of the other answers has!