Boy and girl paradox is driving me crazy
I know this question is asked over and over, but I still can't understand anything.
Say I'm introduced to a random father of two and I want to know what's the probability that both his children are boys. Currently:
- BB BG GB GG ⇢ 1/4
Where the first letter represents the younger sibling and the second letter represents the older sibling. So far so good.
(1) Now the father tells me that his younger child is a boy:
- BB BG ~~GB GG~~ ⇢ 1/2
(2) If, instead, he told me that at least one of his children is a boy:
- BB BG GB ~~GG~~ ⇢ 1/3
Makes sense, kind of.
(3) But if the father brought one of his children with him, without telling me whether it's the younger or the older child, and that child happened to be a boy, I think I could still honestly have arrived at the 50/50 probability:
- BB BG ~~GB GG~~ ⇢ 1/2
Where the first letter represents the boy I've just seen and the second letter represents his sibling.
Now, say, the father first told me that he has at least 1 boy. That's the case (2).
Then the father called (one of) the boy(s) here, and somehow the situation turned into the case (3)!
What exactly has changed? What kind of new information did I just get? OK, I've seen (one of) the boy(s), but the only thing it tells me is that one of the children is a boy, which I already knew from the father's own words.
It seems to me that anything he could bring that has some kind of relationship to (one of) the boy(s) so as to allow me to uniquely identify him would work: a photo, a footprint on a beach, etc. Even if he simply told me that he has just thought about one of his children who is a boy, I think I could still have done this:
- BB BG ~~GB GG~~ ⇢ 1/2
Where the first letter represents the boy the father has thought about at XX/XX/XXXX XX:XX:XX UTC, and the second letter represents his other child.
Is this magic? Or am I just stupid?
Can't I simply construct such a way of identification myself? For example, let the first letter represent the younger boy (the only boy if there's just one), and let the other letter represent the other child. Since the father is not an abstract entity, this would uniquely identify some child.
I don't see how changing the representation changes things.
Say I saw one of the father's children in a photo behind thick blurry glass that doesn't let me see whether it's a girl or a boy. Therefore:
- BB BG GB GG ⇢ 1/4
Where the first letter represents the child on the photo and the second letter represents the other child.
Now the glass is removed and I can see the photo clearly and it's indeed a boy:
- BB BG ~~GB GG~~ ⇢ 1/2
> Then the father called (one of) the boy(s) here
Why did the father do that? It matters.
(A) If the father called over a child at random, and it happened to be one of the boys you had been talking about, then the probability that the remaining child is a boy is $1/2$.
(B) If you specifically asked the father to call over a boy, and he obliged, then you've learned nothing new, and the probability that the remaining child is a boy is still $1/3$.
Let's model both scenarios with the following probability space:
- BB1 BB2 BG1 BG2 GB1 GB2 GG1 GG2
The first letter is the sex of the older child, the second letter is the sex of the younger child, and the numeral says which child is the father's favorite (1 for the older, 2 for the younger), whom he will call over if given the opportunity. Assume that all three variables are independent fair coin flips, so the eight outcomes are equally likely.
This is the event that the father has at least one boy:
- BB1 BB2 BG1 BG2 GB1 GB2
This is the event that the father calls over a boy in scenario (A):
- BB1 BB2 BG1 GB2
This is the event that the father calls over a boy in scenario (B):
- BB1 BB2 BG1 BG2 GB1 GB2
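The three events above can be checked by brute-force counting. The following is a minimal Python sketch, not part of the original answer; the function names (`prob`, `both_boys`, `favorite_is_boy`, and so on) are made up for illustration, and it assumes the eight outcomes are equally likely, as stated above.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (older, younger, favorite); favorite 1 = older, 2 = younger.
# Assumption: all eight outcomes are equally likely (three independent fair coin flips).
OUTCOMES = list(product("BG", "BG", (1, 2)))

def prob(event, given):
    """P(event | given), computed by counting equally likely outcomes."""
    conditioned = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in conditioned if event(o)), len(conditioned))

def both_boys(o):
    return o[0] == "B" and o[1] == "B"

def at_least_one_boy(o):
    return "B" in o[:2]

def favorite_is_boy(o):
    # Scenario (A): the father calls over his favorite, who turns out to be a boy.
    return o[o[2] - 1] == "B"

# Scenario (B): asked for a boy, the father can oblige whenever he has one,
# so the event is the same as "at least one boy".
p_a = prob(both_boys, favorite_is_boy)
p_b = prob(both_boys, at_least_one_boy)
print(p_a, p_b)  # 1/2 1/3
```

In scenario (A), two of the four surviving outcomes are BB, giving $1/2$; in scenario (B), two of the six surviving outcomes are BB, giving $1/3$.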
The apparent paradox lies in the information difference between "at least one" and "this one".
Suppose it's Halloween and the father is accompanied by two children in bulky costumes, so you cannot tell which is which. Let $L$ be the event of a boy being inside the costume on the left, and $R$ be the event of a boy being inside the costume on the right.
Prior to learning anything else, the probability that they are both boys is $\mathsf P(L\cap R)=1/4$, if we assume an independent 50:50 chance of boyhood for each.
If it comes up in conversation that at least one of them is a boy, then you only know $L\cup R$ and the conditional probability is: $$\mathsf P(L\cap R\mid L\cup R) = 1/3$$
If you then learn that the one on the left is a boy, then you know $L$ and the conditional probability is: $$\mathsf P(L\cap R\mid L) = 1/2$$
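Both values follow from the definition of conditional probability, under the same assumption of independent 50:50 chances. As a worked sketch (the complement notation $L^c$, $R^c$ for "a girl is inside" is introduced here and is not in the original):

$$\mathsf P(L\cap R\mid L\cup R) = \frac{\mathsf P(L\cap R)}{\mathsf P(L\cup R)} = \frac{1/4}{1-\mathsf P(L^c\cap R^c)} = \frac{1/4}{3/4} = \frac13$$

$$\mathsf P(L\cap R\mid L) = \frac{\mathsf P(L\cap R)}{\mathsf P(L)} = \frac{1/4}{1/2} = \frac12$$

The numerator is the same in both cases; only the conditioning event shrinks, from probability $3/4$ to $1/2$.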
This is something you have to watch for, as it is somewhat counterintuitive and can be subtle.
Our intuition is that if we know that "at least one child is a boy", then the probability that "both children are boys" is the probability that "the other child is a boy". However, our intuition is wrong, because we don't know which child is the other child.
When we are told that "at least one is a boy", that information could be given when just this child, just that child, or both children are boys.
When we are told that "this one is a boy", that information can only be given when either just this child, or both children, are boys. We know something extra (whether it's position, age, or whatever) about the identity of the boy.
It's not so much a paradox as an illustration that the way you pick the child matters. Let's look at it in the large:
If you take a list of all two-child families that have at least one boy, you'll find that one-third of them have two boys.
If you take a list of all boys with exactly one sibling, you'll find that half of them have a brother.
Why does this happen? In the second case, you're double-counting families with two boys: each such family appears on the list twice, once for each boy.
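The two lists can be built explicitly. This is a small Python sketch rather than anything from the original answer; it assumes one family for each of the four equally likely (older, younger) combinations, and the variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: one family per equally likely (older, younger) combination.
families = list(product("BG", repeat=2))

# List 1: families with at least one boy.
boy_families = [f for f in families if "B" in f]
share_two_boys = Fraction(
    sum(f == ("B", "B") for f in boy_families), len(boy_families)
)

# List 2: every boy gets his own entry, recorded as his sibling's sex.
# A two-boy family contributes two entries, which is the double counting.
siblings_of_boys = [f[1 - i] for f in families for i in (0, 1) if f[i] == "B"]
share_with_brother = Fraction(siblings_of_boys.count("B"), len(siblings_of_boys))

print(share_two_boys, share_with_brother)  # 1/3 1/2
```

The first list has three entries (BB, BG, GB), while the second has four (two boys from BB, one each from BG and GB), which is where the extra weight on two-boy families comes from.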