Interpretation of a probability problem: expected value.

I have a few doubts about the interpretation of this problem, which I read in a book of interview questions.

Here is the text:

A mythical city contains N=100,000 married couples but no children. Each family wishes to continue the male line, but they do not wish to overpopulate. So, each family has one baby per annum until the arrival of the first boy. Assume that each child is equally likely to be born male or female (independently of the others). Let $p(n)$ be the percentage of children that are male at the end of year $n$. How is this percentage expected to evolve through time?

This is the problem and the solution says that the percentage is expected to remain constant at a level $\frac{1}{2}$.

Thanks in advance; if something is not clear, just ask.


Solution 1:

Since each child is a boy or a girl with probability 1/2, independently of the others, the percentage must be expected to remain constant at 1/2 boys and 1/2 girls. (Any time anyone has a child, we expect it to be a boy with probability 1/2; when they stop having children does not matter.)

Maybe it is illuminating to consider the first couple of years: At the end of the first year, we expect 1/2 of the couples to have had boys and 1/2 to have had girls, so 1/2 of the children are boys and 1/2 are girls (or 50,000 boys and 50,000 girls using the N = 100,000 couples of the problem).

At the end of the second year, all of the couples with girls will have another child, 1/2 of them expected to be boys, and 1/2 expected to be girls. Hence, the overall population of children will still be 1/2 boys and 1/2 girls. (So, in particular with N = 100,000, we expect 50,000 of these to have already had a boy, and 50,000 to have another child. Of these 50,000 we expect 25,000 to have boys and 25,000 to have girls. Hence, the population of children at the end of the second year is 75,000 boys and 75,000 girls.)

This will continue until every couple has a boy... The child population from the previous year will be 1/2 boys and 1/2 girls, and the new children born in a particular year will be 1/2 boys and 1/2 girls, leaving the ratio of boys to girls unchanged.
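The year-by-year bookkeeping above can be tabulated with a short script. This is only a sketch of the expected counts used in the argument; the function name `expected_counts` and its parameters are made up for illustration.

```python
from fractions import Fraction

def expected_counts(n_couples, years):
    """Expected cumulative (year, boys, girls) under the stop-at-first-boy rule."""
    boys = girls = Fraction(0)
    still_trying = Fraction(n_couples)  # expected number of couples with no boy yet
    rows = []
    for year in range(1, years + 1):
        births = still_trying           # each such couple has exactly one baby
        boys += births / 2              # half of the births are expected to be boys
        girls += births / 2             # and half girls
        still_trying = births / 2       # only couples that just had a girl continue
        rows.append((year, boys, girls))
    return rows

for year, boys, girls in expected_counts(100_000, 4):
    print(year, float(boys), float(girls), float(boys / (boys + girls)))
```

With N = 100,000 the second row reproduces the 75,000 boys and 75,000 girls computed above, and the expected counts of boys and girls stay equal in every later year.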

Solution 2:

During a given year, each family either (1) has exactly one child or (2) has no child. Those who previously had a boy have no further children, hence they are all in case (2). Some other families may be in case (2) as well, for reasons of their own; this does not matter.

What matters is that each family in case (1) is as likely to have a boy as a girl. By the law of large numbers, if the number $M$ of families who do procreate during this given year is large and if each procreates independently of the others, $\frac12M+r_M$ boys are born and $\frac12M-r_M$ girls are born, where $r_M$ is random and $|r_M|\ll M$. The proportion of boys amongst the children born this year is $\frac{M/2+r_M}M=\frac12+\varepsilon_M$ with $\varepsilon_M=\frac{r_M}M$ hence $|\varepsilon_M|\ll1$. In other words, roughly one half of all the children born this year are boys.

Thus the hypothesis that the global population is large is important, but the details of the strategy (in the present case, stop after the first boy) are simply not relevant, since every adapted strategy (in the sense that the decision in a given year depends only on what happened in the previous years) would yield the same result.
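To see the law-of-large-numbers statement at work, here is a small Monte Carlo sketch of the proportion of boys among the children born in each year. The function name `yearly_boy_fractions`, the seed, and the use of Python's `random` module are choices made for this illustration, not part of the original argument.

```python
import random

def yearly_boy_fractions(n_couples, years, seed=0):
    """Simulate stop-at-first-boy; return the fraction of boys among each year's births."""
    rng = random.Random(seed)           # fixed seed so that runs are reproducible
    trying = n_couples                  # M: families that have a child this year
    result = []
    for _ in range(years):
        if trying == 0:                 # every family already has a boy
            break
        boys = sum(rng.random() < 0.5 for _ in range(trying))
        result.append(boys / trying)    # this is 1/2 + epsilon_M
        trying -= boys                  # families with a new boy stop
    return result

for year, frac in enumerate(yearly_boy_fractions(100_000, 5), start=1):
    print(year, frac)
```

The fluctuation around 1/2 should grow slowly from year to year, because $M$ roughly halves each year while the typical size of $\varepsilon_M$ is of order $1/\sqrt M$.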


Edit (This is to answer a comment by the OP.)

The preceding paragraphs describe the almost sure behaviour in the limit of large initial populations. Turning to the behaviour in the mean for finite initial populations, note that the distribution of $r_M$ is symmetric, since $r_M$ is the sum of a random number $M$ of i.i.d. centered $\pm1/2$ Bernoulli random variables. Hence $\mathrm E\left(\frac{b_k}{g_k+b_k}\right)=\frac12$ exactly, where $b_k$ and $g_k$ denote the numbers of boys and girls born in generation $k$.

This does not imply that the total numbers $B_k=b_1+\cdots+b_k$ and $G_k=g_1+\cdots+g_k$ of boys and girls born until generation $k$ fulfill the same property.

Consider for example the second generation. Then the distribution of $b_1$ is binomial $(N,\frac12)$, $g_1=N-b_1$, the conditional distribution of $b_2$ conditionally on $b_1$ or $g_1$ is binomial $(g_1,\frac12)$, and $g_2=g_1-b_2$. In particular, $\mathrm E(b_1)=\frac12N$ and $\mathrm E(b_2\mid b_1)=\frac12(N-b_1)$.

Consider the successive ratios $R_k=\frac{B_k}{B_k+G_k}$. Then $R_1=\frac{b_1}N$ hence $\mathrm E(R_1)=\frac12$. On the other hand, $R_2=\frac{b_1+b_2}{N+g_1}=\frac{b_1+b_2}{2N-b_1}$ hence $\mathrm E(R_2\mid b_1)=\frac{b_1+(g_1/2)}{2N-b_1}=\frac12\frac{N+b_1}{2N-b_1}$. By convexity of $x\mapsto\frac{N+x}{2N-x}$ on $[0,N]$ (Jensen's inequality, strict because $b_1$ is not constant), $$ \mathrm E(R_2)\gt\frac12\frac{N+\mathrm E(b_1)}{2N-\mathrm E(b_1)}=\frac12\frac{N+(N/2)}{2N-(N/2)}=\frac12, $$ hence $\mathrm E(R_2)\ne\frac12$.
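For small $N$ the strict inequality can be checked exactly by summing $\mathrm E(R_2\mid b_1)=\frac12\frac{N+b_1}{2N-b_1}$ over the binomial $(N,\frac12)$ distribution of $b_1$. The sketch below does this with exact rational arithmetic; the function name `expected_r2` is hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def expected_r2(n):
    """Exact E(R_2) for n couples: average E(R_2 | b_1) over b_1 ~ Binomial(n, 1/2)."""
    total = Fraction(0)
    for b1 in range(n + 1):
        prob = Fraction(comb(n, b1), 2 ** n)          # P(b_1 = b1)
        cond = Fraction(n + b1, 2 * (2 * n - b1))     # (1/2) (N + b1) / (2N - b1)
        total += prob * cond
    return total

for n in (1, 2, 10, 100):
    print(n, expected_r2(n), float(expected_r2(n)))
```

For $N=1$ this gives $\frac58$ and for $N=2$ it gives $\frac9{16}$, both strictly above $\frac12$.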


Second edit

Counting the children family by family instead of generation by generation, one sees readily that $R_k\to R_\infty$ almost surely when $k\to\infty$, where $R_\infty=\frac{N}{N+\sigma_N}$ and $\sigma_N$ is the sum of $N$ i.i.d. geometric random variables $\tau_i$ of parameter $\frac12$ (the number of girls in family $i$), such that $\mathrm P(\tau_i=n)=2^{-(n+1)}$ for every $n\geqslant0$. Since $\mathrm E(u^{\tau_i})=\frac1{2-u}$, further computations then show that $$ \mathrm E(R_\infty)=N\int_0^1\frac{u^{N-1}}{(2-u)^N}\mathrm du=\frac12+\frac14\frac1N+o\left(\frac1N\right). $$ In particular, $\mathrm E(R_k)\ne\frac12$ for every $k$ large enough (and probably for every $k\geqslant2$).
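The integral formula can be evaluated numerically as a sanity check. The sketch below uses a plain midpoint rule; the function name `expected_r_inf` and the number of steps are arbitrary choices for this illustration. The integrand is written as $\left(\frac{u}{2-u}\right)^{N-1}\frac1{2-u}$ to avoid floating-point overflow for large $N$.

```python
def expected_r_inf(n, steps=200_000):
    """Midpoint-rule estimate of n * integral over [0, 1] of u^(n-1) / (2-u)^n du."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h                                # midpoint of the i-th subinterval
        total += (u / (2.0 - u)) ** (n - 1) / (2.0 - u)  # same integrand, overflow-safe form
    return n * h * total

for n in (1, 10, 100, 1000):
    e = expected_r_inf(n)
    # n * (e - 1/2) estimates the coefficient of the 1/N term in the expansion
    print(n, e, n * (e - 0.5))
```

For $N=1$ the integral is $\int_0^1\frac{\mathrm du}{2-u}=\ln 2$, which gives a quick check of the quadrature; the last printed column indicates how the scaled excess $N\left(\mathrm E(R_\infty)-\frac12\right)$ behaves as $N$ grows.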