Do men or women have more brothers?

I think women have more, as no man can be his own brother. But how can one prove it rigorously?


I am going to suggest some reasonable background assumptions:

  1. There are a large number of individuals, of whom half are men and half are women.
  2. The individuals are partitioned into nonempty families.
  3. The distribution of the sizes of the families is deliberately not specified.
  4. However, in each family, the sex of each member is independent of the sexes of the other members.

I believe these assumptions are roughly correct for the world we actually live in.

Even in the absence of any information about point 3, what can one say about the relative expectations of the random variables “Number of brothers of individual $I$, given that $I$ is female” and “Number of brothers of individual $I$, given that $I$ is male”?

And how can one directly refute the argument that claims that the second expectation should almost certainly be smaller than the first, based on the observation that in any single family, say with two girls and one boy, the girls have at least as many brothers as do the boys, and usually more?


So many long answers! But really it's quite simple.

  • Mathematically, the expected number of brothers is the same for men and women.
  • In real life, we can expect men to have slightly more brothers than women.

Mathematically:

Assume, as the question puts it, that "in each family, the sex of each member is independent of the sexes of the other members". This is all we assume: we don't get to pick a particular set of families. (This is essential: if we were allowed to choose the collection of families we consider, we could find collections where the men have more brothers, collections where the women have more brothers, or collections where the numbers are equal: we could get the answer to come out any way at all.)

I'll write $p$ for the gender ratio, i.e. the proportion of all people who are men. In real life $p$ is close to 0.5, but this doesn't make any difference. In any random set of $n$ persons, the expected (average) number of men is $n\cdot p$.

  1. Take an arbitrary child $x$, and let $n$ be the number of children in $x$'s family.
  2. Let $S(x)$ be the set of $x$'s siblings. Note that there are no gender-related restrictions on $S(x)$: It's just the set of children other than $x$.
  3. Obviously, the expected number of $x$'s brothers is the expected number of men in $S(x)$.
  4. So what is the expected number of men in this set? Since $x$ has $n-1$ siblings, it's just $(n-1)\cdot p$, or approximately $(n-1)\div 2$, regardless of $x$'s gender. That's all there is to it.

Note that the gender of $x$ didn't figure in this calculation at all. If we were to choose an arbitrary boy or an arbitrary girl in step 1, the calculation would be exactly the same, since $S(x)$ is not dependent on $x$'s gender.
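If you'd rather check this than trust it, here is a minimal sketch that enumerates every gender pattern exactly. The function name `expected_brothers` and the family-size distribution in the usage lines are my own illustrative choices, not part of the question; the only modelling assumption is the one stated above, that genders are independent with the same $p$ for every child.

```python
from fractions import Fraction
from itertools import product

def expected_brothers(size_dist, p=Fraction(1, 2)):
    """Return (E[brothers | man], E[brothers | woman]) for a randomly chosen person.

    size_dist maps a family size n to the fraction of families with that size.
    Each child is male with probability p, independently of everyone else.
    """
    # For each gender: [probability-weighted sum of brother counts, weighted head count]
    totals = {"M": [Fraction(0), Fraction(0)], "F": [Fraction(0), Fraction(0)]}
    for n, weight in size_dist.items():
        for genders in product("MF", repeat=n):
            boys = genders.count("M")
            prob = weight * p**boys * (1 - p)**(n - boys)
            for g in genders:
                # A boy is not his own brother; a girl counts every boy.
                brothers = boys - 1 if g == "M" else boys
                totals[g][0] += prob * brothers
                totals[g][1] += prob
    return tuple(totals[g][0] / totals[g][1] for g in "MF")

# Hypothetical family-size distribution, chosen only for illustration.
sizes = {1: Fraction(1, 4), 2: Fraction(1, 4), 5: Fraction(1, 2)}
print(expected_brothers(sizes))  # (Fraction(21, 13), Fraction(21, 13))
```

Changing `sizes` or `p` changes the common value, but the two entries stay equal.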

In real life:

In reality, the gender distribution of children does depend on the parents a little bit (for biological reasons that are beyond the scope of math.se). I.e., the distribution of genders in families is not completely random. Suppose some couples cannot have boys, some might be unable to have girls, etc. In such a case, being male is evidence that your parents can have a boy, which (very) slightly raises the odds that you have a brother.

In other words: If the likelihood of having boys does depend on the family, men on average have more brothers, not fewer. (I am expressly putting aside the "family planning" scenario where people choose to have more children depending on the gender of the ones they have. If you allow this, anything could happen.)
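A toy version of this effect can be computed exactly. The sketch below assumes (my assumption, purely for illustration) that every family has the same size $n$ and that each family has its own probability $q$ of producing a boy; the function name and the numbers are hypothetical.

```python
from fractions import Fraction
from math import comb

def brothers_with_family_bias(n, q_dist):
    """Return (E[brothers | man], E[brothers | woman]) for families of size n.

    q_dist maps a family's boy probability q to the fraction of families with that q.
    Within a family, genders are independent given q.
    """
    man_sum = man_count = woman_sum = woman_count = Fraction(0)
    for q, weight in q_dist.items():
        for boys in range(n + 1):
            prob = weight * comb(n, boys) * q**boys * (1 - q)**(n - boys)
            man_sum += prob * boys * (boys - 1)    # each boy has boys - 1 brothers
            man_count += prob * boys
            woman_sum += prob * (n - boys) * boys  # each girl has `boys` brothers
            woman_count += prob * (n - boys)
    return man_sum / man_count, woman_sum / woman_count

# Half the families lean slightly towards boys, half slightly towards girls.
biased = {Fraction(2, 5): Fraction(1, 2), Fraction(3, 5): Fraction(1, 2)}
print(brothers_with_family_bias(3, biased))  # (Fraction(26, 25), Fraction(24, 25))
```

With a single value of $q$ for all families the two numbers coincide again, so the gap comes entirely from the variation between families.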


Edit, 5/24/16: After some thought I don't particularly like this answer anymore; please take a look at my second answer below instead.


Here's a simple version of the question. Suppose there is exactly one family which has $n$ children, of which $k$ are male with some probability $p_k$. When this happens, the men each have $k-1$ brothers, while the women have $k$ brothers. So it would seem that no matter what the probabilities $p_k$ are, the women will always have more brothers on average.

However, this is not true, and the reason is that sometimes we might have $k = 0$ (no males) or $k = n$ (no females). In the first case the women have no brothers and the men don't exist, and in the second case the men have $n-1$ brothers and the women don't exist. In these cases it's unclear whether the question even makes sense.


Another simple version of the question, which avoids the previous problem and which I think is more realistic, is to suppose that there are two families with a total of $2n$ children between them, $n$ of which are male and $n$ of which are female, but now the children are split between the families in some random way. If there are $m$ male children in the first family and $f$ female children, then the average number of brothers a man has is

$$\frac{m(m-1) + (n-m)(n-m-1)}{n}$$

while the average number of brothers a woman has is

$$\frac{mf + (n-m)(n-f)}{n}.$$

The first quantity is big when $m$ is either big or small (in other words, when the distribution of male children is lopsided between the two families) while the second quantity is big when $m$ and $f$ are either both big or both small (in other words, when the distributions of male and female children are similar in the two families). If we suppose that "big" and "small" are disjoint and both occur with some probability $p \le \frac{1}{2}$ (say $p = \frac{1}{3}$ to be concrete), then the first case occurs with probability $2p$ (say $2 \cdot \frac{1}{3} = \frac{2}{3}$) while the second case occurs with probability $2p^2$ (say $2 \cdot \frac{1}{9} = \frac{2}{9}$). So heuristically, in this version of the question:

If it's easy for there to be many or few men in a family, men could have more brothers than women because it's easier for men to correlate with themselves than for women to correlate with men.
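To get a feel for the two averages, here is a small sketch that evaluates them by direct counting for a given split. The function name is mine; `n`, `m` and `f` mean the same as in the formulas above.

```python
from fractions import Fraction

def average_brothers_two_families(n, m, f):
    """Average brothers per man and per woman when n men and n women are split
    into two families, the first holding m men and f women."""
    families = [(m, f), (n - m, n - f)]  # (boys, girls) in each family
    men_total = sum(boys * (boys - 1) for boys, _ in families)
    women_total = sum(girls * boys for boys, girls in families)
    return Fraction(men_total, n), Fraction(women_total, n)

# Lopsided split of the men: all four men land in the first family.
print(average_brothers_two_families(4, 4, 2))  # (Fraction(3, 1), Fraction(2, 1))
# Even split: women come out ahead.
print(average_brothers_two_families(4, 2, 2))  # (Fraction(1, 1), Fraction(2, 1))
```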

But you don't have to take my word for it: we can actually do the computation. Let me write $M$ for the random variable describing the number of men in the first family and $F$ for the random variable describing the number of women in the first family, and let's assume that they are 1) independent and 2) symmetric about $\frac{n}{2}$, so that in particular

$$\mathbb{E}(M) = \mathbb{E}(F) = \frac{n}{2}.$$

$M$ and $F$ are independent, so

$$\mathbb{E}(MF) = \mathbb{E}(M) \mathbb{E}(F) = \frac{n^2}{4}.$$

and similarly for $n-M$ and $n-F$. This is already enough to compute the expected number of brothers a woman has, which is (because $MF$ and $(n-M)(n-F)$ have the same distribution by assumption)

$$\frac{2}{n} \left( \mathbb{E}(MF) \right) = \frac{n}{2}.$$

In other words, the expected number of brothers a woman has is precisely the expected number of men in one family. This also follows from linearity of expectation.

Next we'll compute the expected number of brothers a man has. This is (again because $M(M-1)$ and $(n-M)(n-M-1)$ have the same distribution by assumption)

$$\frac{2}{n} \left( \mathbb{E}(M(M-1)) \right) = \frac{2}{n} \left( \mathbb{E}(M^2) - \frac{n}{2} \right) = \frac{2}{n} \left( \text{Var}(M) + \frac{n^2}{4} - \frac{n}{2} \right) = \frac{n}{2} - 1 + \frac{2 \text{Var}(M)}{n}$$

where we used $\text{Var}(M) = \mathbb{E}(M^2) - \mathbb{E}(M)^2$. As in Donkey_2009's answer, this computation reveals that the answer depends delicately on the variance of the number of men in one family (although be careful comparing these two answers: in Donkey_2009's answer he's choosing a random family to inspect while I'm choosing a random distribution of males and females among two families). More precisely,

Men have more brothers than women on average if and only if $\text{Var}(M)$ is strictly larger than $\frac{n}{2}$.

For example, if the men are distributed by independent coin flips, then we can compute that $\text{Var}(M) = \frac{n}{4}$, so in fact in this case women have more brothers than men (and this doesn't depend on the distribution of $F$ at all, as long as it's independent of $M$). Here the heuristic argument about bigness and smallness doesn't apply because the probability of $M$ deviating from its mean is quite small.

But if, for example, $M$ is instead chosen uniformly at random among the possible values $0, 1, 2, \dots, n$, then $\mathbb{E}(M^2) = \frac{n(2n+1)}{6}$, so $\text{Var}(M) = \frac{n(2n+1)}{6} - \frac{n^2}{4} = \frac{n^2}{12} + \frac{n}{6}$, which is quite a bit larger than in the previous case, and this gives about $\frac{2n}{3}$ expected brothers for men (exactly $\frac{2n}{3} - \frac{2}{3}$).
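Both examples can be checked with exact arithmetic. The sketch below takes the distribution of $M$ as a list of probabilities for $0, 1, \dots, n$ and evaluates the expectation for men directly; the helper names and the choice $n = 6$ are mine. Under the independence and symmetry assumptions above, the expectation for women is $\frac{n}{2} = 3$ in both cases.

```python
from fractions import Fraction
from math import comb

def expected_brothers_of_man(n, pmf):
    """E[M(M-1) + (n-M)(n-M-1)] / n, where pmf[k] = P(M = k) for k = 0..n."""
    return sum(p * (k * (k - 1) + (n - k) * (n - k - 1)) for k, p in enumerate(pmf)) / n

def variance(pmf):
    """Variance of a distribution on 0..len(pmf)-1 given as a list of probabilities."""
    mean = sum(p * k for k, p in enumerate(pmf))
    return sum(p * k * k for k, p in enumerate(pmf)) - mean**2

n = 6
binomial = [Fraction(comb(n, k), 2**n) for k in range(n + 1)]  # independent coin flips
uniform = [Fraction(1, n + 1)] * (n + 1)                       # M uniform on 0..n

print(variance(binomial), expected_brothers_of_man(n, binomial))  # 3/2 5/2
print(variance(uniform), expected_brothers_of_man(n, uniform))    # 4 10/3
```

In the coin-flip case the variance $\frac{3}{2}$ is below $\frac{n}{2} = 3$ and men get $\frac{5}{2} < 3$ brothers; in the uniform case the variance $4$ exceeds $3$ and men get $\frac{10}{3} > 3$.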

One quibble you might have with the above model is that you might not think it's reasonable for $M$ and $F$ to be independent. On the one hand, some families just like having lots of children, so you might expect $M$ and $F$ to be correlated. On the other hand, some families don't like having lots of children, so you might expect $M$ and $F$ to be anticorrelated. Without the independence assumption the computation for women acquires an extra term, namely $\frac{2 \text{Cov}(M, F)}{n}$ (as in Donkey_2009's answer), and now the answer also depends on how large this is relative to $\text{Var}(M)$.

Note that the argument in the OP that "no man can be his own brother" (basically, the $-1$ in $m(m-1)$) ought to imply, if it worked, that the difference between the expected numbers of brothers for women and for men is exactly $1$. This happens iff we are allowed to write $\mathbb{E}(M(M-1)) = \mathbb{E}(M) \mathbb{E}(M-1)$ iff $M$ is independent of itself iff it's constant iff $\text{Var}(M) = 0$.


Edit: Perhaps the biggest objection you might have to the model above is that a given person's gender is not independent of the gender of their siblings; that is, as Greg Martin points out in the comments below, requirement 4 in the OP is not satisfied. This is easiest to see in the extreme case that $n = 1$: in that case we're only distributing one male and one female child, and so any siblings you have must have opposite gender from you. In general the fact that the number of male and female children is fixed here means that your siblings are slightly more likely to be a different gender from you.

A more realistic model would be to both distribute the children randomly and to assign their genders randomly. Beyond that we should think more about how to model family sizes.


I think I will argue that Cut the Knot is correct.

The distribution of sizes of families is not specified. Let's do some examples. Suppose all families have size 1. Then every boy has no brothers and every girl has no brothers. (So we certainly cannot conclude girls have more brothers than boys independently of the distribution of family sizes.)

Next example. All families have size 2. But random genders for the kids. Then there are four types of families, all equally likely:

$$\begin{array}{cc} B & B \\ B & G \\ G & B \\ G & G \end{array}$$

I wrote B=boy, G=girl, in order of birth. Now, if we choose a boy at random, how many brothers does he have? There are four Bs in the list, two of them have 1 brother and two of them have no brothers. So a boy chosen at random has no brother with probability $1/2$ and has one brother with probability $1/2$. (Note: we chose a boy at random, not a family at random.) Now repeat, choosing a G at random. We get: A girl chosen at random has no brother with probability $1/2$ and has one brother with probability $1/2$. Again, it is false that a random girl has more brothers than a random boy.

If you like, do it again for families of size 3. A boy chosen at random has: no brother with probability $1/4$, one brother with probability $1/2$, and two brothers with probability $1/4$. Same for a girl chosen at random.

This works for any size families, as long as the sizes are fixed in advance, and the genders are random and independent.
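The tabulation is easy to automate for any fixed size. A minimal sketch (the function name is mine) that lists all $2^n$ equally likely families and counts brothers for every B and every G:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def brother_distribution(n):
    """Distribution of the number of brothers of a random boy and of a random girl
    when every family has exactly n children with independent fair-coin genders."""
    counts = {"B": Counter(), "G": Counter()}
    for family in product("BG", repeat=n):  # all 2**n birth orders, equally likely
        boys = family.count("B")
        for child in family:
            # A boy does not count himself; a girl counts every boy.
            counts[child][boys - 1 if child == "B" else boys] += 1
    result = {}
    for sex, counter in counts.items():
        total = sum(counter.values())
        result[sex] = {k: Fraction(v, total) for k, v in sorted(counter.items())}
    return result

dist = brother_distribution(3)
print(dist["B"] == dist["G"])  # True
print(dist["B"])  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```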


I've been accused of overcomplicating the issue, so here's a shorter and different answer. This mostly repeats things that have been said already, e.g. in zyx's answer. Consider any model where

  1. Children are male with probability $\frac{1}{2}$ and female with probability $\frac{1}{2}$,
  2. A given child's gender is independent of the gender of their siblings, and
  3. A given child's gender is also independent of the size of the family they're in.

With these assumptions, the expected number of brothers of any child is $\frac{F-1}{2}$ where $F$ is the expected size of a family (where we pick a random family by picking a random child). By linearity of expectation, together with the independence in assumptions 2 and 3, the expected number of brothers of any man, as well as any woman, is also $\frac{F-1}{2}$. A simple example of a model satisfying all of these assumptions is a model where children are both distributed to a family uniformly and independently and also assigned a gender independently. Another example is a model where the size of each family is fixed, and genders are chosen independently.

The model in my previous answer (dividing a fixed pool of children with fixed genders between two families) does not satisfy assumption 2.

Interestingly enough, it can happen that there are no families with only male or only female children, meaning that in every family the women have more brothers than the men, and nevertheless it's still true that the expected number of brothers is the same for women and men. The reason is that when we compute the expectation for men, families with more male children are weighted more heavily. As Julian Rosen says in a comment on the OP, this is an example of Simpson's paradox.
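Here is a tiny concrete population showing the weighting effect. The two families below are a hypothetical example of mine, not data: one family with one boy and three girls, one with three boys and one girl. In each family the girls have exactly one more brother than the boys, yet the population averages agree.

```python
from fractions import Fraction

# Hypothetical population: each string is one family, B = boy, G = girl.
families = ["BGGG", "BBBG"]

def average_brothers(families, sex):
    """Average number of brothers over all children of the given sex."""
    counts = [
        family.count("B") - (1 if sex == "B" else 0)  # a boy is not his own brother
        for family in families
        for child in family
        if child == sex
    ]
    return Fraction(sum(counts), len(counts))

print(average_brothers(families, "B"))  # 3/2
print(average_brothers(families, "G"))  # 3/2
```

The three boys in the second family each contribute 2, which outweighs the single boy with 0 in the first family; the girls' average is dominated by the three girls with 1 brother each.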