What is the distribution of the variable $X$ given $$ X = Y + Z, $$where $Y \sim $ Binomial($n$, $P_Y$) and $Z\sim$ Binomial($n$, $P_Z$)?


For the special case $P_Y = P_Z = P$, I think that $X \sim$ Binomial($2n$, $P$) is correct. If $P_Y \ne P_Z$, the distribution might simply be Binomial$\left(2n, \frac{P_Y + P_Z}{2}\right)$, but I can't prove it.

If the problem is more complicated than I expect and we can't derive the whole distribution, can we at least say something about the mean and the variance of $X$?


Solution 1:

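If $Y$ and $Z$ are independent, $X$ is a sum of $2n$ independent Bernoulli trials, $n$ with success probability $P_Y$ and $n$ with success probability $P_Z$, so its distribution is a special case of the Poisson binomial distribution.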
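To make the Poisson binomial view concrete, here is a minimal Python sketch that builds the probability mass function one Bernoulli trial at a time. It assumes $Y$ and $Z$ are independent; the function name `poisson_binomial_pmf` and the values of `n`, `p_y`, `p_z` are illustrative choices of mine, not part of the question.

```python
def poisson_binomial_pmf(probs):
    """Return [P(S=0), ..., P(S=len(probs))] for S = sum of independent Bernoulli(p_i)."""
    pmf = [1.0]  # distribution of the empty sum: P(S = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails, count stays at k
            nxt[k + 1] += mass * p      # this trial succeeds, count moves to k + 1
        pmf = nxt
    return pmf


# Illustrative (assumed) parameters: n trials with p_y, then n trials with p_z.
n, p_y, p_z = 3, 0.2, 0.7
pmf_x = poisson_binomial_pmf([p_y] * n + [p_z] * n)

for x, prob in enumerate(pmf_x):
    print(x, prob)
```

The loop costs $O(n^2)$ operations, which is fine for moderate $n$; the list `pmf_x` has $2n+1$ entries, one for each possible value of $X$.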

Solution 2:

See the binomial sum variance inequality. Here is an excerpt from the Wikipedia page.

In probability theory and statistics, the sum of independent binomial random variables is itself a binomial random variable if all the component variables share the same success probability. If success probabilities differ, the probability distribution of the sum is not binomial.
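As a short worked check of why the candidate Binomial$\left(2n, \frac{P_Y + P_Z}{2}\right)$ from the question cannot be right when $P_Y \ne P_Z$, compare variances. Here $B$ and $\bar P$ are notation I am introducing for that candidate and its success probability, and $Y$ and $Z$ are assumed independent:

$$\operatorname{Var}(X) = n P_Y (1 - P_Y) + n P_Z (1 - P_Z), \qquad \operatorname{Var}(B) = 2n \bar P (1 - \bar P), \quad \bar P = \frac{P_Y + P_Z}{2}.$$

Expanding both expressions gives

$$\operatorname{Var}(B) - \operatorname{Var}(X) = n\left(P_Y^2 + P_Z^2\right) - \frac{n}{2}\left(P_Y + P_Z\right)^2 = \frac{n}{2}\left(P_Y - P_Z\right)^2 \ge 0.$$

So $B$ has the same mean as $X$, namely $n(P_Y + P_Z)$, but a strictly larger variance unless $P_Y = P_Z$. The two distributions can therefore agree only in the equal-probability case.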

Solution 3:

Assuming $Y$ and $Z$ are independent, $X=Y+Z$ has mean $E[Y]+E[Z] = n P_Y + n P_Z$ and variance $\text{Var}(Y) + \text{Var}(Z) = n P_Y (1-P_Y) + n P_Z (1 - P_Z)$. The characteristic function is $$ \left( P_Y e^{it}+1-P_Y \right)^{n}\left( P_Z e^{it}+1-P_Z \right)^{n}.$$ But unless $P_Y = P_Z$, there is no special name for the distribution of $X$.
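The mean, variance, and characteristic function above can be checked numerically. The sketch below recovers the probability mass function from the characteristic function with a discrete Fourier inversion on $2n+1$ points (valid because $X$ takes values in $0, \dots, 2n$) and compares the moments with the formulas. The function names and parameter values are my own illustrative assumptions.

```python
import cmath
import math


def char_fn(t, n, p_y, p_z):
    """Characteristic function of X = Y + Z for independent binomials."""
    e = cmath.exp(1j * t)
    return (p_y * e + 1 - p_y) ** n * (p_z * e + 1 - p_z) ** n


def pmf_from_char_fn(n, p_y, p_z):
    """Invert the characteristic function on the grid t_k = 2*pi*k/(2n+1)."""
    size = 2 * n + 1  # X takes the values 0, 1, ..., 2n
    pmf = []
    for x in range(size):
        total = sum(
            char_fn(2 * math.pi * k / size, n, p_y, p_z)
            * cmath.exp(-2j * math.pi * k * x / size)
            for k in range(size)
        )
        pmf.append((total / size).real)  # imaginary part is rounding noise
    return pmf


# Illustrative (assumed) parameters.
n, p_y, p_z = 4, 0.3, 0.6
pmf = pmf_from_char_fn(n, p_y, p_z)

mean = sum(x * q for x, q in enumerate(pmf))
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

print("mean:", mean, "formula:", n * p_y + n * p_z)
print("variance:", var, "formula:", n * p_y * (1 - p_y) + n * p_z * (1 - p_z))
```

The two printed pairs should agree up to floating-point error.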

EDIT: Maple does come up with a closed form for the probability mass function involving the associated Legendre function of the first kind:

$$\mathbb P(X=x) = \cases{ \dfrac{n!}{x!} P_n^{x-n}\left(\dfrac{2 P_Y P_Z - P_Y - P_Z}{P_Y - P_Z}\right) (P_Z - P_Y)^n \left(\dfrac{(1-P_Z)(1-P_Y)}{P_Z P_Y}\right)^{(n-x)/2} & if $0 \le x \le n$\cr \dfrac{n!}{(2n-x)!} P_n^{n-x}\left(\dfrac{2 P_Y P_Z - P_Y - P_Z}{P_Y - P_Z}\right) (P_Z - P_Y)^n \left(\dfrac{(1-P_Z)(1-P_Y)}{P_Z P_Y}\right)^{(n-x)/2} & if $n \le x \le 2n$}$$

EDIT: In response to Shakil's request, here is the Maple code:

> sum(binomial(n,k)*P[Z]^k*(1-P[Z])^(n-k)*
    binomial(n,x-k)*P[Y]^(x-k)*(1-P[Y])^(n-(x-k)),k=0..x) assuming x>=0,x<=n;
> simplify(%);
> sum(binomial(n,k)*P[Z]^k*(1-P[Z])^(n-k)*
    binomial(n,x-k)*P[Y]^(x-k)*(1-P[Y])^(n-(x-k)),k=x-n..n) assuming x>=n,x<=2*n;
> simplify(%);
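
For readers without Maple, the same convolution sum can be evaluated numerically. This is a rough Python sketch, not a translation of Maple's symbolic simplification: it assumes $Y$ and $Z$ are independent, merges the two summation ranges into one, and uses function names (`binom_pmf`, `sum_pmf`) that I made up for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sum_pmf(x, n, p_y, p_z):
    """P(Y + Z = x): sum over k = value of Z, with x - k = value of Y."""
    lo, hi = max(0, x - n), min(n, x)  # keeps both k and x - k inside 0..n
    return sum(
        binom_pmf(k, n, p_z) * binom_pmf(x - k, n, p_y)
        for k in range(lo, hi + 1)
    )


# Equal probabilities: the sum matches Binomial(2n, P) term by term.
n, p = 5, 0.4
print(all(abs(sum_pmf(x, n, p, p) - binom_pmf(x, 2 * n, p)) < 1e-12
          for x in range(2 * n + 1)))

# Unequal probabilities: compare with Binomial(2n, average) at x = 0.
print(sum_pmf(0, 1, 0.2, 0.8), binom_pmf(0, 2, 0.5))
```

In the last line the convolution gives about $0.16$ while Binomial$(2, 0.5)$ gives $0.25$, so averaging the two success probabilities does not reproduce the distribution of $X$ even for $n = 1$.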