Difference between two independent binomial random variables with equal success probability
Let $X \sim Bin(n,p)$ and $Y \sim Bin(m,p)$ be two independent random variables. Find the distribution of $Z=X-Y$.
See also: Difference of two binomial random variables.
I figured this out:
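$$ P(Z=z)=\begin{cases}\sum_{i=0}^{\min(m,n)} Bin(z+i,n,p)\, Bin(i,m,p), & \text{if } z\ge 0;\\ \sum_{i=0}^{\min(m,n)} Bin(i,n,p)\, Bin(i-z,m,p), & \text{otherwise,}\end{cases}$$
where $Bin(k,n,p)$ denotes the probability that a $Bin(n,p)$ variable equals $k$ (taken to be zero for $k<0$ or $k>n$).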
I also validated it by Monte Carlo simulation. For $n=30$, $m=20$ and $p=0.5$, I get the following distribution, where the circles are the analytical probabilities and the line connects the MC estimates.
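For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python using only the standard library. It is not the code behind the simulation described above; the helper names `binom_pmf`, `diff_pmf` and `simulate_diff`, the seed and the number of trials are all assumptions made for illustration. A single sum over the value $i$ of $Y$ covers both the $z\ge0$ and $z<0$ cases, because terms outside the support are zero.

```python
import random
from math import comb


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Bin(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def diff_pmf(z, n, m, p):
    """P(X - Y = z) for independent X ~ Bin(n, p) and Y ~ Bin(m, p)."""
    # Condition on Y = i, which forces X = z + i.
    return sum(binom_pmf(z + i, n, p) * binom_pmf(i, m, p) for i in range(m + 1))


def simulate_diff(n, m, p, trials, seed=0):
    """Monte Carlo estimate of the pmf of X - Y, returned as {z: frequency}."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        x = sum(rng.random() < p for _ in range(n))  # one draw of X
        y = sum(rng.random() < p for _ in range(m))  # one draw of Y
        counts[x - y] = counts.get(x - y, 0) + 1
    return {z: c / trials for z, c in counts.items()}


n, m, p = 30, 20, 0.5
exact = {z: diff_pmf(z, n, m, p) for z in range(-m, n + 1)}
estimate = simulate_diff(n, m, p, trials=20_000)
max_gap = max(abs(exact[z] - estimate.get(z, 0.0)) for z in exact)
print(f"total probability: {sum(exact.values()):.6f}")
print(f"largest |exact - simulated|: {max_gap:.4f}")
```

The exact probabilities should sum to one, and the largest gap to the simulated frequencies should shrink as `trials` grows.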
Because that looked pretty much like a binomial distribution to me, I gave it a try and found that it is actually a binomial, just shifted to the left by $m$. This can be written simply as $P(Z=z) = Bin(z+m, m+n, p)$. Hence, given equal success probabilities, not only the sum of two independent binomially distributed random variables is binomial, but also their difference, just shifted to the left.
The question "Difference between independent binomial variables" is actually the same as mine, but it received no answer, only a comment saying that there is no simple formula. The formula above looks pretty simple to me, though.
- Is it correct that, for the case of equal success probabilities, the above equations actually describe the distribution of $Z=X-Y$?
- I read in a book that $Z$ cannot be binomially distributed because its support includes negative values. Is it right to call it a shifted binomial?
Solution 1:
Your $Z=X-Y$ will not be a "shifted binomial" unless $p=\frac12$, or the trivial cases where at least one of $n$ and $m$ is zero. For the case $p=\frac12$, $m-Y$ has the same distribution as $Y$ so $X+Y$ and $X-Y+m$ have the same distribution, which is indeed binomial.
In general consider the means and variances of the distributions:
- $X$ has mean $np$ and variance $np(1-p)$
- $Y$ has mean $mp$ and variance $mp(1-p)$
- $X+Y$ has mean $(n+m)p$ and variance $(n+m)p(1-p)$
- $Z=X-Y$ has mean $(n-m)p$ and variance $(n+m)p(1-p)$
- $Z+m=X-Y+m$ has mean $np+m(1-p)$ and variance $(n+m)p(1-p)$
Now suppose $Z+m$ were binomial, supported on the integers from $0$ through $n+m$, with success parameter $q$. Its mean would then be $(n+m)q$ and its variance $(n+m)q(1-q)$.
- To have $(n+m)q(1-q)=(n+m)p(1-p)$ requires $q=p$ or $q=1-p$ or $n+m=0$ (i.e. $n=m=0$).
- To have $q=p$ or $q=1-p$ and the two expressions for the mean equal requires $(n+m)p=np+m(1-p)$ or $(n+m)(1-p)=np+m(1-p)$, i.e. $mp=m(1-p)$ or $n(1-p)=np$, which would require $p=\frac12$ or $m=0$ or $n=0$.
So the only case where $n\gt0$ and $m\gt0$ where the means and variances match a binomial distribution is when $p=\frac12$.
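This conclusion can also be checked numerically. The following is a rough sketch, assuming only the Python standard library; the helper names and the choice $n=30$, $m=20$ are illustrative and not part of the argument above. It computes the exact distribution of $Z+m = X+(m-Y)$, using the fact that $m-Y \sim Bin(m,1-p)$, and measures its largest pointwise gap to a $Bin(n+m,q)$ distribution.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Bin(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def shifted_diff_pmf(k, n, m, p):
    """P(X - Y + m = k), computed as P(X + W = k) with W = m - Y ~ Bin(m, 1 - p)."""
    return sum(binom_pmf(k - j, n, p) * binom_pmf(j, m, 1 - p) for j in range(m + 1))


def max_gap_to_binomial(n, m, p, q):
    """Largest pointwise gap between the pmf of Z + m and that of Bin(n + m, q)."""
    return max(
        abs(shifted_diff_pmf(k, n, m, p) - binom_pmf(k, n + m, q))
        for k in range(n + m + 1)
    )


n, m = 30, 20
gap_half = max_gap_to_binomial(n, m, 0.5, 0.5)  # p = 1/2: expected to match
gap_p = max_gap_to_binomial(n, m, 0.3, 0.3)     # p = 0.3 against q = p
gap_1mp = max_gap_to_binomial(n, m, 0.3, 0.7)   # p = 0.3 against q = 1 - p
print(gap_half, gap_p, gap_1mp)
```

For $p=\frac12$ the gap should be at the level of floating-point rounding, while for $p=0.3$ neither candidate $q$ gives a match.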
Solution 2:
If $n$ and $m$ are sufficiently large, you can use the normal approximation, where $q=1-p$:
$X \approx N[np,npq]$
$Y \approx N[mp,mpq]$
$X-Y \approx N[np-mp,npq+mpq]$
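To get a feel for how good this approximation is, one could compare the normal density with the exact probabilities of $X-Y$. The sketch below is only an illustration using the Python standard library: the helper names and the values $n=300$, $m=200$, $p=0.3$ are arbitrary assumptions. It takes the mean of $X-Y$ to be $(n-m)p$ and its variance to be $(n+m)p(1-p)$.

```python
from math import comb, exp, pi, sqrt


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Bin(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def diff_pmf(z, n, m, p):
    """Exact P(X - Y = z) for independent X ~ Bin(n, p) and Y ~ Bin(m, p)."""
    return sum(binom_pmf(z + i, n, p) * binom_pmf(i, m, p) for i in range(m + 1))


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


n, m, p = 300, 200, 0.3
mean = (n - m) * p           # E[X - Y]
var = (n + m) * p * (1 - p)  # Var[X - Y]

# Compare the exact pmf with the normal density at every integer in the support.
max_gap = max(
    abs(diff_pmf(z, n, m, p) - normal_pdf(z, mean, var))
    for z in range(-m, n + 1)
)
print(f"mean={mean:.1f}, variance={var:.1f}, largest gap={max_gap:.2e}")
```

Because $X-Y$ takes integer values one unit apart, the density at $z$ can be compared directly with $P(X-Y=z)$; for smaller $n$ and $m$ the gap would be expected to grow.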