How to understand the mean and variance of the hypergeometric distribution intuitively?
Hypergeometric distribution: We have $N$ balls, of which $n$ are red and $N-n$ are blue. We draw $k$ balls without replacement $(0\leq k\leq N)$; the number of red balls among the $k$ drawn then follows the hypergeometric distribution.
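Let $X_i$ indicate the $i$-th draw: $X_i = 1$ if the $i$-th ball drawn is red, and $X_i = 0$ otherwise. By induction, we can prove that $P(X_i = 1) = \dfrac{n}{N}.$ Therefore, writing $X = \sum_{i=1}^{k} X_i$ for the number of red balls drawn, linearity of expectation gives $$E[X] = E\left[\sum_{i=1}^{k} X_i\right] = k\dfrac{n}{N}.$$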
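As a sanity check (not part of the intuitive argument being asked for), here is a minimal Python sketch that computes the mean directly from the pmf $P(X=x)=\binom{n}{x}\binom{N-n}{k-x}/\binom{N}{k}$ using exact fractions, then compares it with $kn/N$. The helper names `hypergeom_pmf` and `hypergeom_mean` are illustrative choices, not from any library.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(N: int, n: int, k: int, x: int) -> Fraction:
    """P(X = x) when drawing k of N balls (n of them red) without replacement."""
    return Fraction(comb(n, x) * comb(N - n, k - x), comb(N, k))


def hypergeom_mean(N: int, n: int, k: int) -> Fraction:
    """Exact mean summed over the support, used only to check k * n / N."""
    return sum(x * hypergeom_pmf(N, n, k, x) for x in range(0, min(n, k) + 1))


N, n, k = 10, 4, 3
print(hypergeom_mean(N, n, k), Fraction(k * n, N))  # prints: 6/5 6/5
```

Exact `Fraction` arithmetic is used so the comparison is an equality rather than a floating-point approximation.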
My questions are:
- Is there an intuitive way to see that $P(X_i = 1) = \dfrac{n}{N}$ without induction (something like the argument for the binomial distribution)?
- How do I compute $E[X_iX_j]$, $i\neq j$, for the variance?
I'd like to avoid using the formula for the number of combinations.
Solution 1:
Symmetry suggests that $p:=E[X_i]=P[X_i=1]$ doesn't depend on $i$, and clearly $P[X_1=1] = n/N$. (If I draw the balls from the urn, obtaining $X_1, X_2,\ldots, X_k$, and then randomly reorder the results as $Y_1,Y_2,\ldots,Y_k$ before revealing them to you, the joint distribution is unchanged.)
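The symmetry claim can be checked by brute force on a small urn. In the sketch below, `position_probs` is a hypothetical helper written only for this check. It labels the balls so that every ordering of all $N$ balls is equally likely, treats the first $k$ positions as the draws, and counts how often each position holds a red ball.

```python
from fractions import Fraction
from itertools import permutations


def position_probs(N: int, n: int, k: int) -> list[Fraction]:
    """P(X_i = 1) for i = 1..k, by enumerating every equally likely ordering.

    Balls 0..n-1 are red and n..N-1 are blue; the first k positions of a
    uniformly random ordering are exactly k draws without replacement.
    """
    counts = [0] * k
    total = 0
    for order in permutations(range(N)):
        total += 1
        for i in range(k):
            if order[i] < n:  # the ball at draw i is red
                counts[i] += 1
    return [Fraction(c, total) for c in counts]


print(position_probs(6, 2, 4))  # every entry is Fraction(1, 3), i.e. n/N = 2/6
```

The enumeration is exponential in $N$, so it is only meant for tiny urns; the point is that no position is favoured.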
Likewise, by the same symmetry, if $i\not=j$ then $c:=E[X_iX_j] =P[X_i=X_j=1]$ is the same for all distinct $i,j$. Therefore $c=P[X_1=X_2=1] ={n\over N}\cdot{n-1\over N-1}$.
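Plugging $p$ and $c$ into the variance of a sum answers the second question. Here is a sketch of the algebra, with $X=X_1+\cdots+X_k$ and using the fact that there are $k(k-1)$ ordered pairs $(i,j)$ with $i\neq j$, each with $\operatorname{Cov}(X_i,X_j)=c-p^2$:

$$
\begin{aligned}
\operatorname{Var}(X) &= \sum_{i=1}^{k}\operatorname{Var}(X_i) + \sum_{i\neq j}\operatorname{Cov}(X_i,X_j) = k\,p(1-p) + k(k-1)\,(c-p^2),\\
c-p^2 &= \frac{n}{N}\left(\frac{n-1}{N-1}-\frac{n}{N}\right) = -\frac{n}{N}\cdot\frac{N-n}{N(N-1)} = -\frac{p(1-p)}{N-1},\\
\operatorname{Var}(X) &= k\,p(1-p)\left(1-\frac{k-1}{N-1}\right) = k\,\frac{n}{N}\left(1-\frac{n}{N}\right)\frac{N-k}{N-1}.
\end{aligned}
$$

Compared with the binomial variance $kp(1-p)$, the extra factor $\frac{N-k}{N-1}$ is the finite-population correction for sampling without replacement.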
Solution 2:
- Each $X_i$ is identically (but not independently) distributed as Bernoulli with success probability $n/N$, the fraction of all balls that are red.
- For $i\neq j$, $X_iX_j$ is also Bernoulli, with success probability $P(X_i=1,X_j=1)=P(X_i=1)\,P(X_j=1\mid X_i=1)=\frac{n}{N}\cdot\frac{n-1}{N-1}$ (because the draws are made without replacement).
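The following minimal Python check assumes small integer parameters with $N\geq 2$, so that $N-1\neq 0$. It evaluates $k\,p(1-p)+k(k-1)(c-p^2)$ from the two probabilities above and compares the result with the variance computed directly from the pmf. Both function names are hypothetical helpers for this sketch.

```python
from fractions import Fraction
from math import comb


def variance_from_indicators(N: int, n: int, k: int) -> Fraction:
    """Var(X) = k p (1 - p) + k (k - 1) (c - p^2) with p = n/N, c = p (n-1)/(N-1)."""
    p = Fraction(n, N)
    c = Fraction(n, N) * Fraction(n - 1, N - 1)
    return k * p * (1 - p) + k * (k - 1) * (c - p * p)


def variance_from_pmf(N: int, n: int, k: int) -> Fraction:
    """Exact variance from the hypergeometric pmf, for comparison only."""
    total = comb(N, k)
    pmf = {x: Fraction(comb(n, x) * comb(N - n, k - x), total)
           for x in range(0, min(n, k) + 1)}
    mean = sum(x * q for x, q in pmf.items())
    return sum((x - mean) ** 2 * q for x, q in pmf.items())


N, n, k = 10, 4, 3
print(variance_from_indicators(N, n, k), variance_from_pmf(N, n, k))  # prints: 14/25 14/25
```

The indicator route never touches binomial coefficients; the pmf route is there only to confirm the two agree.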