Does the sum of random variables sampled with/without substitution differ for large populations?

Solution 1:

Consider a random binary sequence $s_n$ of size $2n$ s.t. $\sum_{i=1}^{2n}s_{n,i}=n$. Then the number of runs of length $2$ ($11$ or $00$) is $$ S_{2n}=\sum_{i=1}^{2n-1}\xi_{n,i}+\zeta_{n,i}, $$ where $\xi_{n,i}=s_{n,i}s_{n,i+1}$ and $\zeta_{n,i}=(1-s_{n,i})(1-s_{n,i+1})$. Now $$ \mathsf{E}\xi_{n,i}=\mathsf{P}(\xi_{n,i}=1)=\frac{(2n-2)!}{(n-2)!n!}\left(\frac{(2n)!}{n!n!}\right)^{-1}=\frac{1}{2}\frac{n-1}{2n-1}, $$ $$ \mathsf{E}\xi_{n,i}^2=\mathsf{P}(\xi_{n,i}=1), $$ \begin{align} \mathsf{E}\xi_{n,i}\xi_{n,j}&=\mathsf{P}(\xi_{n,i}=1,\xi_{n,j}=1)=\frac{(2n-4)!}{(n-4)!n!}\left(\frac{(2n)!}{n!n!}\right)^{-1} \\ &=\frac{1}{4}\frac{(n-2)(n-3)}{(2n-1)(2n-3)}, \end{align} if $|i-j|>1$, and $$ \mathsf{E}\xi_{n,i}\xi_{n,j}=\frac{1}{4}\frac{n-2}{2n-1}, $$ if $|i-j|=1$. Finally, $\mathsf{E}\xi_{n,i}\zeta_{n,j}=0$, if $|i-j|\le 1$, and $$ \mathsf{E}\xi_{n,i}\zeta_{n,j}=\frac{1}{4}\frac{n(n-1)}{(2n-1)(2n-3)}, $$ if $|i-j|>1$.


We can compute the moments of $S_{2n}$: $$ \mathsf{E}S_{2n}=n-1. $$ \begin{align} \mathsf{E}S_n^2&=\sum_{i=1}^{2n-1}\mathsf{E}[\xi_{n,i}^2+\zeta_{n,i}^2]+\sum_{|i-j|=1}\mathsf{E}[\xi_{n,i}\xi_{n,j}+\zeta_{n,i}\zeta_{n,j}] \\ &\quad+\sum_{|i-j|>1}\mathsf{E}[(\xi_{n,i}+\zeta_{n,i})(\xi_{n,j}+\zeta_{n,j})] \\ &=(n-1)+\frac{2(n-1)(n-2)}{2n-1}+\frac{2(n^2-3n+3)(n-1)}{2n-1} \\ &=\frac{(n-1)(2n^2-2n+1)}{2n-1}, \end{align} and, thus, $$ \operatorname{Var}(S_{2n})=\frac{n(n-1)}{2n-1}. $$ As $n\to \infty$, $\mathsf{E}S_{2n}/(2n)\to 1/2$ and $\operatorname{Var}(S_{2n}/\sqrt{2n})\to 1/4$. Also, since $\operatorname{Cov}(\xi_{n,i},\xi_{n,i+1})\to 1/16$, and $\operatorname{Cov}(\xi_{n,i},\xi_{n,j})$ and $\operatorname{Cov}(\xi_{n,i},\zeta_{n,j})$ converge to $0$ when $|i-j|>1$, $(S_{2n}-\mathsf{E}S_{2n})/\sqrt{2n}$ behaves similarly to the same quantity when $s_{n,i}$'s are i.i.d. Bernoulli random variables.