Finding expected number of distinct values selected from a set of integers

Solution 1:

We first answer the exact question asked, where one draws three items, then we present a more powerful approach which solves the more general case where one draws any number of items.


The probability that the first item was not chosen before is $1$. The probability that the second item was not chosen before is $\frac{n-1}n$. The probability that the third item was not chosen before is $\frac{n-2}n$ if the two first items are different and $\frac{n-1}n$ if the two first items coincide. The two first items are different with probability $\frac{n-1}n$ and they coincide with probability $\frac{1}n$. Hence the expected number of distinct items is $$ 1+\frac{n-1}n+\frac{n-2}n\times\frac{n-1}n+\frac{n-1}n\times\frac1n=\frac{3n^2-3n+1}{n^2} $$


More generally, consider the number $N_k$ of different items chosen after $k$ picks. Then $N_0=0$ almost surely and, knowing $N_k$ the probability to pick a new item at time $k+1$ is $(n-N_k)/n$ hence $$\mathrm E(N_{k+1}\mid N_k)=N_k+\frac{n-N_k}n\qquad\text{almost surely}$$ This shows that the expected number of different items after $k$ picks $\mathrm E(N_k)$ is such that $\mathrm E(N_0)=0$ and $$\mathrm E(N_{k+1})=\mathrm E(N_k)+\frac{n-\mathrm E(N_k)}n=1+a_n\mathrm E(N_k)$$ for every $k\geqslant0$, with $$a_n=1-\frac1n$$ Thus, for every $k\geqslant0$, $$\mathrm E(N_k)=\frac{1-a_n^k}{1-a_n}$$ or, equivalently, $$ \mathrm E(N_k)=n\,\frac{n^k-(n-1)^k}{n^{k}}=\sum\limits_{i=0}^{k-1}(-1)^{i}{k\choose i+1}\frac1{n^i} $$ For example, the fifth row of the Pascal triangle reads $$1\quad5\quad10\quad10\quad5\quad1$$ hence $$E(N_5)=\frac{5n^4-10n^3+10n^2-5n+1}{n^5}$$

Solution 2:

@Did's method uses recursion very nicely to arrive at the expected number. I arrived at the same answer using more mundane computations.

Let's generalize by picking $p$ numbers from $1\dots n$ with replacement. Let us compute the probability of choosing $d$ distinct numbers. Choose one of the $\binom{n}{d}$ sets of $d$ distinct numbers. The probability of selecting all $p$ picks from those $d$ distinct numbers is $\left(\frac{d}{n}\right)^p$. However, this also counts cases where some of the $d$ numbers were not chosen. Inclusion-exclusion says that the probability of picking all of those $d$ is $$ \sum_k(-1)^{k}\binom{d}{d-k}\left(\frac{d-k}{n}\right)^p=\sum_k(-1)^{d-k}\binom{d}{k}\left(\frac{k}{n}\right)^p\tag{1} $$ Thus, the probability of picking exactly $d$ distinct items is $\binom{n}{d}$ times $(1)$. The expected value is therefore $$ \begin{align} &\sum_d\sum_k(-1)^{d-k}d\binom{n}{d}\binom{d}{k}\left(\frac{k}{n}\right)^p\\ &=\sum_d\sum_k(-1)^{d-k}d\binom{n}{k}\binom{n-k}{n-d}\left(\frac{k}{n}\right)^p\\ &=n-\sum_d\sum_k(-1)^{d-k}(n-d)\binom{n}{k}\binom{n-k}{n-d}\left(\frac{k}{n}\right)^p\\ &=n-\sum_d\sum_k(-1)^{d-k}(n-k)\binom{n}{k}\binom{n-k-1}{n-d-1}\left(\frac{k}{n}\right)^p\\ &=n-\sum_k(n-k)\binom{n}{k}\delta(n-k-1)\left(\frac{k}{n}\right)^p\\ &=n-n\left(\frac{n-1}{n}\right)^p\\ &=n\left(1-\left(\frac{n-1}{n}\right)^p\right)\tag{2} \end{align} $$

Solution 3:

I would like to give a simple reasoning for the same.

Suppose we have a SRS of size p selected from a population of size n with replacement. To find the expected no. of unique elements in the sample, we make use of the linearity of expectation.

Let X: no. of unique elements in p picks Let Xi: Indicator that i is a unique element

Then, X = ∑Xi ,

Where Xi=1, if i is selected in p picks & Xi=0, otherwise

Then,E(X) = ∑E(Xi) = ∑P(i is selected in p picks) = ∑(1-P(i is not selected in p picks)) = ∑(1-((n-1)/n)^p)= n(1−((n-1)/n)^p)

Note: Here, all the picks are independent (SRSWR)