Why, when generating $N$ random numbers between $0$ and $N-1$, the number of unique numbers is always around $0.63N$?
Someone told me that the answer is $1-1/e$ but didn't tell how exactly he got that answer. Can someone explain me why it is $1-1/e$? Code below is just in case if someone wants to check:
ArrayList<Integer> integers = new ArrayList<>();
for (int i = 0; i < 20000000; i++) {
integers.add((int) (Math.random()*20000000));
}
HashSet<Integer> unique = new HashSet<>(integers);
For any particular integer between $0$ and $N-1$, in every turn, the odds that it is not picked randomly is $\frac{N-1}{N}=1-\frac{1}{N}$. If you repeat it $N$ times, the odds that it is never picked is $\left(1-\frac{1}{N}\right)^N$. It is well-known that:
$$\lim_{N\to\infty}\left(1-\frac{1}{N}\right)^N=\frac{1}{e}$$
(For example, consider $\left(1-\frac{1}{N}\right)^N=e^{N\log\left(1-\frac{1}{N}\right)}=e^{-\frac{\log\left(1-\frac{1}{N}\right)}{-\frac{1}{N}}}\to e^{-1}=\frac{1}{e}$ as $N\to\infty$ because $\lim_{x\to 0}\frac{\log(1+x)}{x}=1$ and $-\frac{1}{N}\to 0$ when $N\to\infty$.)
Thus, the probability that the number is never picked is close to $\frac{1}{e}$, and so the probability that the number is picked is close to $1-\frac{1}{e}$.
Suppose that you want to generate a list of $n$, e.g. $n=$250, random numbers with replacement. So something like (230,19,12, 55,0, 55, 134,...). The probability that any given number is chosen for each entry in the list is $1/n$. So the probability a number gets excluded $n$ times is $(1-1/n)^n$. As $n$ tends to infinity, the probability that a large list will exclude any given number is $1/e$. Conversely, the probability that it will include any given number is $1-1/e$. The expected number of included numbers is thus $n(1-1/e)$.