Expected time to roll all 1 through 6 on a die

What is the average number of times it would it take to roll a fair 6-sided die and get all numbers on the die? The order in which the numbers appear does not matter.

I had this questions explained to me by a professor (not math professor), but it was not clear in the explanation. We were given the answer $(1-(\frac56)^n)^6 = .5$ or $n = 12.152$

Can someone please explain this to me, possibly with a link to a general topic?


The time until the first result appears is $1$. After that, the random time until a second (different) result appears is geometrically distributed with parameter of success $5/6$, hence with mean $6/5$ (recall that the mean of a geometrically distributed random variable is the inverse of its parameter). After that, the random time until a third (different) result appears is geometrically distributed with parameter of success $4/6$, hence with mean $6/4$. And so on, until the random time of appearance of the last and sixth result, which is geometrically distributed with parameter of success $1/6$, hence with mean $6/1$. This shows that the mean total time to get all six results is $$\sum_{k=1}^6\frac6k=\frac{147}{10}=14.7.$$


Edit: This is called the coupon collector problem. For a fair $n$-sided die, the expected number of attempts needed to get all $n$ values is $$n\sum_{k=1}^n\frac1k,$$ which, for large $n$, is approximately $n\log n$. This stands in contrast with the mean time needed to complete only some proportion $cn$ of the whole collection, for some fixed $c$ in $(0,1)$, which, for large $n$, is approximately $-\log(1-c)n$. One sees that most of the $n\log n$ steps needed to complete the full collection are actually spent completing the last one per cent, or the last one per thousand, or the last whatever percentage of the collection.


Here's the logic:

The chance of rolling a number you haven't yet rolled when you start off is $1$, as any number would work. Once you've rolled this number, your chance of rolling a number you haven't yet rolled is $5/6$. Continuing in this manner, after you've rolled $n$ different numbers the chance of rolling one you haven't yet rolled is $(6-n)/6$.

You can figure out the mean time it takes for a result of probability $p$ to appear with a simple formula: $1/p$. Furthermore, the mean time it takes for multiple results to appear is the sum of the mean times for each individual result to occur.

This allows us to calculate the mean time required to roll every number: $t = 1/1 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1 = 1 + 12/10 + 15/10 + 2 + 3 + 6 = 12 + 27/10 = 14.7$


Associate a success with each number appearing that has not appeared before. Let $X_i$ be the number of trials between the $i^{th}$ success and the $(i + 1)^{st}$ success.

Let $X$ be the random variable representing the total number of trials required for the required event, and $E[X]$ be the required expected value.

Then by linearity of Expectation, we have $E[X] = 1 + \sum_{i=1}^{5}E[X_i]$.

To calculate $E[X_i]$, consider the following, after receiving $i−1$ different numbers i.e after $i −1$ successes, each subsequent trial has probability $(6 − i)/6$ of getting a number that has not been appeared before.

Therefore,the random variable $X_i$ is geometric with parameter $p_i = (6−i)/6$, therefore $E[X_i] = 1/p_i = 6/(6-i)$.

It follows that $E[X] = 1 + 6\sum_{i=1}^{5}1/i$.

Hence $E[X]=14.7$.