I've clicked XKCD's "random" button $k$ times, and every comic it showed me was one I'd already seen. What's the expected number of XKCD comics I've seen?

This seems like a modification of the coupon collector's problem which can be stated as follows:

There are $n$ coupons in total to collect. Given that the past $k$ coupons drawn (with replacement) were all ones I had already collected, what's the expected number of coupons I've collected so far?

I'm also unsure whether this problem depends on an underlying prior probability distribution on the number of coupons I've collected; if it does, can it be solved for an arbitrary distribution?

One additional part to this question: suppose that after the $k$th draw I collect a new coupon; what would the expected number of coupons I've collected be then?


If you've seen $n$ comics out of $N$, the probability that $k$ consecutive random comics are all ones you've seen is $\left(\frac nN\right)^k$, which is proportional to $n^k$ (the $N^k$ denominator is constant). So if the prior odds are uniform, the posterior odds that you've seen $1, 2, \dots, N$ comics are $$ 1^k : 2^k : 3^k : \cdots : N^k. $$

So the conditional probability that you've seen $n$ comics out of $N$ is $\frac{n^k}{1^k + 2^k + \cdots + N^k}$, and we get an expected value of $$ \frac{1 \cdot 1^k + 2 \cdot 2^k + \cdots + N \cdot N^k}{1^k + 2^k + \cdots + N^k}. $$

This doesn't simplify particularly well, but when $N$ is large compared to $k$, we may approximate the sums by integrals: the numerator is approximately $\int_0^N x^{k+1}\,dx = \frac{N^{k+2}}{k+2}$ and the denominator is approximately $\int_0^N x^k \,dx = \frac{N^{k+1}}{k+1}$.

So the expected total number of comics you've read, given that you've sampled $k$ comics and you've read all of them, is approximately $\frac{k+1}{k+2} N$.
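As a quick numerical sanity check, here is a short Python sketch (function and variable names are my own; the archive size and $k$ are hypothetical) that evaluates the exact sum ratio under the uniform prior and compares it with the $\frac{k+1}{k+2}N$ approximation:

```python
def expected_seen(N, k):
    """Exact posterior mean under a uniform prior:
    sum of n^(k+1) over n = 1..N, divided by the sum of n^k."""
    num = sum(n ** (k + 1) for n in range(1, N + 1))
    den = sum(n ** k for n in range(1, N + 1))
    return num / den

# Hypothetical numbers: ~3000 comics in the archive, 5 repeats in a row.
N, k = 3000, 5
exact = expected_seen(N, k)       # exact value of the sum ratio
approx = (k + 1) / (k + 2) * N    # integral approximation
```

For $N$ this large relative to $k$, the exact ratio and the integral approximation agree to within a fraction of a comic.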


If finding $J$ (the number of draws needed to collect $k$ distinct coupons) is the goal, then this is just the Coupon Collector's Problem in disguise, where $k$ (the number of distinct coupons you collect) plays the role of $n$ (the total number of distinct coupons). Of course, the prior distribution matters for each coupon type $i$:

$$ p_{i} = \Bbb P(n,i,k) = \Bbb P(i,n), $$

and the simplest case, the uniform distribution, still gives:

$$ p_{i} = \frac{n-(i-1)}{n}.$$ Then follow the steps in the Wikipedia article on the Coupon Collector's Problem, but truncate the sum after $k$ terms:

$$ \Bbb E(J|K) = n\left(\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{n-k+1}\right)$$
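To make this concrete, here is a small Python sketch (my own, assuming the uniform case $p_i$ as above) comparing the partial harmonic sum $n\sum_{i=1}^{k}\frac{1}{n-i+1}$ with a Monte Carlo estimate of the average number of draws needed to see $k$ distinct coupons:

```python
import random

def expected_draws(n, k):
    """Expected draws to collect k distinct of n coupons (uniform case):
    n * (1/n + 1/(n-1) + ... + 1/(n-k+1))."""
    return n * sum(1 / (n - i + 1) for i in range(1, k + 1))

def simulated_draws(n, k, trials=20000):
    """Monte Carlo estimate of the same quantity."""
    random.seed(0)
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < k:       # draw until k distinct coupons seen
            seen.add(random.randrange(n))
            draws += 1
        total += draws
    return total / trials
```

With $k = n$ the sum reduces to the classical $n H_n$ of the full Coupon Collector's Problem.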

But if you mean that the number of draws $J$ is fixed and the question is "what is the expected value of $K$, the number of distinct coupons collected?", then:

$$ \Bbb E(K|J) = \sum_{k=1}^{\min(J,n)}\sum_{i=1}^{n} k\, \Bbb P(i,k,J).$$

It's up to you to find and supply $\Bbb P(i,k,J)$, which depends on the prior distribution.

We can also use linearity of expectation with indicator variables $I_{i}$, where $I_{i}$ indicates that coupon $i$ appears among the fixed number of draws $J$:

$$ \Bbb E(K|J) = \Bbb E\left(\sum_{i=1}^{n} I_{i} \,\middle|\, J\right) = n \Bbb P_{iJ},$$ where $$ \Bbb P_{iJ} = \sum_{m=1}^{J} {J \choose m}p_{i}^{m} (1-p_{i})^{J-m} = 1 - (1-p_{i})^J,$$ so $$ \Bbb E(K|J) = n\left(1-(1-p_{i})^J\right).$$ Here $\Bbb P_{iJ}$ is the probability of seeing coupon $i$ at least once in the fixed $J$ draws; $p_{i}$ is whatever per-draw probability you like (the same for all $i$), for example $p_{i}= \frac{1}{n}$.

Let us make sense of this. For large $n$, so that $p_{i} = \frac{1}{n} \ll 1$ (and $J \ll n$):

$$ \Bbb E(K|J) \approx J,$$ since $(1-\frac1n)^J \approx 1 - \frac Jn$ when $J \ll n$. This makes sense: when the number of distinct coupon types $n$ is very large, the $J$ balls you draw are almost always all distinct, so the number of distinct coupons you see equals the number of draws.

We can also examine the case of large $J$: $$ \Bbb E(K|J) \approx n,$$ which makes sense because after a large number of draws you will very likely have seen all $n$ distinct coupons.
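A quick numerical check of $\Bbb E(K|J) = n(1-(1-\frac1n)^J)$ and its two limits, as a Python sketch (names are mine, assuming the uniform case $p_i = \frac1n$):

```python
import random

def expected_distinct(n, J):
    """E(K|J) = n * (1 - (1 - 1/n)^J) for uniform coupon probabilities."""
    return n * (1 - (1 - 1 / n) ** J)

def simulated_distinct(n, J, trials=20000):
    """Monte Carlo: average count of distinct coupons in J uniform draws."""
    random.seed(0)
    return sum(len({random.randrange(n) for _ in range(J)})
               for _ in range(trials)) / trials
```

For $J \ll n$ the formula is close to $J$, and for $J \gg n$ it approaches $n$, matching the two limiting cases above.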

Reference: https://en.wikipedia.org/wiki/Coupon_collector%27s_problem


Let $X$ be the random variable that counts how many distinct comics (out of the total $n$ comics) you've seen. Let $K$ be the event "the last $k$ random comics you were shown had already been seen".

As I understand it, the problem asks for $\Bbb E(X|K)$. We have that

$$\begin{align} \Bbb E(X|K) &= \sum_{x = 1}^n x\Bbb P(X = x | K) \\&= \sum_{x = 1}^n \frac{x\Bbb P(K | X = x) \Bbb P(X=x)}{\Bbb P(K)}. \end{align}$$

Now, $\Bbb P(K | X = x)$ is easy enough to compute; that's simply $x^k/n^k$.
However, without some assumption on the underlying probability distribution of $X$, there's no meaningful way to assign values to $\Bbb P(X=x)$ and $\Bbb P(K)$, or to their quotient.

Now, assuming we know the probability distribution of $X$ (which is discrete, and takes values in $\{0, 1, \dots, n\}$), then of course we know $\Bbb P(X=x)$. Moreover, by the law of total probability,

$$\begin{align} \Bbb P(K) &= \sum_{x = 1}^n \Bbb P(K | X = x) \cdot \Bbb P(X = x) \\&= \frac1{n^k}\sum_{x = 1}^n x^k \cdot \Bbb P(X = x) = \frac1{n^k}\Bbb E(X^k). \end{align}$$

Putting it all together we get

$$\begin{align} \Bbb E(X|K) &= \sum_{x = 1}^n \frac{x \cdot \frac{x^k}{n^k} \cdot \Bbb P(X=x)}{\Bbb E(X^k)/n^k} \\&= \frac{1}{\Bbb E(X^k)}\left(\sum_{x = 1}^n x^{k+1} \cdot \Bbb P(X=x)\right) = \frac{\Bbb E(X^{k+1})}{\Bbb E(X^k)}. \end{align}$$

If we assume that $X$ has a uniform distribution (including $0$ or not), we recover Misha's answer.
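The identity $\Bbb E(X|K) = \Bbb E(X^{k+1})/\Bbb E(X^k)$ is easy to verify numerically for an arbitrary prior. Here is a Python sketch (names are mine) that computes the posterior mean both directly via Bayes' rule and via the moment ratio:

```python
def posterior_mean_bayes(prior, k):
    """prior[x] = P(X = x) for x = 0..n.
    Posterior weight of x is proportional to (x/n)^k * P(X = x)."""
    n = len(prior) - 1
    w = [(x / n) ** k * prior[x] for x in range(n + 1)]
    return sum(x * wx for x, wx in enumerate(w)) / sum(w)

def posterior_mean_moments(prior, k):
    """E(X^(k+1)) / E(X^k) for the same prior."""
    Ek = sum(x ** k * p for x, p in enumerate(prior))
    Ek1 = sum(x ** (k + 1) * p for x, p in enumerate(prior))
    return Ek1 / Ek
```

With a uniform prior on $\{1,\dots,n\}$, both reduce to the sum $\sum_x x^{k+1} / \sum_x x^k$ from the first answer.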