coupon collector problem with groups

I was looking at various extensions of the classical Coupon Collector problem, but couldn't find any answers or hints for the following modification:

Assume there are $n$ distinct coupons and you get them in batches of a different (random) size. That is, assume that every time you get a black box with coupons in it, whose number can be from $0$ to $n$ and coupons within a black box are distinct. The probability of having a specific coupon in a box is $p$ (that is, it's the same for all). What is the expected number of black-boxes one needs to open to collect all coupons?


Solution 1:

I'll assume that you intended to imply that each coupon independently has probability $p$ to be in any given black box.

Then we have $n$ independent Bernoulli trials with probability $p$ and are looking for the expected time for all of them to have succeeded at least once. The probability for a given coupon not to have been collected after $k$ black boxes is $(1-p)^k$, so it has been collected with probability $1-(1-p)^k$, so the probability for all coupons to have been collected is $\left(1-(1-p)^k\right)^n$, so the probability that they haven't all been collected is $1-\left(1-(1-p)^k\right)^n$. The expected number of black boxes required to collect all coupons is the sum of these probabilities:

\begin{align} \sum_{k=0}^\infty\left(1-\left(1-(1-p)^k\right)^n\right) &=\sum_{k=0}^\infty\left(1-\sum_{j=0}^n\binom nj\left(-(1-p)^k\right)^j\right) \\ &=\sum_{k=0}^\infty\sum_{j=1}^n(-1)^{j-1}\binom nj(1-p)^{jk} \\ &=\sum_{j=1}^n(-1)^{j-1}\binom nj\sum_{k=0}^\infty(1-p)^{jk} \\ &=\sum_{j=1}^n(-1)^{j-1}\binom nj\frac1{1-(1-p)^j}\;. \end{align}

The same result can also be derived using the maximum-minimums identity. The time it takes to collect all coupons is the maximum of the times $X_i$, where $X_i$ is the time it takes to collect the $i$-th coupon:

\begin{align} \def\ex#1{\mathbb E\left[#1\right]} \ex{\max_iX_i} &=\sum_i\ex{X_i}-\sum_{i\lt j}\ex{\min\{X_i,X_j\}}+\sum_{i\lt j\lt k}\ex{\min\{X_i,X_j,X_k\}}-\cdots \\ &=\sum_{j=1}^n(-1)^{j-1}\binom nj\frac1{1-(1-p)^j}\;, \end{align}

since the minimum of the completion times of $j$ particular coupons is the expected time it takes for a Bernoulli trial with success probability $1-(1-p)^j$ to succeed. Equivalently, we could use inclusion-exclusion to find the probability that not all coupons have been collected and then sum over $k$ as above.