Coupon Collector Problem with Batched Selections

I am trying to solve a variation on the coupon collector's problem.

In this scenario, someone selects coupons at random, with replacement, from $n$ different possible coupons. However, the coupons are not selected one at a time but in batches.

Here's an example problem formulation: there are 100 distinct coupons, and a person selects coupons at random in batches of 10 (each coupon drawn with replacement). What is the expected number of batches necessary to have selected 80 unique coupons?

I have been able to determine the expected number of selections necessary to collect $k$ unique coupons when selecting one at a time (much like Henry's answer to a similar question), but I'm stumped as to how to approach this particular wrinkle.
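For concreteness, here is the one-at-a-time computation I have, as a small Python sketch (the usual argument that, while holding $i$ coupons, the next new coupon takes $n/(n-i)$ draws on average; the function name is just mine):

```python
from fractions import Fraction

def expected_draws(n, k):
    """Expected number of single draws (with replacement) to see k distinct
    coupons out of n: the (i+1)-th new coupon takes n/(n-i) draws on average."""
    return sum(Fraction(n, n - i) for i in range(k))

print(float(expected_draws(100, 80)))  # ~ 158.96 draws for n=100, k=80
```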

Any tips/guidance would be greatly appreciated.


The probability of not having all $n$ coupons after drawing $k$ coupons is

$$ \def\stir#1#2{\left\{#1\atop#2\right\}} P(K\gt k)=1-\frac{n!}{n^k}\stir kn $$

(see e.g. Probability distribution in the coupon collector's problem). Thus, if we draw in batches of $b$ coupons, the expected number of batches until all $n$ coupons have been collected is

\begin{align}
\sum_{j=0}^\infty P(K\gt jb)
&= \sum_{j=0}^\infty\left(1-\frac{n!}{n^{jb}}\stir{jb}n\right) \\
&= \sum_{j=0}^\infty\left(1-\frac1{n^{jb}}\sum_{l=0}^n(-1)^{n-l}\binom nll^{jb}\right) \\
&= \sum_{j=0}^\infty\sum_{l=0}^{n-1}(-1)^{n-l+1}\binom nl\left(\frac ln\right)^{jb} \\
&= \sum_{l=0}^{n-1}(-1)^{n-l+1}\binom nl\frac1{1-\left(\frac ln\right)^b}\;.
\end{align}

For instance, for $n=2$, this is

$$ \frac2{1-2^{-b}}-1=1+\frac2{2^b-1}\;. $$
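As a quick sanity check, the closed form can be evaluated in exact rational arithmetic (the alternating sum cancels badly in floating point for larger $n$). A minimal sketch, with the function name being mine:

```python
from fractions import Fraction
from math import comb

def expected_batches(n, b):
    """Expected number of b-coupon batches to collect all n coupons,
    via the closed form above (exact rational arithmetic)."""
    return sum((-1) ** (n - l + 1) * comb(n, l) / (1 - Fraction(l, n) ** b)
               for l in range(n))

# b = 1 recovers the classical coupon-collector value n * H_n ...
assert expected_batches(5, 1) == sum(Fraction(5, 5 - i) for i in range(5))
# ... and n = 2 matches 1 + 2/(2^b - 1):
assert expected_batches(2, 3) == 1 + Fraction(2, 2 ** 3 - 1)

print(float(expected_batches(100, 10)))  # batches of 10 needed to collect all 100 coupons
```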


Let $N_t$ denote the number of different coupons after the selection of $t$ batches of size $b$ drawn from $n$ different coupons and $T_k=\inf\{t\geqslant0\mid N_t\geqslant k\}$ (here, $b=10$, $n=100$ and $k=80$). One is interested in $$ \mathrm E_{b,n}(T_k)=\sum\limits_{t\geqslant0}\mathrm P_{b,n}(T_k\gt t)=\sum\limits_{t\geqslant0}\mathrm P_{b,n}(N_t\lt k). $$ The way to compute the value of $\mathrm E_{b,n}(T_k)$ is unclear, although the dynamics of $(N_t)_t$ is easy to describe.

To wit, $N_0=0$, $N_1=b$, and if $N_t=i$, $N_{t+1}=i+M_t$ where $0\leqslant M_t\leqslant b$ almost surely and the distribution of $M_t$ depends on $i$ and may be written down. For example, if $b=2$, $M_t=0$ with probability $\frac{i(i-1)}{n(n-1)}$, $M_t=1$ with probability $\frac{2i(n-i)}{n(n-1)}$ and, finally, $M_t=2$ with probability $\frac{(n-i)(n-i-1)}{n(n-1)}$. Thus, $\mathrm E_{2,n}(M_t\mid N_t)=2(1-N_t/n)$ and $\mathrm E_{2,n}(N_t)=n(1-(1-2/n)^t)$.
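The transition probabilities above correspond to each batch consisting of $b$ distinct coupons, so that $M_t$ given $N_t=i$ is hypergeometric; under that reading, the distribution of $N_t$ can be iterated numerically to evaluate $\mathrm E_{b,n}(T_k)=\sum_{t\geqslant0}\mathrm P_{b,n}(N_t\lt k)$ directly. A sketch (the function name and the truncation threshold are my choices):

```python
from math import comb

def expected_batches_to_k(n, b, k):
    """E(T_k) = sum over t of P(N_t < k), obtained by iterating the distribution
    of N_t.  Given N_t = i, a batch of b distinct coupons contains exactly m new
    ones with probability C(n-i, m) * C(i, b-m) / C(n, b)."""
    dist = [0.0] * (n + 1)
    dist[0] = 1.0                    # N_0 = 0
    expectation = 0.0
    p_below_k = 1.0                  # P(N_0 < k)
    while p_below_k > 1e-12:
        expectation += p_below_k     # adds P(T_k > t) = P(N_t < k)
        new = [0.0] * (n + 1)
        for i, p in enumerate(dist):
            if p == 0.0:
                continue
            for m in range(max(0, b - i), min(b, n - i) + 1):
                new[i + m] += p * comb(n - i, m) * comb(i, b - m) / comb(n, b)
        dist = new
        p_below_k = sum(dist[:k])
    return expectation

print(expected_batches_to_k(100, 10, 80))  # batches to reach k=80 of n=100 with b=10
```

This gives a benchmark against which the approximation below can be compared.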

More generally, $\mathrm E_{b,n}(M_t\mid N_t)=b(1-N_t/n)$, hence $n-\mathrm E_{b,n}(N_{t+1})=(1-b/n)\left(n-\mathrm E_{b,n}(N_t)\right)$ and therefore $\mathrm E_{b,n}(N_t)=n(1-(1-b/n)^t)$.

One might want to estimate $\mathrm E_{b,n}(T_k)$ by the root $t_{b,n}(k)$ of the equation $k=\mathrm E_{b,n}(N_t)$, that is, $$ t_{b,n}(k)=\frac{\log(1-k/n)}{\log(1-b/n)}, $$ for example, $t_{10,100}(80)\approx15.28$. However, the quality of this approximation (or rather, the range of parameters $(n,b,k)$ where it is passable) should be checked, numerically or otherwise.
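One quick way to do that check numerically is a Monte Carlo simulation of the same model (each batch a sample of $b$ distinct coupons); a sketch, with names of my own choosing:

```python
import random
from math import log

def simulate_T_k(n, b, k, trials=20_000):
    """Monte Carlo estimate of E(T_k): each batch is a sample of b distinct coupons."""
    total = 0
    for _ in range(trials):
        seen, t = set(), 0
        while len(seen) < k:
            seen.update(random.sample(range(n), b))
            t += 1
        total += t
    return total / trials

n, b, k = 100, 10, 80
print(log(1 - k / n) / log(1 - b / n))  # t_{10,100}(80), about 15.28
print(simulate_T_k(n, b, k))            # simulated E(T_k), for comparison
```

Running this comparison over a grid of $(n,b,k)$ values would map out where the approximation is passable.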