Singletons in coupon collecting problem

They are looking for the chance that you get another $T_i$. You can collect it again until you have found all the other types you are looking for. After you get the first $T_i$, they say make a list of the first occurrence after now of $T_i$ and all the coupons you have not found yet. If $T_i$ is the last, you will only have one $T_i$ when you complete your set. If it is not the last, you will have a duplicate $T_i$ when you complete your set.

As a specific example, say you have all the coupons except $a,b,c$. Now you draw your first $a$ (this is $T_i$ in the above). They say you should look at the order of the next draw of $a,b,c$. There are $3!$ possible orders, of which $2$ have $a$ after the others, so you now have $\frac 23$ chance of getting a second $a$ before you finish the set and a $\frac 13$ chance you get your complete set while you have only one $a$.


By way of enrichment here is the expectation using Stirling numbers of the second kind. In referencing the notation from this MSE link we have $n$ coupons, and ask about the expected number of singletons once a complete set of $n$ different coupons has been drawn. We will be using OGFs and EGFs of Stirling numbers and switch between them.

First let us verify that we indeed have a probability distribution here. We have for the number $T$ of coupons being $m$ draws classified according to the number of singletons that

$$P[T=m] = \frac{1}{n^m} \times {n\choose n-1} \\ \times \sum_{q=0}^{n-1} {n-1\choose q} {m-1\choose q} q! {m-1-q\brace n-1-q}_{\ge 2} (n-1-q)!.$$

What is happening here is that we first choose the $n-1$ types of coupons that go into the prefix, where the one not selected goes into the suffix, completing the set of coupons. Next we choose $q$ coupons from the ones in the prefix which will be represented by singletons (factor ${n-1\choose q}$). Next we choose the positions from the available slots where the singletons will be placed (factor ${m-1\choose q} q!$). We split the leftover $m-1-q$ slots into sets of at least two elements, one for each of the $n-1-q$ types that have not been instantiated (factor ${m-1-q\brace n-1-q}_{\ge 2} (n-1-q)!$).

This probability simplifies to

$$P[T=m] = \frac{n \times (m-1)!}{n^m} \sum_{q=0}^{n-1} \frac{(n-1)!}{q!} \frac{1}{(m-1-q)!} {m-1-q\brace n-1-q}_{\ge 2} \\ = \frac{n \times (m-1)!}{n^m} \sum_{q=0}^{n-1} \frac{(n-1)!}{q!} [z^{m-1-q}] \frac{(\exp(z)-z-1)^{n-1-q}}{(n-1-q)!} \\ = \frac{n \times (m-1)!}{n^m} \sum_{q=0}^{n-1} {n-1\choose q} [z^{m-1-q}] (\exp(z)-z-1)^{n-1-q} \\ = \frac{n \times (m-1)!}{n^m} \sum_{q=0}^{n-1} {n-1\choose q} [z^{m-1}] z^q (\exp(z)-z-1)^{n-1-q} \\ = \frac{n \times (m-1)!}{n^m} [z^{m-1}] (\exp(z)-1)^{n-1} \\ = \frac{n! \times (m-1)!}{n^m} [z^{m-1}] \frac{(\exp(z)-1)^{n-1}}{(n-1)!}.$$

We then get for the sum of the probabilities (observe that the EGF has morphed into an OGF)

$$\sum_{m\ge 1} P[T=m] = \frac{n!}{n} \sum_{m\ge 1} \frac{1}{n^{m-1}} [z^{m-1}] \prod_{q=1}^{n-1} \frac{z}{1-qz} = \frac{n!}{n} \prod_{q=1}^{n-1} \frac{1/n}{1-q/n} \\ = \frac{n!}{n} \prod_{q=1}^{n-1} \frac{1}{n-q} = \frac{n!}{n} \frac{1}{(n-1)!} = 1.$$

The probabilities sum to one and the sanity check goes through.

Continuing with the expected number of singletons we get an extra factor $q$ which yields

$$\frac{n \times (m-1)!}{n^m} \sum_{q=1}^{n-1} q {n-1\choose q} [z^{m-1}] z^q (\exp(z)-z-1)^{n-1-q} \\ = \frac{n(n-1) \times (m-1)!}{n^m} [z^{m-1}] \sum_{q=1}^{n-1} {n-2\choose q-1} z^q (\exp(z)-z-1)^{n-1-q} \\ = \frac{n(n-1) \times (m-1)!}{n^m} \\ \times [z^{m-1}] z \sum_{q=1}^{n-1} {n-2\choose q-1} z^{q-1} (\exp(z)-z-1)^{n-2-(q-1)} \\ = \frac{n(n-1) \times (m-1)!}{n^m} [z^{m-2}] (\exp(z)-1)^{n-2} \\ = \frac{n! \times (m-1)!}{n^m} [z^{m-2}] \frac{(\exp(z)-1)^{n-2}}{(n-2)!}.$$

Now we have

$$\sum_{m\ge 2} w^{m-2} (m-1)! [z^{m-2}] \sum_{q\ge 0} A_q \frac{z^q}{q!} \\ = \sum_{m\ge 2} w^{m-2} (m-1) A_{m-2} = \left.\left(z \sum_{q\ge 0} A_q z^q\right)'\right|_{z=w}.$$

Applying this to the expectation yields

$$\frac{n!}{n^2} \sum_{m\ge 2} \frac{1}{n^{m-2}} [z^{m-2}] \left(\prod_{q=0}^{n-2} \frac{z}{1-qz}\right)' \\ = \frac{n!}{n^2} \sum_{m\ge 2} \frac{1}{n^{m-2}} [z^{m-2}] \prod_{q=0}^{n-2} \frac{z}{1-qz} \sum_{q=0}^{n-2} \frac{1-qz}{z} \frac{1}{(1-qz)^2} \\ = \frac{n!}{n^2} \sum_{m\ge 2} \frac{1}{n^{m-2}} [z^{m-2}] \prod_{q=0}^{n-2} \frac{z}{1-qz} \sum_{q=0}^{n-2} \frac{1/z}{1-qz} \\ = \frac{n!}{n^2} \prod_{q=0}^{n-2} \frac{1/n}{1-q/n} \sum_{q=0}^{n-2} \frac{n}{1-q/n} \\ = n! \prod_{q=0}^{n-2} \frac{1}{n-q} \sum_{q=0}^{n-2} \frac{1}{n-q}.$$

This simplifies to the end result

$$\bbox[5px,border:2px solid #00A000]{ H_n \sim \log n + \gamma}$$

where we have included an increment of one that represents the singleton which completed the set of coupons.

There is also a beginning level Perl script available which will confirm this formula for approximately four digits precision.

#! /usr/bin/perl -w
#

MAIN: {
    my $n = shift || 5;
    my $trials = shift || 1000;

    my $data = 0;

    for(my $tind = 0; $tind < $trials; $tind++){
        my $seen = 0; my @dist;

        @dist[1..$n] = (0) x $n;

        while($seen < $n){
            my $coupon = 1 + int(rand($n));

            $seen++ if $dist[$coupon] == 0;
            $dist[$coupon]++;
        }

        my $single = 0;
        for(my $type = 1; $type <= $n; $type++){
            $single++ if $dist[$type] == 1;
        }

        $data += $single;
    }

    print $data/$trials;
    print "\n";

    1;
}

This post made extensive use of the technique of annihilated coefficient extractors (ACE). There are more of these at this MSE link I and at this MSE link II and also here at this MSE link III.

Addendum. We can simplify the above computation somewhat by using the species of ordered set partitions with singletons marked. This is

$$\mathfrak{S}(\mathcal{U}\mathcal{V}\mathcal{Z} + \mathcal{U}\mathfrak{P}_{\ge 2}(\mathcal{Z}))$$

and has generating function

$$G(z,u,v) = \frac{1}{1-u(\exp(z)-z+vz-1)}.$$

We are partitioning $m-1$ slots into $n-1$ sets and we extract

$$(m-1)! [z^{m-1}] [u^{n-1}] G(z,u,v) \\ = (m-1)! [z^{m-1}] (\exp(z)-z+vz-1)^{n-1}.$$

We thus obtain a generating function for the probability which encapsulates its value but classifies according to the number of singletons which is

$$P[T = m] = \frac{1}{n^m} \times {n\choose n-1} \times (m-1)! [z^{m-1}] (\exp(z)-z+vz-1)^{n-1} \\ = \frac{n!\times (m-1)!}{n^m} [z^{m-1}] \frac{(\exp(z)-z+vz-1)^{n-1}}{(n-1)!}.$$

I.e. $n^m P[T=m]$ is the OGF of sequences of draws where the last coupon is obtained at draw number $m,$ classified according to the number of singletons represented by the exponent on $v.$ E.g. for four coupons and six draws we obtain

$$360v+240v^2.$$

Divide by four to account for the choice of the last coupon, leaving five draws and three coupons and the term $90v+60v^2$ (the last coupon was a singleton but we did not count it until the very end, see above). For one singleton choose it in three ways and combine with pairs of the remaining two types to get $3\times {5\choose 2,2,1} = 90.$ For two singletons choose the two types in ${3\choose 2}$ ways, the remaining type gets three slots and we have $3\times {5\choose 3,1,1} = 60.$

Returning to the main thread and setting $v=1$ we remove the marking and obtain the probability

$$\frac{n!\times (m-1)!}{n^m} [z^{m-1}] \frac{(\exp(z)-1)^{n-1}}{(n-1)!}$$

and may continue as before.

We compute the expectation by differentating with respect to $v$ and setting $v=1$ and obtain

$$\left.\frac{n!\times (m-1)!}{n^m} [z^{m-1}] (n-1) \frac{(\exp(z)-z+vz-1)^{n-2}}{(n-1)!} \times z\right|_{v=1} \\ = \frac{n!\times (m-1)!}{n^m} [z^{m-2}]\frac{(\exp(z)-1)^{n-2}}{(n-2)!}.$$

We then continue once more as in the first version.