Expected number of draws to find a match
Solution 1:
Introductory remark. This answer was originally written to treat the case of seeing the first repeated value when drawing without replacement from a deck of cards as proposed at this MSE link. It answers the corresponding case of $n$ pairs of socks as well, however, consult the end of the document for this.
We solve the problem where we have $j$ instances of each of $n$ types of coupons and draw without replacement until we have seen $2$ coupons of some type. For a deck of cards we have $13$ types of coupons and $4$ instances of each type. Using the notation from the following MSE link I and MSE link II we introduce the marked generating function
$$\left(1 + j w z\right)^n.$$
The coefficient on $[z^m]$ here represents distributions of sequences of $m$ draws from the $n$ types according to probability, where the ones that occur one time have been marked. Each of the latter may be augmented to a pair of some color where the weight is $j-1$ because one coupon has already been drawn. As we only need the count we differentiate with respect to $w$ and set $w=1$, getting
$$n\times \left(1 + jz\right)^{n-1} \times j z.$$
With the method from the linked posts we thus obtain for the probability
$$P[T = m] = \frac{1}{m!} {nj\choose m}^{-1} (m-1)! \times (j-1) \times [z^{m-1}] n j z (1 + jz)^{n-1} \\ = \frac{nj}{m} {nj\choose m}^{-1} \times (j-1) \times [z^{m-2}] (1 + jz)^{n-1} \\ = \frac{nj}{m} {nj\choose m}^{-1} \times (j-1) \times {n-1\choose m-2} j^{m-2} \\ = (j-1) \times {nj-1\choose m-1}^{-1} {n-1\choose m-2} j^{m-2}.$$
Next we verify that this is a probability distribution. The process may halt after two steps at the earliest and $n+1$ at the latest and we get
$$\sum_{m=2}^{n+1} P[T=m] = (j-1) \sum_{m=2}^{n+1} {nj-1\choose m-1}^{-1} {n-1\choose m-2} j^{m-2} \\ = (j-1) \sum_{m=2}^{n+1} {nj-1\choose m-1}^{-1} \frac{m-1}{n} {n\choose m-1} j^{m-2} \\ = \frac{j-1}{n} \sum_{m=2}^{n+1} {nj-1\choose m-1}^{-1} {n\choose m-1} (m-1) j^{m-2}.$$
We have
$${nj-1\choose m-1}^{-1} {n\choose m-1} = \frac{n!\times (nj-m)!}{(nj-1)!\times (n-(m-1))!} \\ = {nj-1\choose n}^{-1} {nj-m\choose n-(m-1)}.$$
Here we have used the fact that for the scenario to make sense we must have $j\ge 2.$ Continuing we find
$$\frac{j-1}{n} {nj-1\choose n}^{-1} \sum_{m=2}^{n+1} {nj-m\choose n-(m-1)} (m-1) j^{m-2}$$
The sum term yields
$$\sum_{m=1}^{n} {nj-1-m\choose n-m} m j^{m-1} = \sum_{m\ge 1} [w^{n-m}] (1+w)^{nj-1-m} m j^{m-1} \\ = [w^n] (1+w)^{nj-1} \sum_{m\ge 1} w^m (1+w)^{-m} m j^{m-1} \\ = [w^n] (1+w)^{nj-1} \frac{w}{(1+w)} \sum_{m\ge 1} w^{m-1} (1+w)^{-(m-1)} m j^{m-1} \\ = [w^n] (1+w)^{nj-1} \frac{w}{(1+w)} \frac{1}{(1-wj/(1+w))^2} \\ = [w^{n-1}] (1+w)^{nj} \frac{1}{(1+w-wj)^2} = [w^{n-1}] (1+w)^{nj} \frac{1}{(1-(j-1)w)^2}.$$
Extracting coefficients we find
$$\sum_{q=0}^{n-1} {nj\choose n-1-q} (q+1) (j-1)^q = \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (q+1) (j-1)^q \\ = nj \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^q - n(j-1) \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (j-1)^q \\ = n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^{q+1} + n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^q \\ - n \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (j-1)^{q+1} \\ = n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^{q+1} + n \sum_{q=-1}^{n-2} {nj-1\choose nj-n+q+1} (j-1)^{q+1} \\ - n \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (j-1)^{q+1} \\ = n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^{q+1} + n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q+1} (j-1)^{q+1} \\ + n {nj-1\choose nj-n} - n \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (j-1)^{q+1} = n {nj-1\choose nj-n}.$$
Collecting everything we obtain
$$\frac{j-1}{n} {nj-1\choose n}^{-1} \times n \times {nj-1\choose n-1} \\ = \frac{j-1}{n} {nj-1\choose n}^{-1} \times n \times {nj-1\choose n} \frac{n}{nj-n} = 1$$
and we have confirmed that we have a probability distribution.
The next step is to compute the expectation. Recapitulating the earlier computation we find that
$$E[T] = \sum_{m=2}^{n+1} m P[T=m] = \frac{j-1}{n} {nj-1\choose n}^{-1} \sum_{m=1}^{n} {nj-1-m\choose n-m} (m+1) m j^{m-1}$$
or
$$\bbox[5px,border:2px solid #00A000]{ E[T] = \frac{2(j-1)}{n} {nj-1\choose n}^{-1} [w^{n-1}] (1+w)^{nj+1} \frac{1}{(1-(j-1)w)^3}.}$$
Extracting coefficients we obtain the closed form
$$\bbox[5px,border:2px solid #00A000]{ E[T] = \frac{(j-1)}{n} {nj-1\choose n}^{-1} \sum_{q=0}^{n-1} {nj+1\choose n-1-q} (q+2)(q+1) (j-1)^q.}$$
Observe that for a deck of cards we get
$$E[T] = {\frac {226087256246}{39688347475}} \approx 5.696565129.$$
Furthermore this simplifies when $j=2$ (pairs of socks). Instantiating $j$ to $2$ will produce
$$\frac{2}{n} {2n-1\choose n}^{-1} [w^{n-1}] (1+w)^{2n+1} \frac{1}{(1-w)^3}.$$
The coefficient is
$$\mathrm{Res}_{w=0} \frac{1}{w^n} (1+w)^{2n+1} \frac{1}{(1-w)^3}.$$
Note that the residue at infinity is given by
$$- \mathrm{Res}_{w=0} \frac{1}{w^2} w^n \frac{(1+w)^{2n+1}}{w^{2n+1}} \frac{1}{(1-1/w)^3} = - \mathrm{Res}_{w=0} \frac{1}{w^2} \frac{(1+w)^{2n+1}}{w^{n+1}} \frac{w^3}{(w-1)^3} \\ = \mathrm{Res}_{w=0} \frac{(1+w)^{2n+1}}{w^{n}} \frac{1}{(1-w)^3}.$$
Hence the value is minus half the residue at $w=1$. We find with $(1-w)^3 = - (w-1)^3$
$$\frac{1}{2} \times \frac{1}{2} \left.\frac{1}{w^n} (1+w)^{2n+1} \left(\frac{n(n+1)}{w^2} - \frac{2n(2n+1)}{w (1+w)} + \frac{(2n+1)(2n)}{(1+w)^2}\right)\right|_{w=1} \\ = 2^{2n-1} \left(n^2+n - 2n^2-n + n^2 + \frac{1}{2} n\right) = \frac{1}{4} n 4^n.$$
Now observe that
$${2n-1\choose n}^{-1} = {2n\choose n}^{-1} \times 2n \times \frac{1}{n} = 2 {2n\choose n}^{-1} \sim 2 \times \frac{\sqrt{\pi n}}{4^n}$$
We thus have the closed form for $j=2$
$$\bbox[5px,border:2px solid #00A000]{ E[T] = {2n-1\choose n}^{-1} \frac{1}{2} 4^n = {2n\choose n}^{-1} 4^n.}$$
and we get the nice asymptotic
$$\bbox[5px,border:2px solid #00A000]{ E[T] \sim \sqrt{\pi n}.}$$
There is also a very basic C program which confirmed the closed form of the expectations for all combinations of $n$ and $j$ that were examined. For example with $j=5$ we get the expectations
$$2,{\frac {23}{9}},{\frac {272}{91}},{\frac {3253}{969}}, {\frac {6522}{1771}},{\frac {94477}{23751}}, {\frac {714436}{168113}},{\frac {69263329}{15380937}},\ldots$$
with values
$$2, 2.555555556, 2.989010989, 3.357069143, 3.682665161, \\ 3.977811461, 4.249736784, 4.503193076,\ldots $$
Running the program on $10^8$ trials will then match these values to about five digits decimal precision.
#include <stdlib.h> #include <stdio.h> #include <assert.h> #include <time.h> #include <string.h> int main(int argc, char **argv) { int n = 6 , j = 3, trials = 1000; if(argc >= 2){ n = atoi(argv[1]); } if(argc >= 3){ j = atoi(argv[2]); } if(argc >= 4){ trials = atoi(argv[3]); } assert(1 <= n); assert(2 <= j); assert(1 <= trials); srand48(time(NULL)); long long data = 0; for(int tind = 0; tind < trials; tind++){ int src[n*j]; for(int cind = 0; cind < n*j; cind++) src[cind] = cind/j; int done = 0; int steps = 0; int dist[n]; for(int cind = 0; cind < n; cind++) dist[cind] = 0; while(!done){ int cpidx = drand48() * (double)(n*j-steps); int coupon = src[cpidx]; for(int cind=cpidx; cind < n*j-steps-1; cind++) src[cind] = src[cind+1]; steps++; dist[coupon]++; if(dist[coupon] == 2) done = 1; } data += steps; } long double expt = (long double)data/(long double)trials; printf("[n = %d, j = %d, trials = %d]: %Le\n", n, j, trials, expt); exit(0); }
Solution 2:
Suppose you pick socks without replacement $k$ times, $2\leq k\leq n+1$, when you find the first match. When you pick socks the first time there are $N\equiv 2n$ ways of doing it.
If there is to be no match during second pick, then there are $N(N-2)$ ways of picking the second sock, because for each sock picked the first time there are $(N-2)$ socks that we could pick the second time without having a match.
If there is to be no match during third pick, then there are $N(N-2)(N-4)$ ways of picking the third sock. This is because after two picks without a match, there are $(N-2)$ socks left, and for there are to be no match in the third pick there are $(N-2)-2$ choices available.
If there is to be no match during fourth pick, then there are $N(N-2)(N-4)(N-6)$ ways of picking the second sock. This is because after three picks without a match, there are $(N-3)$ socks left, and for there are to be no match in the fourth pick there are $(N-3)-3$ choices available.
Since the first match occurs on $k$-th picking, there has been no match during the previous $(k-1)$ pickings. The number of ways for there to be no match during $(k-1)$ pickings is: \begin{align} &N(N-2)(N-4)...(N-2(k-2))\\ =&2^{k-1}n(n-1)(n-2)...(n-(k-2))\\ =&2^{k-1}\prod_{i=0}^{k-2}(n-i) \end{align} Since you have picked $(k-1)$ distinct socks so far, to get a match on $k$-th picking we must pick one of the $(k-1)$ socks. Therefore the total number of ways to get the first match on $k$-th pick is: $N(N-2)(N-4)...(N-2(k-1))(k-1)$. \begin{align} (k-1)2^{k-1}\prod_{i=0}^{k-2}(n-i) \end{align}
Since there are a total of $N(N-1)(N-2)...(N-(k-1))=\prod_{i=0}^{k-1}(2n-i)$ ways to select socks without replacement in $k$ picks the sought for probability is: \begin{align} \frac{(k-1)2^{k-1}\prod_{i=0}^{k-2}(n-i)}{\prod_{i=0}^{k-1}(2n-i)} \end{align} This formula correctly reproduces your results. Using Mathematica I get the $\lim_{n\to\infty}\frac{E[X_n]}{n}=0$.
Solution 3:
Here is an alternative solution.
Let's say the first match occurs on the $X$th draw. We would like to find $\Pr(X>k)$, i.e. the probability that there has been no match by draw $k$, for $k = 0, 1, 2, \dots ,n$. There are $\binom{2n}{k}$ possible selections of $k$ socks, all of which we assume are equally likely. If there is no match, we must have at most one sock of each of $k$ colors. There are $\binom{n}{k}$ ways to select the $k$ colors, and $2^k$ ways to select the socks of those colors, so $\binom{n}{k}\; 2^k$ ways in all. So $$\Pr(X>k) = \frac{\binom{n}{k} \; 2^k}{\binom{2n}{k}}$$
Hence $$E(X) = \sum_{k \ge 0}\Pr(X>k) = \sum_{k=0}^n \frac{\binom{n}{k} \; 2^k}{\binom{2n}{k}}$$