Discover to which batch a coin belongs

The following question is taken from a book, in a chapter on probability:

You have two batches of unbalanced coins. One has coins which turn up heads with probability $p_1$, and the other has coins that turn up heads with probability $p_2$. All you know about $p_1$ and $p_2$ is that $|p_1-p_2| \ge 0.01$. A coin was accidentally dropped from one of the batches.

Design an experiment that will determine which batch the coin belongs to, with error probability of no more than $10^{-12}$. You should use a reasonable number of coin tosses.

As far as I can tell, any experiment will essentially boil down to something like this:

Take a coin from each batch and toss it $n$ times. Let $X$ and $Y$ be the sample means of the tosses of the coins from the first and second batch, respectively. Then toss the coin in question $n$ times, and let $Z$ be the sample mean of the tosses of that coin. Then find $n$ for which: $$ \mathbb{P}[\,|X-Z|<|Y-Z|\,] \le10^{-12} $$ when the coin in question is from the second batch. I tried to play around with the inequality $|X-Z|<|Y-Z|$ to get something to which I may apply the Chernoff bound. But the best I got required $n\approx 3\cdot10^6$, which does not seem reasonable at all. Any help will be appreciated.
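For what it's worth, the rule is easy to sanity-check by simulation. The $10^{-12}$ regime is far beyond direct Monte Carlo, so the sketch below (function name and the enlarged gap $p_2-p_1=0.05$ are just illustrative choices) only shows the error of the $|X-Z|<|Y-Z|$ rule shrinking as $n$ grows:

```python
import random

def error_rate(p1, p2, n, trials=1000, seed=0):
    """Monte Carlo estimate of P[|X-Z| < |Y-Z|] when the dropped coin
    actually comes from batch 2, i.e. the misclassification probability
    of the proposed rule."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        x = sum(rng.random() < p1 for _ in range(n)) / n  # batch-1 reference
        y = sum(rng.random() < p2 for _ in range(n)) / n  # batch-2 reference
        z = sum(rng.random() < p2 for _ in range(n)) / n  # the dropped coin
        if abs(x - z) < abs(y - z):
            errors += 1
    return errors / trials

# With an artificially large gap p2 - p1 = 0.05, the error visibly
# shrinks as n grows:
e1 = error_rate(0.5, 0.55, 200)
e2 = error_rate(0.5, 0.55, 4000)
print(e1, e2)
```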


I think the problem used "reasonable" to mean "somewhat reasonably close to optimal". I will give a more complete answer than what is probably expected in the problem, with an asymptotically optimal answer (up to a multiplicative error that approaches 1 in the limit) under different assumptions.

The optimal expected total number of coin tosses is approximately $8 (\mathrm{erf}^{-1}(1-2ε))^2 p(1-p) / δ^2$, where $p = (p_1+p_2)/2$, $δ = |p_1-p_2| ≈ 0$, and $ε$ is the permitted error probability, assumed $≈0$ ($ε=10^{-12}$ in the problem). The formula is valid regardless of whether $p_1-p_2$ is known. The worst case permitted by the problem has $δ = δ_\min = 0.01$ and $p≈0.5$, leading to about 500000 coin tosses.

The multiplicative error in the above formula approaches 1 as $δ→0$ and $\frac{\log \log (1/δ)}{\log (1/ε)}→0$. This implies $ε→0$; also, $ε,δ_\min→0$ would suffice if the above formula used $δ_\min^2$ in place of $δ^2$. Note that $δ→0$ is the large numbers assumption. Also, $\lim\limits_{ε→0} \frac {(\mathrm{erf}^{-1}(1-2ε))^2} {\log(1/ε)} = 1$, but at $ε=10^{-12}$, $\log(1/ε)$ is about 10% larger, and it is unclear which one is more accurate above; $(\mathrm{erf}^{-1}(1-2ε))^2$ was chosen to align with the static case. Also, while per the statement of the problem, the experiment setup works with error $≤ε$ for every choice of $p_1$ and $p_2$, the above remains valid (as opposed to suboptimal) if we were given reasonable/typical priors on $(p_1,p_2)$ instead.

If the experiment setup is static (number of tosses of each type is set in advance), the required number of coin tosses is approximately 4 times higher; furthermore, since the number is not given, we would have to assume the worst case scenario, for a total of about 2000000 tosses. (Our static strategy works if we are given only the bounds on $δ$ and $≈p$, with the number of tosses based on the worst case scenario.)
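Both headline counts can be checked numerically; this is a sketch, with `erfinv` a stdlib-only bisection stand-in for the inverse error function:

```python
from math import erf

def erfinv(y, lo=0.0, hi=10.0, iters=200):
    """Inverse error function on (0, 1) via bisection, stdlib-only."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

eps, delta, p = 1e-12, 0.01, 0.5           # worst case allowed by the problem
c = erfinv(1 - 2 * eps) ** 2               # (erf^-1(1 - 2 eps))^2, about 24.7
dynamic = 8 * c * p * (1 - p) / delta**2   # adaptive stopping: roughly 5e5
static = 32 * c * p * (1 - p) / delta**2   # fixed in advance: roughly 2e6
print(round(dynamic), round(static))
```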

It may also be interesting to work out what happens if
- $δ$ is not too small (small numbers correction),
- $\min (p_1,p_2,1-p_1,1-p_2)≈0$, and/or
- $ε$ is not very close to 0; already at $ε=10^{-12}$ we are at just 7 stdev.

Static Strategy

Assume that we do $N$ tosses, of which $Nx$ are of the unknown coin, $Ny$ of batch 1, and $Nz$ of batch 2. The unique optimal choice is $x = 1/2$ and $y = z = 1/4$. Any $x$ different from $1/2$ is substantially suboptimal.

If $x$ is 1/2 and $δ$ is known, then every choice of $y$ is approximately optimal, but if $y≠1/4$:
- $z = 0$ is a little inefficient because if batch 1 is correct, the error is two-sided (instead of one-sided).
- If $δ$ is different from expected, the error is substantially suboptimal. However, if $δ$ is bigger than expected, the error can still be $≤ε$.

To understand the optimality (at known $δ$), given measured $p_1$ and $p_2$, and assuming actual $p_2>p_1$, the likelihood function for $p$ is approximately $\mathcal{N}(μ=p_1 + δ/2 + (p_2-p_1-δ) \frac {z} {y+z}, σ^2 = \frac {p(1-p)} {N(y+z)})$, while the measurement of the unknown coin has $σ^2 = \frac {p(1-p)} {Nx}$. The estimated $p$ (i.e. the μ above) forms the decision boundary (for classifying the unknown coin), leading to $σ^2 = \frac {p(1-p)} {N(y+z)} + \frac {p(1-p)} {Nx}$ for the difference, with a one-sided error of $δ/2$ leading to wrong decision. If $y≠z$, there is also a second decision boundary, and if $\min(y,z)$ is too small (or $ε$ is very big), the error in determining whether $p_2>p_1$ is nonnegligible, but when all is computed, the effect on the uncertainty is only to make the error partially two-sided.

Thus, the optimal error is one-sided Gaussian error at $\frac {δ/2} {\sqrt{4p(1-p)/N}}$ standard deviations. If the error is $ε$, $N = 32 (\mathrm{erf}^{-1}(1-2ε))^2 p(1-p) / δ^2$.
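The allocation claim above can be checked numerically: the decision-statistic variance scales as $\frac 1 x + \frac 1 {y+z} = \frac 1 x + \frac 1 {1-x}$, and a scan confirms the unique minimum at $x=1/2$, which is where the factor 4 in $4p(1-p)/N$ comes from:

```python
# Variance of the decision statistic, up to the common factor p(1-p)/N,
# as a function of the fraction x of tosses spent on the unknown coin
# (the remaining 1 - x = y + z go to the two reference batches).
def var_factor(x):
    return 1 / x + 1 / (1 - x)

xs = [i / 1000 for i in range(1, 1000)]
best = min(xs, key=var_factor)
print(best, var_factor(best))  # minimum at x = 1/2, value 4
```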

Dynamic Strategy; known $δ$

As is common at $ε≈0$, choosing the stopping point dynamically reduces the expected number of tosses by about a factor of 4. Intuitively, we can wait until the statistics are reasonably "typical" (this is inefficient if $ε$ is too large), so the tolerated deviation is twice as large: the error has to reach from one side all the way to the other side, rather than from one side to the middle.

Here, one side corresponds to typical statistics if the unknown coin is in batch 1, and the other side to batch 2. In the static case, errors happen primarily when the unknown coin appears in the middle between the two choices, but with a dynamic stopping point, we can wait this out.

Similarly to the static strategy (above), the optimal $x$ is 1/2, while the choice of $y$ and $z$ is flexible.

Dynamic Strategy; unknown $δ$

If the sequence of coin tosses is not adaptive (with only the stopping point chosen dynamically), then we pay a price because our measurement of $δ$ is imprecise. Assuming batch 1 and batch 2 are indistinguishable before the experiment, the optimal strategy is to split the tosses evenly, with the unknown coin, batch 1, and batch 2 each using 1/3 of the tosses; the expected number of tosses will be about 3/2 of the optimal value. The strategy is different, and the penalty smaller, if we have an informative prior on whether the coin comes from batch 1 or batch 2.

However, using an adaptive strategy, we can reach the optimal value by:
- about half the time, tossing the unknown coin
- about half the time, tossing the batch that appears different from the unknown coin
- a small fraction of the time, tossing the batch that appears the same as the unknown coin.

At $ε≈0$, this is possible by choosing the batch that appears more different from the unknown coin, except when the number of tosses for the other batch is much smaller. The stopping point for the experiment can be based on the confidence levels for equality of the means (an unknown-vs-batch-1 test and an unknown-vs-batch-2 test). The data also determine whether $p_2 > p_1$ with similar accuracy.

One caveat is that with $δ$ unknown, there is an equivalent of $O(\log (1/δ))$ independent possible stopping points, but we can keep the error $O(ε)$ by requiring confidence $1-O(ε / \log (1/δ))$ to stop. We do not know $δ$, but we can use confidence $1-O(ε / \log^{1.001}(1/δ_\mathrm{guess}))$ instead where $δ_\mathrm{guess}$ is the current guess; 1.001 can be any constant >1. If we also have $δ_\min$ (in the problem, $δ_\min = 0.01$), we can replace $\log^{1.001}(1/δ_\mathrm{guess})$ with $\min(\log^{1.001}(1/δ_\mathrm{guess}), \log(1/δ_\min), \log^{1.001}(δ_\mathrm{guess}/δ_\min + 1))$. As long as confidence $1-ε^{1+o(1)}$ suffices, the formula at the top of the answer remains asymptotically valid; otherwise a correction is to use the required confidence in place of $1-ε$.
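To make the adaptive allocation concrete, here is a hedged sketch, not the precise stopping rule above: all names and thresholds are illustrative, the per-check confidence `eps` is far looser than $10^{-12}$, and crude Hoeffding radii stand in for the exact confidences (ignoring the log-factor union-bound correction just discussed):

```python
import math, random

def adaptive_classify(p_unknown, p1, p2, eps=1e-3, seed=1):
    """Sketch of the adaptive strategy: about half the tosses go to the
    unknown coin, most of the rest to the batch that currently looks more
    different from it, and a small floor to the batch that looks the same.
    Stop when one batch's mean is confidently different from the unknown
    coin's while the other's is not."""
    rng = random.Random(seed)
    counts = {"u": 0, "b1": 0, "b2": 0}
    heads = {"u": 0, "b1": 0, "b2": 0}
    probs = {"u": p_unknown, "b1": p1, "b2": p2}

    def toss(k):
        counts[k] += 1
        heads[k] += rng.random() < probs[k]

    def mean(k):
        return heads[k] / counts[k]

    def radius(k):
        # Hoeffding confidence radius at level 1 - eps (per check;
        # the union over many checks costs a log factor, as noted above)
        return math.sqrt(math.log(2 / eps) / (2 * counts[k]))

    for k in counts:            # seed each arm with a few tosses
        for _ in range(10):
            toss(k)
    for _ in range(200_000):    # hard cap so the sketch always terminates
        toss("u")
        # mostly toss the batch that looks more different from the unknown
        # coin, with a small floor for the batch that looks the same
        far = "b1" if abs(mean("b1") - mean("u")) > abs(mean("b2") - mean("u")) else "b2"
        near = "b2" if far == "b1" else "b1"
        toss(near if counts[near] < counts[far] // 10 else far)
        for other, same in (("b1", "b2"), ("b2", "b1")):
            gap = abs(mean(other) - mean("u"))
            if gap > radius(other) + radius("u") and \
               abs(mean(same) - mean("u")) < gap / 2:
                return same, sum(counts.values())
    return near, sum(counts.values())  # cap reached: best current guess

label, n_used = adaptive_classify(0.55, 0.5, 0.55)
print(label, n_used)  # should usually report "b2"
```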


Let's identify head with $1$, tail with $0$.

At every step $i=1,...,n$, toss a coin from the first batch $X_i$, a coin from the second batch $Y_i$, and the lost coin $Z_i$. Set \[ S_n = \frac 1 n \sum_{i=1}^n (X_i-Z_i), \qquad T_n=\frac 1 n \sum_{i=1}^n (Y_i-Z_i). \] $E[S_n]=0$ and $|E[T_n]|\ge 0.01$ if the coin belongs to the first batch, $E[T_n]=0$ and $|E[S_n]|\ge 0.01$ if it belongs to the second batch. By Hoeffding's inequality (https://en.wikipedia.org/wiki/Hoeffding%27s_inequality#General_case_of_bounded_random_variables), both $S_n$ and $T_n$ are averages of random variables in $[-1,1]$, hence \[ P(|S_n - E[S_n]| > t n^{-1/2}) \le 2 e^{- t^2 / 2},\qquad P(|T_n - E[T_n]| > t n^{-1/2}) \le 2 e^{- t^2 / 2}. \] Choose $t$ such that $10^{-12} = 4e^{-t^2/2}$, then set $n$ so that $t n^{-1/2} = 0.005$; then you can assert that the coin belongs to the first batch if $|S_n|\le 0.005$, and to the second batch otherwise. You are correct with probability at least $1-10^{-12}$.

This gives $t=\sqrt{24\ln(10) + 2 \ln(4)}$ and $\sqrt{n} = t / 0.005 \approx 1523$, $n\approx 2.3 \times 10^6$. Same answer as yours!
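The constants can be reproduced in a few lines:

```python
from math import exp, log, sqrt

# t solves 4*exp(-t**2/2) = 1e-12, and n makes t*n**(-1/2) equal 0.005.
t = sqrt(24 * log(10) + 2 * log(4))
n = (t / 0.005) ** 2
print(t, round(sqrt(n)), round(n))  # sqrt(n) about 1523, n about 2.3e6
```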


What you obtained is roughly optimal, up to a modest constant factor.

Let's say that you know $p_1=0.5$ and $p_2=0.5 + 0.01$. Then you toss the coin with unknown mean $n$ times, and you just want to test whether its mean is $p_1$ or $p_2$.

Someone comes up with a solution, an estimator $\phi=\phi(X_1,...,X_n)$ that depends on the observed coin tosses $X_1,...,X_n$, and you want \[ \max\Big[P_0(\phi = 1),P_1(\phi=0)\Big]\le 10^{-12} \] where $P_0$ is the product measure of $n$ Bernoulli($p_1$) and $P_1$ the product measure of $n$ Bernoulli($p_2$). Using information theoretic inequalities such as Le Cam's or Fano's inequalities, we have \[ \max\Big[P_0(\phi = 1),P_1(\phi=0)\Big] \ge (1/4)\exp(-KL(P_0,P_1)) \] see for instance Theorem 2.2(iii) of the book "Introduction to nonparametric estimation" by Tsybakov. Here, $KL(P_0,P_1)$ denotes the Kullback-Leibler divergence between the two product measures, equal to \[n \Big(p_1\ln(p_1/p_2) + (1-p_1) \ln\big((1-p_1)/(1-p_2)\big) \Big) \approx n\times 0.0002. \] Putting the pieces together, this shows that necessarily \[ n \ge [12 \ln(10) -\ln(4)]/0.0002 \approx 131000. \] Your answer is optimal up to a factor of about 17.
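For concreteness, the lower bound is easy to evaluate (a sketch using the per-toss Bernoulli Kullback-Leibler divergence):

```python
from math import log

p1, p2, eps = 0.5, 0.51, 1e-12
# per-toss KL divergence between Bernoulli(p1) and Bernoulli(p2)
kl = p1 * log(p1 / p2) + (1 - p1) * log((1 - p1) / (1 - p2))
# Le Cam: max error >= (1/4) exp(-n*kl), so error <= eps forces
# n >= log(1/(4*eps)) / kl
n_min = log(1 / (4 * eps)) / kl
print(kl, round(n_min))  # roughly 2e-4 and 1.3e5
```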