Stopping rule for quality control problem
Solution 1:
Bayesian Approach
Samples $(q_i)_{i=1..k}$ are taken sequentially (without replacement) from a batch of size $n$ for the purpose of making inferences about $\bar{q}(n)$, the unknown proportion of "good" items in that batch. Two criteria are assumed:
- A real number $acc\in(0,1)$ (e.g. $acc=0.5$) that serves to define the "acceptability" of a batch; specifically, the batch is considered unacceptable iff $\bar{q}(n) < acc$.
- A real number $\epsilon\in(0,1)$ (e.g. $\epsilon=0.05$) that serves as an "effective convergence" criterion; specifically, if a sequence of probabilities $p_k$ is known to converge to either $0$ or $1$, it will be considered to have "effectively converged" to $0$ (resp. $1$) if $\ p_k \le \epsilon$ (resp. $\ p_k \ge 1-\epsilon$).
Sampling will therefore stop when $p_k$, the posterior probability of $\{\bar{q}(n)<acc\}$ given $(q_i)_{i=1..k}$, has "effectively converged" to $0$ or $1$ (as it must do for some $k\in 1..n$, because necessarily $p_n\in\{0,1\}$). Thus, the sequential procedure is as follows:
$k \leftarrow 0$.
$k \leftarrow k+1$; sample $q_k$, yielding $(q_i)_{i=1..k}$.
Compute the posterior distribution of the unknown $\bar{q}(n)$, given $(q_i)_{i=1..k}$.
Compute the posterior probability $p_k = P\big(\bar{q}(n)< acc\mid (q_i)_{i=1..k}\big)$.
If $p_k \le \epsilon$, decide $\{\bar{q}(n)\ge acc\}$; else if $p_k \ge 1-\epsilon$, decide $\{\bar{q}(n)< acc\}$; else goto (2).
Posterior distribution
Let $S_k = \sum_{i=1}^k q_i = k\,\bar{q}(k)$ for $k\in 0..n$. Now
$$\begin{align} &P\{S_n\!=\!t\mid S_k\!=\!s\} \\ &= \int_0^1 P\{S_n\!=\!t\mid S_k\!=\!s, p\!=\!p'\}\,f_{p\mid S_k}(p'\mid s)\,dp'\tag{A1}\\ &= \int_0^1 P\{S_n\!=\!t\mid S_k\!=\!s, p\!=\!p'\}\,\frac{P\{S_k\!=\!s\mid p\!=\!p'\}f(p')}{\int_0^1 P\{S_k\!=\!s\mid p\!=\!p''\}f(p'')dp''}\,dp'\tag{A2}\\ \end{align} $$
where
$$\begin{align} &P\{S_n\!=\!t\mid S_k\!=\!s, p\!=\!p'\}= \binom{n-k}{t-s}(p')^{t-s}(1-p')^{n-k-(t-s)}I_{\{s,...,s+n-k\}}(t)\tag{B1}\\ &P\{S_k\!=\!s\mid p\!=\!p'\}= \binom{k}{s}(p')^s(1-p')^{k-s}I_{\{0,...,k\}}(s)\tag{B2}\\ \end{align} $$ and $f()$ is the prior density for $p$. For the sake of modeling various potential prior states of knowledge about $p$, we use a Beta distribution with positive real parameters $\alpha, \beta$ (with $\alpha=\beta=1$ producing a uniform distribution): $$f(p') = \frac{1}{\mathrm{B}(\alpha,\beta)}(p')^{\alpha-1}(1-p')^{\beta-1}I_{(0,1)}(p')\tag{B3} $$ where $$\mathrm{B}(\alpha,\beta)= \int_0^1(p')^{\alpha-1}(1-p')^{\beta-1}\,\mathrm{d}p'=\dfrac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)} . $$ Note that (B1) describes the fact that, conditional on $S_k\!=\!s,\ p\!=\!p'$, $$S_n\,=\,S_k + \sum_{i=k+1}^n q_i\ \sim\ s + \text{Binomial}(n-k,p').$$ and that (B2) describes the fact that, conditional on $p\!=\!p'$, $$S_k\,\sim\,\text{Binomial}(k,p').$$
Substituting (B) into (A) and performing the integrations gives the posterior distribution in terms of a binomial coefficient and standard Beta functions, for any $s\in 0..k$ and $k\in 0..n$:
$$\begin{align} &P\{S_n\!=\!t\mid S_k\!=\!s\} = \binom{n-k}{t-s}\ \frac{\mathrm{B}(\alpha+t,\ \beta+n-t)}{\mathrm{B}(\alpha+s,\ \beta+k-s)}\ I_{\{s,...,s+n-k\}}(t).\tag{C1}\\ \end{align} $$
Finally,
$$\begin{align} p_k &= P\big(\bar{q}(n)<acc\mid (q_i)_{i=1..k}\big)\\ &= \sum_{t\ <\ acc\cdot n}P\{S_n\!=\!t\mid S_k\!=\!s\}.\tag{C2} \end{align} $$
NB: When $k=0$, we have $P(S_n=t\mid S_0=0)$ and $p_0$, which are the prior (before-sampling) probabilities of $\{S_n=t\}$ and $\{\bar{q}(n)<acc\}$, respectively, deriving entirely from the $\text{Beta}(\alpha,\beta)$ prior assigned to the Bernoulli process parameter $p$. (Extremely "informative" priors may result in $p_0$ effectively equal to $0$ or $1$, hence accepting or rejecting the batch even before any sampling is done.)
Implementation in SageMath
# pr(S_n = t | S_k = s)
def P(t,n,k,s,a,b): return binomial(n-k,t-s) * beta(a+t,b+n-t) / beta(a+s,b+k-s)
# pr(S_n < acc*n | S_k = s)
def Pun(acc,n,k,s,a,b):
tot = 0; accn = acc*n
for t in [s..s+n-k]:
if t < accn: tot += P(t,n,k,s,a,b)
return tot
# example
n = 20; acc = 0.5
a = 1; b = 1
# display the posterior probabilities pr(S_n < acc*n | S_k = s)
for k in [0..n]:
print "k=%3d: " % k,
for s in [0..k]:
print "%.2f" % Pun(acc,n,k,s,a,b),
print
Example
Here's a picture of the decisions that would be made in the case of a non-informative (uniform) prior for $p$, batch size $n=20$, criteria $acc=0.5,\ \epsilon=0.05$ (computations programmed in Sage):
Laplace's "Rule of Succession"
Note that the present Beta-Binomial model exhibits Laplace's "Rule of Succession" as a special case of equation (C1) (changing $n\to n+1,\ k\to n$): $$P(S_{n+1}=s+1 \mid S_n=s) = \frac{s+\alpha}{n+\alpha+\beta}. $$