A probability question that I failed to answer in a job interview

There are $N$ targets of different sizes, numbered $1, 2,\dots, N$. A blindfolded shooter fires in random directions. Because the targets differ in size, they have different hit probabilities, say $p_1, p_2,\dots ,p_N$. Each hit leaves a bullet hole in the target. The shooter does not know whether a shot hit a target or not.

Note that a shot can miss every target, i.e., $p_1+p_2+\cdots+p_N\le1$. A shot hits at most one target.

The shooter keeps shooting until $X$ of the $N$ targets, $X<N$, have bullet holes (each of the $X$ targets has been hit at least once). Then the shooter is told to stop.

The question is: at the end of the game, what is the probability that target $k$ has NOT been hit, for a given $k$ with $1\leq k\leq N$?
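To make the setup concrete, here is a minimal Monte Carlo sketch of the game as described. The function name `simulate_not_hit`, the trial count, the seed, and the example probabilities are all illustrative assumptions, and targets are indexed from 0 in the code.

```python
import random

def simulate_not_hit(p, k, X, trials=20000, seed=0):
    """Estimate the probability that target k has no hole when the game stops.

    p[i] is the chance that a single shot hits target i; the leftover mass
    1 - sum(p) is the chance that the shot misses every target.
    """
    rng = random.Random(seed)
    n = len(p)
    miss = max(0.0, 1.0 - sum(p))
    outcomes = list(range(n)) + [n]        # outcome n stands for a miss
    weights = list(p) + [miss]
    untouched = 0
    for _ in range(trials):
        holes = set()
        while len(holes) < X:              # stop once X distinct targets are hit
            shot = rng.choices(outcomes, weights=weights)[0]
            if shot < n:
                holes.add(shot)
        if k not in holes:
            untouched += 1
    return untouched / trials

# Three targets, 40% of shots miss everything, stop after X = 2 targets are hit.
print(simulate_not_hit([0.3, 0.2, 0.1], k=2, X=2))
```

For this example the estimate should land near $7/12 \approx 0.583$, the exact value for the smallest target.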


Solution 1:

We may as well suppose the game continues until all targets have been hit (which will happen eventually if all $p_j > 0$; we may as well remove any targets that have $p_j = 0$).

For each subset $S$ of $\{1,\ldots, N\}$, let $p_S = \sum_{s \in S} p_s$ be the probability that a shot hits a member of $S$, and let $a_{i,t}(S)$ be the probability that target $i$ is one of the first $t$ targets in set $S$ to be hit. You want $1 - a_{k,X}(\{1,\ldots,N\})$.

Of course $a_{i,t}(S) = 0$ if $i \notin S$, and we also take it to be $0$ if $t = 0$. Otherwise, conditioning on the first target in $S$ to be hit, $$ a_{i,t}(S) = \dfrac{p_i}{p_S} + \sum_{j \in S \backslash \{i\}} \dfrac{p_j}{p_S} a_{i,t-1}(S \backslash \{j\}) $$

Now I claim that $$ a_{i,t}(S) = \sum_{T \subseteq S \backslash \{i\}} c(t,|T|,|S|) \frac{p_i}{p_{T \cup \{i\}}} $$ for some constants $c(t,m,n)$, $0 \le m \le n-1$. I will prove this by induction on $t$. In the case $t=1$ we have $a_{i,1}(S) = p_i/p_S$, so $c(1,m,n) = 1$ if $m = n-1$, $0$ otherwise.
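The recursion for $a_{i,t}(S)$ can be evaluated directly for small $N$. The following is a minimal sketch using exact fractions; the name `prob_among_first` and the 0-based target indices are assumptions of this example, not part of the answer.

```python
from fractions import Fraction
from functools import lru_cache

def prob_among_first(p, i, t):
    """Probability that target i is among the first t distinct targets hit.

    p is a sequence of positive hit probabilities (Fractions keep it exact).
    """
    @lru_cache(maxsize=None)
    def a(t, S):
        # S is a frozenset of the targets still under consideration.
        if t == 0 or i not in S:
            return Fraction(0)
        p_S = sum(p[s] for s in S)
        # Condition on which member of S is hit first.
        total = p[i] / p_S
        for j in S:
            if j != i:
                total += p[j] / p_S * a(t - 1, S - {j})
        return total

    return a(t, frozenset(range(len(p))))

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
# Probability that the third target (index 2) is untouched when X = 2.
print(1 - prob_among_first(p, 2, 2))
```

With $p = (1/2, 1/3, 1/6)$ and $X = 2$ this prints `7/12`. The recursion visits subsets of the targets, so it is only practical for small $N$.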

If $t >1$, we have (with $|S|=n$): $$ \eqalign{a_{i,t}(S) &= \dfrac{p_i}{p_S} + \sum_{j \in S \backslash \{i\}} \dfrac{p_j}{p_S} \sum_{T \subseteq S \backslash \{i,j\}} c(t-1, |T|,n-1) \dfrac{p_i}{p_{T \cup \{i\}}}\cr &= \dfrac{p_i}{p_S} + \sum_{m=0}^{n-2} c(t-1,m,n-1) \sum_{T \subseteq S \backslash \{i\}: |T| = m} \sum_{j \in S \backslash (T \cup \{i\})} \dfrac{p_j p_i}{p_S p_{T \cup \{i\}}}\cr &= \dfrac{p_i}{p_S} + \sum_{m=0}^{n-2} c(t-1,m,n-1) \sum_{T \subseteq S \backslash \{i\}: |T| = m} \dfrac{p_{S \backslash (T \cup \{i\})} p_i}{p_S p_{T \cup \{i\}}}\cr &= \dfrac{p_i}{p_S} + \sum_{m=0}^{n-2} c(t-1,m,n-1) \sum_{T \subseteq S \backslash \{i\}: |T| = m} \dfrac{(p_S - p_{T \cup \{i\}}) p_i}{p_S p_{T \cup \{i\}}}\cr &= \dfrac{p_i}{p_S} + \sum_{m=0}^{n-2} c(t-1,m,n-1) \sum_{T \subseteq S \backslash \{i\}: |T| = m} \left(\dfrac{p_i}{p_{T \cup \{i\}}} - \dfrac{p_i}{p_S}\right)\cr &= \sum_{T \subseteq S \backslash \{i\}} c(t,|T|,n) \dfrac{p_i}{p_{T \cup \{i\}}} }$$ where $c(t, m,n) = c(t-1, m,n-1)$ if $m < n-1$ while $$c(t, n-1, n) = 1 - \sum_{m=0}^{n-2} {n-1 \choose m} c(t-1,m,n-1)$$
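As a sanity check on the induction, the sketch below computes $c(t,m,n)$ from the recurrence and compares the resulting subset sum with the direct recursion for $a_{i,t}(S)$. The function names and the test probabilities are assumptions of this example, and targets are indexed from 0.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations
from math import comb

@lru_cache(maxsize=None)
def c(t, m, n):
    """Coefficient c(t, m, n) from the recurrence, for t >= 1 and 0 <= m <= n - 1."""
    if t == 1:
        return 1 if m == n - 1 else 0
    if m < n - 1:
        return c(t - 1, m, n - 1)
    return 1 - sum(comb(n - 1, k) * c(t - 1, k, n - 1) for k in range(n - 1))

def a_from_coefficients(p, i, t):
    """Subset-sum formula for a_{i,t}(S), with S the set of all targets."""
    n = len(p)
    others = [j for j in range(n) if j != i]
    total = Fraction(0)
    for m in range(n):
        for T in combinations(others, m):
            total += c(t, m, n) * p[i] / (p[i] + sum(p[j] for j in T))
    return total

def a_direct(p, i, t, S=None):
    """a_{i,t}(S) obtained by conditioning on the first target in S to be hit."""
    S = frozenset(range(len(p))) if S is None else S
    if t == 0 or i not in S:
        return Fraction(0)
    p_S = sum(p[s] for s in S)
    rest = sum(p[j] / p_S * a_direct(p, i, t - 1, S - {j}) for j in S if j != i)
    return p[i] / p_S + rest

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16)]
for t in range(1, len(p) + 1):
    # The two columns should agree for every t.
    print(t, a_from_coefficients(p, 0, t), a_direct(p, 0, t))
```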

Hmm: it looks like $$ c(t, m, n) = \cases{1 & for $t=n,m=0$\cr (-1)^{n+m+t} {m-1 \choose {t+m-n}} & for $ n-m \le t \le n$\cr 0 & otherwise\cr}$$ There ought to be an inclusion-exclusion proof for this.
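The conjectured closed form can be compared with the recurrence for small sizes. This is only a numerical check over $1 \le t \le n \le 7$ (the range the game needs, since $X < N$), not a proof, and the function names are assumptions of this sketch.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def c_recurrence(t, m, n):
    """c(t, m, n) computed from the recurrence in the induction step."""
    if t == 1:
        return 1 if m == n - 1 else 0
    if m < n - 1:
        return c_recurrence(t - 1, m, n - 1)
    return 1 - sum(comb(n - 1, k) * c_recurrence(t - 1, k, n - 1)
                   for k in range(n - 1))

def c_conjectured(t, m, n):
    """The closed form guessed above."""
    if t == n and m == 0:
        return 1
    # m >= 1 keeps the upper argument of the binomial non-negative.
    if m >= 1 and n - m <= t <= n:
        return (-1) ** (n + m + t) * comb(m - 1, t + m - n)
    return 0

mismatches = [
    (t, m, n)
    for n in range(1, 8)
    for t in range(1, n + 1)
    for m in range(n)
    if c_recurrence(t, m, n) != c_conjectured(t, m, n)
]
print(mismatches)  # an empty list means the two agree on this range
```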

Solution 2:

At the risk of shooting myself in the foot, here's how I'd approach it:

The key principle is that we can ignore any action that doesn't produce a tangible result. In particular, this includes missing a target, so we can start by normalizing the probabilities: write $\tilde{p}_i =\frac{p_i}{\sum_{1\leq j\leq N}p_j}$ so that we can work with values that satisfy $\sum_i\tilde{p}_i=1$.

But from here, it should be clear that the process is just drawing without replacement: after our trial, we'll have some set of $X$ items, and the probability that those items are $i_1, i_2, \ldots, i_X$ is proportional to $\tilde{p}_{i_1}\cdot\tilde{p}_{i_2}\cdots\tilde{p}_{i_X}$. So the probability that target $k$ is hit is just $\dfrac{\sum_{\{S:\left|S\right|=X \wedge k\in S\}} \mathbb{P}_S}{\sum_{\{S: \left|S\right|=X\}} \mathbb{P}_S}$, where the lower sum is over all $X$-element subsets $S$ of $\{1,\ldots, N\}$ and the upper sum is over all those subsets with $k\in S$, and $\mathbb{P}_S=\prod_{i\in S}\tilde{p}_i$ (and of course, the probability that $k$ isn't hit is just the complement of this value).
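A small exact check, offered as a sketch rather than as part of the answer: it evaluates the ratio above and, for comparison, sums the probabilities of every order in which $X$ distinct targets can first be hit, where each new target is chosen with probability proportional to $p_i$ among the targets not yet hit (the same conditioning Solution 1 uses). The function names are illustrative and targets are indexed from 0.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod

def hit_probability_product_form(p, k, X):
    """Ratio of subset products; rescaling p does not change it."""
    weight = {S: prod(p[i] for i in S)
              for S in combinations(range(len(p)), X)}
    return sum(w for S, w in weight.items() if k in S) / sum(weight.values())

def hit_probability_sequential(p, k, X):
    """Sum over every order in which X distinct targets are first hit."""
    total = Fraction(0)
    for order in permutations(range(len(p)), X):
        if k not in order:
            continue
        remaining = sum(p)          # mass of the targets without holes so far
        prob = Fraction(1)
        for i in order:
            prob *= p[i] / remaining
            remaining -= p[i]
        total += prob
    return total

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(hit_probability_product_form(p, 2, 2))   # 5/11
print(hit_probability_sequential(p, 2, 2))     # 5/12
```

The two computations agree when all $p_i$ are equal, but for $p = (1/2, 1/3, 1/6)$ with $X = 2$ they give $5/11$ and $5/12$ for the third target, so the product form is worth checking against the sequential calculation before relying on it.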

EDIT: as pointed out by Nate Eldredge in a comment below, the formula can't simplify to one independent of the other probabilities; let the below serve as a cautionary tale about making assumptions!

This formula itself should offer further simplification, I'm reasonably sure, but that will take a little bit of chewing. (I strongly suspect that everything else is just a red herring and the final probability is a simple expression in $\tilde{p}_k$ and $\tilde{q}_k=1-\tilde{p}_k$, but I need to work through the details and have my ducks in a row before I'm certain of that.)