Bayes, two tests in a row

I came up with a standard Bayesian example to illustrate my confusion.

There is an epidemic. A person has a probability $\frac{1}{100}$ of having the disease. The authorities decide to test the population, but the test is not completely reliable: overall, the test gives a positive result to $\frac{1}{110}$ of the people tested, but given that you have the disease the probability of a positive result is $\frac{80}{100}$.

I am interested in what happens after a person takes another test, specifically how much more information we would gain.

Probability after one test

Let $D$ denote the event of having the disease, and let $T$ denote the event of a positive test outcome. If we are interested in finding $P(D|T)$, we can just apply Bayes' rule:

$$ P(D|T) = \frac{P(T|D)P(D)}{P(T)} = \frac{0.8 \times 0.01}{1/110} = 0.88 $$

This feels about right.
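As a quick numeric sanity check, here is a small Python sketch of the same calculation (the variable names are mine):

```python
# One-test posterior via Bayes' rule, using the numbers from the problem.
p_d = 1 / 100        # prior P(D): probability of having the disease
p_t = 1 / 110        # marginal P(T): probability of a positive test
p_t_given_d = 0.8    # P(T|D): probability of a positive test given disease

p_d_given_t = p_t_given_d * p_d / p_t
print(p_d_given_t)   # ≈ 0.88
```

This agrees with the $0.88$ above.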

Probability after two tests

This is where I think I misunderstand Bayes' rule somewhat. Let $TT$ denote the outcome of two positive tests. We are now interested in calculating:

$$ P(D|TT) = \frac{P(TT|D)P(D)}{P(TT)} $$

The prior $P(D)$ is still $\frac{1}{100}$. $P(TT|D)$ would now be $0.8 \times 0.8$, because the two tests can be assumed to be independent given the disease status.

But I don't seem to know how to deal with $P(TT)$ ... it cannot be $\frac{1}{110} \times \frac{1}{110}$, because then

$$ \frac{P(TT|D)P(D)}{P(TT)} = \frac{0.64 \times 0.01}{(1/110)^2} \approx 77 > 1 $$

What is the right approach to the two-test Bayesian case?


As an aside, I believe the proper value for $P(D|T)$ is exactly $0.88 = \frac{8}{10}\cdot\frac{1}{100}\cdot\frac{110}{1}$.

We have $P(T)$, the probability of the test showing a positive regardless of disease state, as $\frac{1}{110}$. By the law of total probability, this is the probability of a positive and diseased plus the probability of a positive and disease-free. In other words: $$ \begin{align} P(T) &= P(T\cap D) + P(T\cap \neg D)\\ &= P(T|D)P(D) + P(T|\neg D)P(\neg D)\\ \frac{1}{110} &=\frac{8}{10}\frac{1}{100} + P(T|\neg D)\frac{99}{100}\\ P(T|\neg D) &=\frac{2}{1815} \end{align} $$

Next: $$ \begin{align} P(TT) &= P(TT|D)P(D) + P(TT|\neg D)P(\neg D)\\ &= \frac{64}{100}\frac{1}{100} + \frac{4}{3294225}\frac{99}{100}\\ &=\frac{21087}{3294225} = \frac{213}{33275} \approx 0.006401202 \end{align} $$ Now $$ \begin{align} P(D|TT) &= \frac{P(TT|D)P(D)}{P(TT)}\\ &= \frac{64}{100}\frac{1}{100}\frac{33275}{213}\\ &= \frac{5324}{5325} \approx 0.999812207 \end{align} $$

So, after two tests, we are really sure this person is diseased.
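The exact fractions above can be verified with Python's `fractions` module (a sketch; the names `sens` and `fpr` are mine):

```python
from fractions import Fraction

p_d = Fraction(1, 100)           # prior P(D)
p_t = Fraction(1, 110)           # marginal P(T)
sens = Fraction(8, 10)           # P(T|D)

# False-positive rate via the law of total probability:
# P(T) = P(T|D)P(D) + P(T|~D)P(~D)
fpr = (p_t - sens * p_d) / (1 - p_d)
print(fpr)                       # 2/1815

# Two positive tests, conditionally independent given disease status:
p_tt = sens**2 * p_d + fpr**2 * (1 - p_d)
p_d_given_tt = sens**2 * p_d / p_tt
print(p_d_given_tt)              # 5324/5325
```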

Update

In general, though, with Bayesian estimation one can use the previous posterior as the current prior (see slides 3 and 4). That carries through here as well. Let $P(D^*)$ be the new prior (after one test). We are now back in the one-test world, since one test after one test is the same as two tests after no tests. So $P(D^*) = 0.88$ from above. $P(T|D^*)$ remains the same, as does $P(T|\neg D^*)$. So, all we need is: $$ \begin{align} P(TT) &= P(T|D^*)P(D^*) + P(T|\neg D^*)P(\neg D^*)\\ &= 0.8\cdot 0.88 + \frac{2}{1815}\cdot 0.12\\ &= \frac{426}{605} \approx 0.704132231 \end{align} $$

Note that $P(TT)$ in the $D^*$ world is much greater than $P(TT)$ in the $D$ world. That stands to reason, since $TT$ in the $D^*$ world is really $T$ (one test) after already having seen a positive test, while $TT$ in the $D$ world is a priori two tests knowing nothing. Now, as before: $$ \begin{align} P(D|TT) = P(D^*|T) &= \frac{P(T|D^*)P(D^*)}{P(TT)}\\ &=\frac{8}{10}\frac{88}{100}\frac{605}{426}\\ &=\frac{5324}{5325} \approx 0.999812207 \end{align} $$
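The posterior-as-prior scheme can be sketched as a small update loop (assuming, as above, that the tests are conditionally independent given disease status):

```python
from fractions import Fraction

sens = Fraction(8, 10)       # P(T|D), unchanged from test to test
fpr = Fraction(2, 1815)      # P(T|~D), derived earlier in the thread

def update(prior):
    """One Bayesian update on a single positive test."""
    evidence = sens * prior + fpr * (1 - prior)   # P(T) under this prior
    return sens * prior / evidence

posterior = Fraction(1, 100)                      # start from P(D) = 1/100
posterior = update(posterior)                     # 22/25 = 0.88, as above
posterior = update(posterior)                     # 5324/5325
print(posterior)
```

Applying the one-test update twice reproduces the one-shot two-test answer exactly.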


You compute $P(TT)$ the same way you computed $P(T)$, using the law of total probability: $$P(TT)=P(TT|D)P(D)+P(TT|\neg D)P(\neg D)=0.8^2\times 0.01 + P(T|\neg D)^2\times 0.99$$

Alas, I cannot quite figure out what $P(T|\neg D)$ in your problem statement is.


This is an interesting one. It seems you can't carry the independence across the conditions: the tests are independent given the disease status, but not unconditionally. What this means is that if you tested positive, then the next test is also more likely to be positive (can you explain why?).

Thus, to find $P(TT)$, you need to condition first, like sds did in his answer.

To find $P(T|\neg D)$, we can use $P(T) = 1/110$ and $P(T | D) = 0.8$. Then, $$P(T, \neg D) = P(T) - P(TD) = P(T) - P(T|D)P(D) \approx 0.00909 - 0.008, $$ and dividing by $P(\neg D) = 0.99$ gives $P(T|\neg D) \approx 0.0011$.
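Carrying that computation to the end numerically (a sketch in plain floats):

```python
# P(T|~D) from P(T), P(T|D), and P(D), finishing the step above.
p_t = 1 / 110          # P(T)
p_t_given_d = 0.8      # P(T|D)
p_d = 0.01             # P(D)

p_t_and_not_d = p_t - p_t_given_d * p_d        # P(T, ~D) ≈ 0.00109
p_t_given_not_d = p_t_and_not_d / (1 - p_d)    # P(T|~D) = 2/1815 ≈ 0.0011
print(p_t_given_not_d)
```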

Also, it is debatable whether test errors are truly independent from one test to another on the same person (since they may depend on certain chemicals in the body, they are likely not), but that discussion is beyond the scope of this simple problem.


What is the conditional probability $\Pr[T \mid \bar D]$; that is, the probability of obtaining a single false positive? This is $$\Pr[T \mid \bar D] = \frac{\Pr[T \cap \bar D]}{\Pr[\bar D]} = \frac{\Pr[T] - \Pr[T \cap D]}{\frac{99}{100}} = \frac{\frac{1}{110} - \frac{1}{100}\frac{8}{10}}{\frac{99}{100}} = \frac{2}{1815}.$$ Then the probability of two successive false positives is $$\Pr[T_1 \cap T_2 \mid \bar D] = \Pr[T \mid \bar D]^2.$$ Therefore the unconditional probability of two positive tests is $$\Pr[T_1 \cap T_2] = \Pr[T_1 \cap T_2 \mid D]\Pr[D] + \Pr[T_1 \cap T_2 \mid \bar D]\Pr[\bar D]$$ and the desired probability is $$\Pr[D \mid (T_1 \cap T_2)] = \frac{\Pr[T \mid D]^2 \Pr[D]}{\Pr[T_1 \cap T_2]}.$$


I thought it would be worth adding to this answer the case where we take $n$ tests, all conditionally independent given disease status, and all coming back positive. Note that I assume (reasonably) that $P(D) \neq 0$. Let $T_1, T_2, \ldots, T_n$ denote the $n$ positive test results.

In this case, we first look for an expression for $P(T_1T_2...T_n)$: \begin{align*} P(T_1T_2...T_n) &= P(T_1T_2...T_n|D)P(D) + P(T_1T_2...T_n|\neg D)P(\neg D)\\ &= P(T|D)^nP(D) + P(T|\neg D)^nP(\neg D)\\ &= P(T|D)^nP(D) + (1-P(T|D))^n(1-P(D)) \end{align*} (Note that the last line assumes a symmetric test, i.e. $P(T|\neg D) = 1 - P(T|D)$; that does not hold for the numbers in the question, where $P(T|\neg D) = \frac{2}{1815}$, but it keeps the general analysis simple.)

Then, we can find our expression for $P(D|T_1T_2...T_n)$. \begin{align*} P(D|T_1T_2...T_n) &= \dfrac{P(T_1T_2...T_n|D)P(D)}{P(T_1T_2...T_n)}\\ &= \dfrac{P(T|D)^nP(D)}{P(T|D)^nP(D) + (1-P(T|D))^n(1-P(D))} \end{align*}

That's the general expression. We might be interested in the limit of this expression. $$\lim_{n \to\infty} \dfrac{P(T|D)^nP(D)}{P(T|D)^nP(D) + (1-P(T|D))^n(1-P(D))}= \begin{cases} 1 & \text{if } P(T|D) > 0.5 \\ P(D) & \text{if } P(T|D) = 0.5 \\ 0 & \text{if } P(T|D) < 0.5 \end{cases}$$

The interpretation? If you have a good test (meaning that, if you have the disease, the test reports positive more than half of the time), then several positive tests mean you almost certainly have the disease (the posterior converges to 1). If you have a bad test (wrong more than half of the time, given that you have the disease), then several positive tests suggest you don't have it (converges to 0). If you have a useless test (given that you have the disease, it reports positive half of the time and negative half of the time), the posterior just stays at the prior probability that you have the disease (which is why the test is useless).
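The convergence can be illustrated numerically; the function below is a sketch of the expression derived above, under the same symmetric-test assumption $P(T|\neg D) = 1 - P(T|D)$:

```python
def posterior_after_n(p_t_given_d, p_d, n):
    """P(D | n positive tests), assuming P(T|~D) = 1 - P(T|D)."""
    num = p_t_given_d ** n * p_d
    den = num + (1 - p_t_given_d) ** n * (1 - p_d)
    return num / den

# Good test (P(T|D) > 0.5): the posterior climbs toward 1.
for n in (1, 2, 5, 10):
    print(n, posterior_after_n(0.8, 0.01, n))

# Useless test (P(T|D) = 0.5): the posterior never moves off the prior.
print(posterior_after_n(0.5, 0.01, 20))   # stays at 0.01
```

Note that because of the symmetric assumption, these numbers differ from the $\frac{2}{1815}$ false-positive rate used elsewhere in this thread.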