Probability that a person is infected if test is positive?
I have the following problem:
$0.5$% of a population is infected with a dangerous virus. A diagnostic test for the virus is positive for $99$% of infected people and for $2$% of uninfected people.
Please estimate the probability that a person whose test was positive is infected with the virus.
And this is my solution.
Since $0.5$% of the population is infected, $99.5$% is not infected. Since $99$% of the tests on infected people are positive, $1$% are negative. Similarly, $98$% of the tests on uninfected people are negative.
What we need to find is the probability that a person whose test was positive is infected with the virus, in other words, the probability that a person is infected given that the test was positive. By Bayes' theorem, this can be expressed as $$p( P_I \mid T_P) = \frac{p(P_I)\cdot p(T_P \mid P_I)}{p(T_P)}$$
where $P_I$ denotes the event that the person is infected and $T_P$ the event that the test is positive.
Now, we know $p(P_I) = \frac{0.5}{100} = 0.005$, and we also know $p(T_P \mid P_I) = \frac{99}{100} = 0.99$. We only need to find $p(T_P)$, the probability that the test is positive. For this, we can use the law of total probability:
$$p(T_P) = p(P_I)\cdot p(T_P \mid P_I) + p(\overline{P_I}) \cdot p(T_P \mid \overline{P_I}) = 0.005 \cdot 0.99 + 0.995\cdot 0.02 = 0.02485$$
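The law-of-total-probability step can be sketched in a few lines. This is an illustrative sketch, not part of the original working: the helper `total_probability` is a hypothetical name, and exact `Fraction` arithmetic is used so the result comes out as a fraction rather than a rounded decimal.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Hypothetical helper: sum of p(H_i) * p(E | H_i) over a partition {H_i}."""
    # The hypotheses must be exhaustive and mutually exclusive.
    assert sum(priors) == 1, "priors must sum to 1"
    return sum(prior * like for prior, like in zip(priors, likelihoods))

p_infected = Fraction(5, 1000)  # p(P_I) = 0.5%
# Partition: {infected, not infected}; evidence: a positive test.
p_positive = total_probability(
    priors=[p_infected, 1 - p_infected],
    likelihoods=[Fraction(99, 100), Fraction(2, 100)],  # p(T_P | P_I), p(T_P | not P_I)
)
print(p_positive, float(p_positive))  # 497/20000 0.02485
```

The exact value $497/20000$ matches the $0.02485$ above.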
We can now plug the numbers into Bayes' theorem:
$$p( P_I \mid T_P) = \frac{0.005 \cdot 0.99}{0.02485} = \frac{99}{497} \approx 0.1992$$
That is, the probability that a person is infected given a positive test is roughly $20$%.
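If the 20% figure feels counterintuitive, a Monte Carlo simulation is one way to sanity-check it empirically. This is a swapped-in check, not the method used above: the function name `simulate`, the population size, and the seed are arbitrary choices for illustration.

```python
import random

def simulate(n_people, prevalence=0.005, sensitivity=0.99,
             false_positive_rate=0.02, seed=0):
    """Test n_people random individuals; return the share of positives who are infected."""
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    true_pos = false_pos = 0
    for _ in range(n_people):
        if rng.random() < prevalence:                  # person is infected
            true_pos += rng.random() < sensitivity     # bool adds as 0 or 1
        else:                                          # person is not infected
            false_pos += rng.random() < false_positive_rate
    return true_pos / (true_pos + false_pos)

est = simulate(500_000)
print(f"{est:.3f}")  # close to 0.199; the exact digits depend on the seed
```

With half a million simulated people there are roughly 12,000 positives, so the estimate typically lands within a percentage point or so of $99/497$.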
Is my solution correct?
Honestly, this $20$% does not convince me...
Solution 1:
Yes, about 20% is the correct answer.
One way to check this is to work out the expected fractions of the total population that are:
- infected and test positive: 0.5% × 99% = 0.495%,
- infected and test negative: 0.5% × 1% = 0.005%,
- not infected and test positive: 99.5% × 2% = 1.99%, and
- not infected and test negative: 99.5% × 98% = 97.51%.
Thus, the fraction of the total population that test positive is 0.495% + 1.99% = 2.485%. Yet clearly, out of that approx. 2.5%, only about one fifth (i.e. approx. 0.5% of the total population) are actually infected.
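The four-way breakdown can be reproduced as a small joint-probability table. This is a sketch assuming exact `Fraction` arithmetic; the dictionary layout and labels are illustrative choices, not anything from the original answer.

```python
from fractions import Fraction

prevalence = Fraction(5, 1000)          # 0.5% infected
sensitivity = Fraction(99, 100)         # P(positive | infected)
false_positive_rate = Fraction(2, 100)  # P(positive | not infected)

# Joint probability of each (infection status, test result) cell.
joint = {
    ("infected", "positive"): prevalence * sensitivity,
    ("infected", "negative"): prevalence * (1 - sensitivity),
    ("uninfected", "positive"): (1 - prevalence) * false_positive_rate,
    ("uninfected", "negative"): (1 - prevalence) * (1 - false_positive_rate),
}

for (status, result), p in joint.items():
    print(f"{status:10} {result:8} {float(p) * 100:.3f}%")

# The four cells partition the population, so they must sum to 1.
assert sum(joint.values()) == 1

positive = joint[("infected", "positive")] + joint[("uninfected", "positive")]
print(f"tested positive: {float(positive) * 100:.3f}%")
print(f"infected among positives: {float(joint[('infected', 'positive')] / positive):.4f}")
```

The two uninfected cells print as 1.990% and 97.510%, the positives total 2.485%, and the infected share of positives prints as 0.1992.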
One trick that can sometimes help to make sense of problems like this is to convert the fractions to actual numbers of individuals. So let's assume that our total population consists of 20,000 people (which is just large enough to make all the fractions work out to a whole number of people). Then:
- 100 out of these 20,000 people (0.5%) are infected.
- 99 of these 100 infected people (99%) test positive.
- 1 of these 100 infected people (1%) tests negative.
- 19,900 out of these 20,000 people (99.5%) are not infected.
- 398 of these 19,900 uninfected people (2%) test positive.
- 19,502 of these 19,900 uninfected people (98%) test negative.
Thus, the total number of people who test positive is 99 + 398 = 497. Out of these 497 people, 99 are actually infected, while 398 are false positives, so only 99/497 ≈ 19.9% of the positive results are true positives.
Yet another way to quickly figure out the approximate result is to note that almost all people (99.5% ≈ 100%) are uninfected, and almost all of the infected test positive (99% ≈ 100%).
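The head-count trick can also be checked mechanically, including the claim that 20,000 is just large enough for every count to be a whole number. This sketch assumes a brute-force search over population sizes; `counts` is a hypothetical helper name.

```python
from fractions import Fraction

prevalence = Fraction(5, 1000)          # 0.5% infected
sensitivity = Fraction(99, 100)         # 99% of infected test positive
false_positive_rate = Fraction(2, 100)  # 2% of uninfected test positive

def counts(n):
    """Hypothetical helper: expected number of people in each cell for population size n."""
    infected = n * prevalence
    uninfected = n - infected
    return {
        "infected, positive": infected * sensitivity,
        "infected, negative": infected * (1 - sensitivity),
        "uninfected, positive": uninfected * false_positive_rate,
        "uninfected, negative": uninfected * (1 - false_positive_rate),
    }

# Smallest population size for which every cell is a whole number of people.
smallest = next(k for k in range(1, 100_001)
                if all(c.denominator == 1 for c in counts(k).values()))
print(smallest)  # 20000

cells = counts(smallest)
for label, c in cells.items():
    print(f"{label}: {c}")
positives = cells["infected, positive"] + cells["uninfected, positive"]
print(positives, cells["infected, positive"] / positives)  # 497 99/497
```

Brute force is fine here because the answer is small; the same result follows from requiring $n/200$ to be a multiple of 100.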
Thus, the fraction of false positives in the full population is approximately equal to the given fraction of false positives among the uninfected (2% × 99.5% ≈ 2%), while the fraction of true positives is approximately equal to the fraction of infected (99% × 0.5% ≈ 0.5%). Thus, the rate of false positives in the population (≈ 2%) is about four times the rate of true positives (≈ 0.5%), and so only about one fifth of all positive test results are true.
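The back-of-the-envelope estimate can be compared against the exact value. This is a minimal sketch in plain floats; the variable names are illustrative.

```python
prevalence, sensitivity, false_positive_rate = 0.005, 0.99, 0.02

# Rough version: treat "almost all" as "all".
true_pos_approx = prevalence            # 99% of the infected ~ all of the infected
false_pos_approx = false_positive_rate  # 2% of 99.5% ~ 2% of everyone
approx = true_pos_approx / (true_pos_approx + false_pos_approx)

# Exact version for comparison.
true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * false_positive_rate
exact = true_pos / (true_pos + false_pos)

print(f"false/true positive ratio ~ {false_pos_approx / true_pos_approx:.1f}")  # 4.0
print(f"approx {approx:.4f}, exact {exact:.4f}")  # approx 0.2000, exact 0.1992
```

The shortcut overestimates only slightly, because the two rounding steps (dropping the 1% missed infections and the 0.5% infected from the false-positive pool) nearly cancel.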