Statistics: Why doesn't the accuracy of a medical test equal the probability that you have the disease?

Suppose there is a test for Disease A that is correct 90% of the time. You had this test done, and it came out positive. I understand that the chance that this test is right is 90%, and I thought this would mean the chance that you have the disease should be 90% too. However, according to Bayes' rule, your chance of having the disease depends on the percentage of the population that has it. That sounds absurd: if the test is correct then you have it, and if it's not then you don't, 90% of the time, so there should be a 90% chance that the result is right for you...

But on the other hand, say 100% of the population has it. Then no matter what the test's accuracy is, whether it's 90% or 30%, your chance of having the disease is still 100%... and now all of a sudden it doesn't sound absurd.

Please avoid using weird symbols, as I'm not a statistics expert. They just muddle things for me and bury the insight.


Look at it this way. There is a $10$% chance that any given instance of the test is wrong. It can be wrong in either of two ways: it can be positive when you don’t have the disease (a false positive), or it can be negative when you do have the disease (a false negative).

  • If the disease is very common, most of the people being tested will have the disease, so most of the errors will be made on those people and will therefore be false negatives. In that case few of the errors will be false positives, so if you test positive, you probably have the disease; a negative test, on the other hand, could easily be one of the false negatives. You already pointed out the extreme case of this, in which the entire population has the disease, and every error is a false negative.

  • If the disease is very rare, however, most of the people being tested will not have the disease, and most of the errors will therefore be false positives; if you test negative, you probably don’t have the disease, but a positive test could easily be one of the false positives. The extreme case would be when no one has the disease, and every error is a false positive.

As the incidence of the disease shifts from $100$% to $0$%, the probability that an error is a false positive increases from $0$ to $1$.
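
To see this shift numerically, here is a minimal Python sketch (the 90% accuracy figure and the population size are assumed for illustration; they are not part of the answer above). For a few prevalence levels it tallies what share of the test's errors are false positives and how likely a positive result is to be correct.

```python
# Minimal sketch: a test that is 90% accurate on both the sick and the healthy.
# Watch how the share of errors that are false positives, and the chance that
# a positive result is right, change with how common the disease is.

accuracy = 0.90          # assumed: same accuracy for sick and healthy people
population = 1_000_000   # assumed population size for the tally

for prevalence in [1.0, 0.5, 0.1, 0.01, 0.0]:
    sick = prevalence * population
    healthy = population - sick

    true_positives = accuracy * sick             # sick people correctly flagged
    false_negatives = (1 - accuracy) * sick      # sick people missed
    false_positives = (1 - accuracy) * healthy   # healthy people wrongly flagged

    errors = false_negatives + false_positives
    fp_share = false_positives / errors if errors else float("nan")

    positives = true_positives + false_positives
    p_sick_given_positive = true_positives / positives if positives else float("nan")

    print(f"prevalence {prevalence:4.0%}: "
          f"false positives are {fp_share:5.1%} of all errors, "
          f"P(disease | positive) = {p_sick_given_positive:5.1%}")
```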


Your intuition relies on the test always being accurate. However, this is not the case. There are four possible outcomes that we have to account for:

  1. The test is positive and you have the disease;
  2. The test is positive and you don't have the disease;
  3. The test is negative and you have the disease;
  4. The test is negative and you don't have the disease.

If the test were perfect, only results #1 and #4 would happen. But this is not the case in the real world. As a consequence, we have to condition on the probability that the test makes a mistake. Looking at it another way, we have to evaluate the test result against the as-yet unknown reality of whether we have the disease. A test has a given accuracy independent of whether the disease is present or not, so both kinds of error are possible (the sketch at the end of this answer makes the four cases concrete).

Put another way, suppose you notice a cookie is missing, and you ask a child whether they took it. The child will either tell the truth or not. And regardless of what the child says, there are two possible realities: the cookie really is gone, or you simply miscounted your cookies.
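
To make the four cases above concrete, here is a small sketch under assumed numbers (a 90% accurate test and a 1% prevalence, neither of which appears in this answer). It lists the probability of each case and then reads off the chance of disease given a positive result.

```python
# Small sketch with illustrative numbers: enumerate the four cases explicitly,
# then compute the chance of disease given a positive test.

accuracy = 0.90     # assumed: same for positive and negative results
prevalence = 0.01   # assumed: 1% of people have the disease

cases = {
    "1. positive and diseased":     prevalence * accuracy,
    "2. positive and not diseased": (1 - prevalence) * (1 - accuracy),
    "3. negative and diseased":     prevalence * (1 - accuracy),
    "4. negative and not diseased": (1 - prevalence) * accuracy,
}

for name, prob in cases.items():
    print(f"{name}: {prob:.4f}")

p_positive = cases["1. positive and diseased"] + cases["2. positive and not diseased"]
print("P(disease | positive) =",
      round(cases["1. positive and diseased"] / p_positive, 3))  # about 0.083
```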


Part of the difficulty in understanding this is that we automatically bring other aspects of the imagined scenario into our thinking. The Bayesian calculation takes a deliberately simplified view. Here's an example:

In a population of 1000 people, suppose 10 have a disease (ignore, for now, how we know that exactly 10 people do). Suppose we have a test that is correct 90% of the time when the disease is present, and also correct 90% of the time when the disease is absent. (These two numbers need not be the same, but let's suppose they are.)

Story #1: A new health initiative leads to the entire population being tested. Out of the 10 people with the disease, 9 are correctly identified by the test, while 1 is missed. Out of the 990 people without the disease, 891 (90%) are cleared as healthy, while 99 (10%) are mistakenly labeled as diseased.

Out of the 99 + 9 = 108 people who were tagged as having the disease, only 9 really do (about 8%). So if we take one of the positive test results at random, that person has only an 8% chance of actually having the disease.

Because the number of healthy people WHO WERE TESTED is so high, more false positives than true positives occurred.

Story #2: The test is expensive and rarely done. Only people who have symptoms suggesting the disease bother to have the test done. So out of our population of 1000, only 30 have the test done, including all 10 who really have the disease. Now, 9 of the 10 people with the disease get positive test results, and 2 of the 20 people WHO WERE TESTED but don't have the disease get positive test results. The chance of having the disease, given that you WERE TESTED and tested positive for the disease, is 9 / 11, or 82%.
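
For anyone who wants to check the arithmetic, here is a short sketch that reproduces the results of both stories; the helper function name and the 90% accuracy default are just illustrative choices, not part of the original answer.

```python
# Quick sketch: count true and false positives among those tested, then
# compute the chance that a positive result really means disease.

def chance_given_positive(tested_sick, tested_healthy, accuracy=0.90):
    """Return the probability of disease given a positive test result."""
    true_pos = accuracy * tested_sick            # sick people correctly flagged
    false_pos = (1 - accuracy) * tested_healthy  # healthy people wrongly flagged
    return true_pos / (true_pos + false_pos)

# Story #1: everyone is tested (10 sick, 990 healthy).
print(round(chance_given_positive(10, 990), 3))  # about 0.083, i.e. roughly 8%

# Story #2: only 30 symptomatic people are tested (10 sick, 20 healthy).
print(round(chance_given_positive(10, 20), 3))   # about 0.818, i.e. roughly 82%
```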

The Bayesian analysis describes Story #1 above, but real life is usually more like Story #2. That contributes to the result seeming so counter to intuition.


To address the OP's question more directly: you had the test, and it came out positive. The question is now "was that result one of the true positives, or one of the false positives?" In a situation like Story #1 there are many more false positives than true positives, so you are likely to have gotten a false positive result.