Describing Bayesian Probability

I'm a CS major doing some work with image recognition in which I use Bayesian probability. I have to give a presentation on my work, and while I have no problem describing the CS portion, I'm less sure about the math being done. I'm having issues trying to explain Bayesian probability as a whole (I won't need to present on the specific math being done, just Bayesian probability in general). The problem seems to be that since I don't fully understand Bayesian probability, it's hard for me to accurately describe it.

I'm in need of a thorough, yet concise, description of Bayesian probability. Any suggestions or descriptions would be greatly appreciated!

EDIT

Here's an example, with minor variations in the details, that comes up in almost everything I've read about Bayesian probability.

10% of people have a disease. The test to detect it has a 92% true-positive rate (it correctly flags 92% of people who have the disease) and a 5% false-positive rate. If you test positive, what is the probability that you have the disease?

The probability that you have it is $\frac{\text{chance of testing positive and having it}}{\text{chance of testing positive and having it } + \text{ chance of testing positive without having it}}$

Which, for this example, would be $\frac{(0.1 \cdot 0.92)}{(0.1 \cdot 0.92) + (0.9 \cdot 0.05)} \approx 0.67$
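The arithmetic above can be checked with a short script (a sketch; the variable names are mine):

```python
# Bayes' rule for the disease-testing example:
#   P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease = 0.10        # prior: 10% of people have the disease
sensitivity = 0.92      # P(test positive | disease)
false_positive = 0.05   # P(test positive | no disease)

# Total probability of testing positive (with or without the disease)
p_positive = sensitivity * p_disease + false_positive * (1 - p_disease)

# Posterior probability of having the disease given a positive test
posterior = sensitivity * p_disease / p_positive
print(round(posterior, 2))  # 0.67
```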

I don't understand why this is considered Bayesian probability and not just traditional probability.


Solution 1:

Your edit seems to conflate two distinct things: Bayes' rule and the Bayesian interpretation of probability. The example you give is an application of Bayes' rule to calculate an a posteriori (posterior) probability. Whether you give the problem to a Bayesian or a frequentist, both should apply Bayes' rule the same way and arrive at the same answer. The difference lies in how the answer is interpreted.

What follows is a somewhat non-rigorous explanation of the difference in interpretation; there is much more that can be said about this topic, but I hope it helps.

According to the frequentist interpretation, the a posteriori probability of 0.67 implies that if you took a large population of people and tested them for the disease, roughly 67% of those who test positive will have the disease. On this view, if you did the test on only one person, that particular person either has the disease or doesn't; saying that there is a 67% chance of having the disease doesn't seem meaningful when applied to a single person.

According to the Bayesian interpretation, you don't need a large population for the probability to be meaningful. If the a posteriori probability is 0.67, it tells you that the test is not very good, even though it is 92% accurate and has only a 5% false-positive rate. In other words, if someone asked you to quantify your level of confidence in the test, you could reply that the a posteriori probability is 0.67, which doesn't inspire much confidence. The closer the a posteriori probability is to 1, the more confident you are in the result of the test. As a doctor, you would probably want to look at alternative tests before prescribing medicine for the patient.
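The frequentist reading above can be illustrated with a quick simulation (a sketch; the population size and random seed are arbitrary choices of mine): among the simulated people who test positive, roughly 67% actually have the disease.

```python
import random

random.seed(0)
n = 100_000  # size of the simulated population

positives = 0
positives_with_disease = 0
for _ in range(n):
    has_disease = random.random() < 0.10          # 10% base rate
    if has_disease:
        tests_positive = random.random() < 0.92   # 92% true-positive rate
    else:
        tests_positive = random.random() < 0.05   # 5% false-positive rate
    if tests_positive:
        positives += 1
        positives_with_disease += has_disease

# Fraction of positive testers who really have the disease: about 0.67
print(positives_with_disease / positives)
```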

Solution 2:

The short description of the Bayesian conception of probability is "reasoning with incomplete information".

This is in contrast to the short description of Frequentist probability as "calculation with long-run frequencies".

Both sides agree on the math, but Bayesians are willing to use it in more situations. In particular, they're willing to use quantities such as $P(M \vert D)$, the probability of a model given the data, even though there are no multiple possible realizations in which different models are true in different cases, which is what a frequentist would require. In many cases this doesn't matter much: "models" can often be demoted to choices of random data from some suitably chosen sample space (which is really just a larger model) by imagining a series of experimental realizations that could in principle be run, even though they aren't.

I personally find the language of Bayesian probability theory far easier to understand and work with, but there are many who prefer the other point of view. One common concern is that, in order to use Bayesian probability theory, one has to have prior probabilities $P(M_a)$ in order to calculate new probabilities $P(M_a \vert D)$. How do you come up with these priors? Well, how do you come up with sample spaces in the frequentist expression of the same problems? Symmetries are a common justification in either case, as is choosing certain mathematically convenient models such as "conjugate priors".
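As a concrete instance of the conjugate-prior idea mentioned above (a sketch; the coin-flip setting and the specific numbers are my own, not from the answer): with a $\mathrm{Beta}(a, b)$ prior on a coin's probability of heads, observing $h$ heads and $t$ tails gives a $\mathrm{Beta}(a + h, b + t)$ posterior, so the update is just addition of counts.

```python
# Beta-Binomial conjugate update: prior Beta(a, b) on a coin's
# probability of heads; after observing h heads and t tails the
# posterior is Beta(a + h, b + t).
a, b = 2.0, 2.0       # prior pseudo-counts (a symmetric, mildly informative choice)
heads, tails = 7, 3   # observed data

a_post, b_post = a + heads, b + tails

# Mean of a Beta(a, b) distribution is a / (a + b)
posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)  # 9/14, about 0.643
```

The prior pulls the estimate slightly toward 0.5 compared with the raw frequency 7/10, which is exactly the "reasoning with incomplete information" flavor described above.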