AP Statistics Lectures
by Arnold Kling

Bayes' Theorem

We have just learned that conditional probability can be used to improve our prediction of events going forward. That is, knowing whether or not a runner is in scoring position helps us to predict more accurately whether or not Bernie Williams gets a hit.

Bayes' Theorem, published posthumously in the eighteenth century by Reverend Thomas Bayes, says that you can use conditional probability to make predictions in reverse! That is, if you know that Bernie Williams got a hit, you can "predict" the probability that he came up with a runner in scoring position.

Bayes' Theorem, sometimes called the Inverse Probability Law, is an example of what we call statistical inference. It is very powerful. In many situations, people make bad intuitive guesses about probabilities, when they could do much better if they understood Bayes' Theorem.

Recall that the definition of conditional probability is:
[1] P(B|A) = P(A and B)/P(A)
Bayes' Theorem is used to solve for the inverse conditional probability, P(A|B). By definition,
[2] P(A|B) = P(A and B)/P(B)
Solving [1] for P(A and B) gives P(A and B) = P(B|A)P(A). Substituting this into [2] gives Bayes' Theorem:
[3] P(A|B) = [P(B|A)][P(A)]/P(B)
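The formula translates directly into code. Here is a minimal Python sketch; the function name bayes_posterior is our own, chosen for illustration:

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Sanity check: if B always occurs when A occurs, and A and B are
# equally likely, then learning that B occurred makes A certain.
print(bayes_posterior(1.0, 0.5, 0.5))  # 1.0
```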

We can use Bayes' Theorem to find the conditional probability P(A|B) given the inverse conditional probability P(B|A) and the unconditional probabilities of events A and B.

For example, we said that Bernie Williams is a .400 hitter with a runner in scoring position. Letting A be the event that he comes up with a runner in scoring position and B the event that he gets a hit, this means P(B|A) = 0.4. We also said that the unconditional probability of his coming up with a runner in scoring position is P(A) = 0.2, and that the unconditional probability of his getting a hit is P(B) = 0.3.

Therefore, if you are given the information that Bernie Williams got a hit, you can infer something about the probability that there was a runner in scoring position. Using Bayes' Theorem,
P(A|B) = [P(B|A)][P(A)]/P(B) = [0.4][0.2]/[0.3] ≈ 0.267

What this says is that when we are given the information that Bernie Williams got a hit, we should estimate the probability that he came up with a runner in scoring position as 0.267, which is higher than the unconditional probability of 0.2 that he will come up with a runner in scoring position.
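The arithmetic above can be checked in a few lines of Python (variable names are ours):

```python
# Bernie Williams example: infer P(runner in scoring position | hit)
p_b_given_a = 0.4   # P(hit | runner in scoring position)
p_a = 0.2           # P(runner in scoring position)
p_b = 0.3           # P(hit)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.267
```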

Although the derivation of Bayes' Theorem is straightforward, not everyone is comfortable with it. The difficult aspect to accept is that instead of using probability to predict the future, you are using it to make inferences about the past. People who think in terms of causality have trouble with this.

Everyone understands what it means to say, "Bernie Williams is batting with a runner in scoring position. He has a .400 chance of getting a hit." Can you interpret the statement, "Bernie Williams got a hit. Therefore, there is a .267 chance that there was a runner in scoring position"?

Here is a classic illustration of Bayes' Theorem. Suppose that you are given two drawers. You cannot see the contents of the drawers, but you are told that one drawer contains two gold coins and the other drawer contains one gold coin and one silver coin. If someone pulls a coin at random out of drawer A and it turns out to be gold, what is the probability that drawer A is the drawer with two gold coins?

Many people would say, "The chances are fifty-fifty that drawer A is the drawer with two gold coins." However, that is not the correct answer. Although there are many ways to get the correct answer, we will use Bayes' Theorem.

event   description                                                                       probability
A       Drawer A has two gold coins                                                       0.5
B       Person chooses a gold coin out of the four coins                                  0.75
B|A     Conditional probability of choosing a gold coin from A if it has two gold coins   1.0

Using Bayes' Theorem, we have
P(A|B) = [P(B|A)][P(A)]/P(B) = [1.0][0.5]/[0.75] = 2/3
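The same one-line computation applies to the drawer problem; a sketch in Python (variable names are ours):

```python
# Drawer problem: P(two-gold drawer | a gold coin was drawn)
p_b_given_a = 1.0   # P(gold | drawer has two gold coins)
p_a = 0.5           # P(the chosen drawer has two gold coins)
p_b = 0.75          # P(gold coin drawn): 3 of the 4 coins are gold

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.6667
```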

What other lines of reasoning can lead you to the correct answer, namely that when someone picks a gold coin out of a drawer chosen at random, the chances are two out of three that the drawer contains two gold coins?
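One such line of reasoning is brute-force simulation: draw coins at random many times and look only at the draws that came up gold. A sketch, where the function simulate_drawers and its parameters are our own:

```python
import random

def simulate_drawers(trials=100_000, seed=1):
    """Estimate P(two-gold drawer | gold coin drawn) by simulation."""
    drawers = [["gold", "gold"], ["gold", "silver"]]
    gold_draws = 0        # draws that produced a gold coin
    from_two_gold = 0     # ...where the drawer held two gold coins
    rng = random.Random(seed)
    for _ in range(trials):
        drawer = rng.choice(drawers)   # pick a drawer at random
        coin = rng.choice(drawer)      # pick a coin at random
        if coin == "gold":
            gold_draws += 1
            if drawer == ["gold", "gold"]:
                from_two_gold += 1
    return from_two_gold / gold_draws

print(simulate_drawers())  # close to 2/3
```

Conditioning on B corresponds to keeping only the trials in which a gold coin was actually drawn.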

Very few doctors understand that a symptom carries little diagnostic weight if as many as 10 percent of healthy patients have that symptom and the disease is relatively rare. Too bad they do not know Bayes' Theorem.

Here is another illustration of Bayes' Theorem. Suppose that you are diagnosed with microscopic hematuria (blood in the urine that is visible only under a microscope). This symptom occurs in 10 percent of all people and in 100 percent of people with kidney cancer. You would like to know the probability that you have kidney cancer, which occurs in 0.0002 percent of all people. Remember that if we express a probability in percent, we must multiply by 0.01 to get the probability as a fraction of one; 0.0002 percent thus corresponds to a probability of 0.000002.

event   description                                                  probability
A       Someone has kidney cancer                                    0.000002
B       Someone has microscopic hematuria                            0.10
B|A     Conditional probability of having hematuria given kidney cancer   1.0

Using Bayes' Theorem, we have
P(A|B) = [P(B|A)][P(A)]/P(B) = [1.0][0.000002]/[0.1] = 0.00002
That is, you still have a very low probability of kidney cancer. The reason is that the symptom of microscopic hematuria is relatively common in the healthy population. If it were true that only one hundredth of one percent of all people had microscopic hematuria, then microscopic hematuria would be a much more powerful indicator of kidney cancer. What would be the probability of kidney cancer if this were the case?
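The same arithmetic in Python (variable names are ours); changing p_b lets you experiment with the closing question:

```python
# Kidney cancer example: P(cancer | microscopic hematuria)
p_b_given_a = 1.0    # P(hematuria | kidney cancer)
p_a = 0.000002       # P(kidney cancer), i.e., 0.0002 percent
p_b = 0.10           # P(hematuria) in the whole population

p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)   # about 0.00002
# Try p_b = 0.0001 to answer the closing question above.
```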