More examples of Simpson's paradox, other than the ones on Wikipedia, the Titanic, and delayed flights.
Example: Effect of race on death-penalty sentences in Florida murder cases.
NB: This is adapted from Subsection 2.3.2 of A. Agresti (2002), Categorical Data Analysis, 2nd ed., Wiley, pp. 48-51.
In a 1991 study of the effect of race on death-penalty sentences, Radelet and Pierce obtained the following table. It tabulates the death-penalty sentences ($\text{Death}$) and non-death-penalty sentences ($\text{No death}$) in murder convictions in the state of Florida.
$$ \begin{array}{lrrr} \text{Defendant's race} & \text{Death} & \text{No death} & \text{Percent death} \\ \hline \text{Caucasian} & 53 & 430 & 11.0 \\ \text{African-American} & 15 & 176 & 7.9 \end{array} $$
From this table, we see that Caucasian defendants received the death penalty more often than African-American defendants (11.0% vs. 7.9%).
Now, we consider the very same data, except that we stratify according to the race of the victim of the murder. Below is the table.
$$ \begin{array}{llrrr} \text{Victim's race} & \text{Defendant's race} & \text{Death} & \text{No death} & \text{Percent death} \\ \hline \text{Caucasian} & \text{Caucasian} & 53 & 414 & 11.3 \\ \text{Caucasian} & \text{African-American} & 11 & 37 & 22.9 \\ \text{African-American} & \text{Caucasian} & 0 & 16 & 0.0 \\ \text{African-American} & \text{African-American} & 4 & 139 & 2.8 \end{array} $$
Here we see that, when the cases involving Caucasian victims are considered separately from the cases involving African-American victims, African-American defendants are more likely than Caucasian ones to receive the death penalty in both instances (22.9% vs. 11.3% in the first case and 2.8% vs. 0.0% in the second case).
Thus, this is a clear instance of Simpson's paradox.
(A similar previous study in 1981 by Radelet observed the same effect.)
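To check the arithmetic, here is a minimal Python sketch that recomputes both tables from the four stratified cells. The counts are copied from the stratified table above; the names `counts`, `percent_death`, `stratified`, and `pooled` are my own and do not come from Agresti's text.

```python
# Counts copied from the stratified table:
# (victim's race, defendant's race) -> (death sentences, non-death sentences)
counts = {
    ("Caucasian", "Caucasian"): (53, 414),
    ("Caucasian", "African-American"): (11, 37),
    ("African-American", "Caucasian"): (0, 16),
    ("African-American", "African-American"): (4, 139),
}


def percent_death(death, no_death):
    """Share of convictions that ended in a death sentence, in percent."""
    return 100.0 * death / (death + no_death)


# Rates within each victim-race stratum (the second table).
stratified = {key: percent_death(*cell) for key, cell in counts.items()}

# Rates after summing over the victim's race (the first table).
pooled = {}
for defendant in ("Caucasian", "African-American"):
    death = sum(d for (_, race), (d, _) in counts.items() if race == defendant)
    no_death = sum(n for (_, race), (_, n) in counts.items() if race == defendant)
    pooled[defendant] = percent_death(death, no_death)

for (victim, defendant), rate in stratified.items():
    print(f"victim={victim}, defendant={defendant}: {rate:.1f}%")
for defendant, rate in pooled.items():
    print(f"pooled, defendant={defendant}: {rate:.1f}%")
```

The same counts also suggest where the reversal comes from: 467 of the 483 Caucasian defendants had Caucasian victims, and cases with Caucasian victims ended in a death sentence far more often (64 of 515, about 12.4%) than cases with African-American victims (4 of 159, about 2.5%).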
Sailors in the U.S. Navy who went overboard at sea were found to be more likely to be rescued if they were not wearing life jackets than if they were. The explanation was that they wore life jackets in bad weather but not in good weather. In either good weather or bad, they were more likely to be rescued while wearing life jackets, but overall, they were more likely to be rescued while not wearing life jackets. The data are in an introductory text by Danny Kaplan, which I don't have before me.
Here's an artificial example. Imagine two major-league baseball players, Puckett and Smith. Puckett has 600 at-bats during the season and gets 200 hits, for a .333 season average. Smith gets called up to the majors in time for the last game of the season, has three at-bats, and gets three hits, for a season average of 1.000. Thus Smith's batting average for the season is higher than Puckett's. The next year, Smith has 500 at-bats and gets 125 hits, for an average of .250. Puckett plays in the first game and the next morning gets hit by a truck while crossing the street, and can't play for the rest of the season. He gets no hits in the first game, so his average for that season is .000. So once again, Smith's average for the season is higher than Puckett's. Two years in a row, Smith's average was higher than Puckett's. But Puckett's average for the two seasons combined is higher than Smith's.
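The claim is easy to verify with exact fractions. The sketch below is only an illustration: the story does not say how many at-bats Puckett had in his one game of the second year, so `PUCKETT_YEAR2_AT_BATS = 4` is a hypothetical value I picked. The conclusion does not hinge on it, since any number of at-bats a player could have in a single game gives the same ordering.

```python
from fractions import Fraction

# ASSUMPTION: the story gives no at-bat count for Puckett's single game in
# year 2; 4 is a hypothetical value used only for illustration.
PUCKETT_YEAR2_AT_BATS = 4

# (hits, at-bats) per season, taken from the story.
seasons = {
    "Puckett": [(200, 600), (0, PUCKETT_YEAR2_AT_BATS)],
    "Smith": [(3, 3), (125, 500)],
}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def combined(records):
    """Batting average over all seasons: total hits over total at-bats."""
    total_hits = sum(hits for hits, _ in records)
    total_at_bats = sum(at_bats for _, at_bats in records)
    return Fraction(total_hits, total_at_bats)


for player, records in seasons.items():
    per_season = [f"{float(average(h, ab)):.3f}" for h, ab in records]
    print(player, per_season, f"combined {float(combined(records)):.3f}")
```

The combined average weights each season by its at-bats, so Puckett's 600 at-bats at .333 and Smith's 500 at-bats at .250 dominate, while the tiny seasons that decided the year-by-year comparison barely count.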
I edited this Quora example for grammar and readability.
When you compare a population with labeled subpopulations with another population (or with "the same" population at a different time), the two populations will very likely have different proportions of their subpopulations. This is the heart of Simpson's paradox. This is easiest to understand if you think about the change happening with one population over time.
Consider a very simple example. You've got girls and boys. The girls have, on average, longer hair than the boys do, and there is an average for the school somewhere in between. I.e. $\color{limegreen}{\text{average boys' hair length (ABHL)} \leq \text{average student's hair length (ASHL)} \leq \text{average girls' hair length (AGHL)}}.$
Now, a new boy shows up with hair longer than ABHL, but shorter than ASHL. Presto: AGHL is the same. The boys now have, on average, longer hair than before. Though neither subgroup's average has fallen, ASHL has fallen!
This is how it goes with Simpson's paradox: the groups have averages that go in one direction, while the overall average goes in the other. Sometimes it's because members of the population leave or join, sometimes it's due to shifts in counts within the subgroups. But it's always because the counts in subgroups differ between the two populations.
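Here is a small numeric version of the hair example, as a sketch only. All hair lengths are hypothetical numbers I made up (10 boys at 5 cm, 10 girls at 25 cm, one new boy at 10 cm); the labels ABHL, AGHL, and ASHL are the ones defined above.

```python
from statistics import mean

# Hypothetical hair lengths in cm; none of these numbers come from the text.
boys = [5.0] * 10
girls = [25.0] * 10

before = {
    "ABHL": mean(boys),
    "AGHL": mean(girls),
    "ASHL": mean(boys + girls),
}

# A new boy joins whose hair (10 cm) is longer than ABHL (5 cm)
# but shorter than ASHL (15 cm).
boys_after = boys + [10.0]

after = {
    "ABHL": mean(boys_after),
    "AGHL": mean(girls),
    "ASHL": mean(boys_after + girls),
}

for label in ("ABHL", "AGHL", "ASHL"):
    print(f"{label}: {before[label]:.2f} -> {after[label]:.2f}")
```

With these numbers the boys' average rises, the girls' average is unchanged, and the school average drops, because the share of boys in the school went from 10 of 20 to 11 of 21.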
I simplified the text of Jon Wayland's Quora answer and corrected some of its grammar.
Several teachers were itching to know the optimal duration of study for students to score well on tests. So, they gathered the approximate number of hours students were studying, then compared those hours to the students’ test scores.
Mr. Simpson convinced the faculty that more data means better results, and so all of the teachers integrated their cross-course data for the analysis.
The results were astounding. To everyone’s confusion, the less a student studied, the higher they tended to score on tests.
In fact, the correlation coefficient was strongly negative: $-0.7981$.
Should they be encouraging their students to study less? How in the world could data be backing such a claim? Surely something was missing.
The teachers decided to consult the school’s statistician, Ms. Paradox. After Mr. Simpson explained their results to her, she suggested they analyze each course’s data individually.
So, they then analyzed only Phys. Ed. and proceeded to have their minds blown.
A correlation of 0.6353! How in the statistical universe was this even possible?
Ms. Paradox then explained this as Simpson’s Paradox, a statistical phenomenon where a seemingly strong relationship reverses or disappears when a third, confounding variable is introduced.
She convinced Mr. Simpson to plot all the data again, but then color-code each course separately to distinguish them.
After doing so, Mr. Simpson and his colleagues concluded that the relationship was indeed positive within each course, and that the more hours a student studied, the higher the grade tended to be.
Including the course of study in the analysis completely reversed the relationship.
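To see the mechanism in miniature, here is a Python sketch on invented data. It is not the code from the Quora answer, and it will not reproduce the coefficients quoted above. Every (hours, score) pair is made up, and the course names other than Phys. Ed. are hypothetical. The `pearson` helper is a plain textbook implementation of the Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic (hours studied, test score) pairs, invented for illustration.
# Harder courses get more study hours but lower scores overall.
courses = {
    "Phys. Ed.": [(1, 88), (2, 91), (3, 90), (4, 95), (5, 96)],
    "History": [(6, 70), (7, 74), (8, 73), (9, 78), (10, 79)],    # hypothetical course
    "Calculus": [(11, 52), (12, 55), (13, 57), (14, 56), (15, 61)],  # hypothetical course
}

# Correlation within each course separately.
within = {}
for course, rows in courses.items():
    hours, scores = zip(*rows)
    within[course] = pearson(hours, scores)

# Correlation after pooling all courses, ignoring which course a row came from.
all_rows = [row for rows in courses.values() for row in rows]
pooled = pearson(*zip(*all_rows))

for course, r in within.items():
    print(f"{course}: r = {r:+.3f}")
print(f"pooled: r = {pooled:+.3f}")
```

In this toy data the course acts as the confounder: it drives both how much students study and how high they score, so pooling the courses produces a negative correlation even though the correlation is positive inside every course.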
R Code for this example:
[I don't know how to post the R Code as he did, numbered and all. Can someone add it here? Thx!]