Why did John Tukey set 1.5 IQR to detect outliers instead of 1 or 2?

By definition, 50% of all measurements lie between the quartiles, which for a symmetric distribution means within $\pm0.5IQR$ of the median. Compare this - heuristically - with a normal distribution, where 68% are within $\pm\sigma$, so in that case $0.5IQR$ would be somewhat less than $\sigma$. Cutting at $\pm 1.5IQR$ is therefore somewhat comparable to cutting slightly below $\pm3\sigma$, which would declare about 1% of measurements outliers. This matches quite well with the habit of using "$3\sigma$" as a bound in many simple statistical tests. On the other hand, cutting at $\pm1IQR$ would be like cutting near $\pm 2\sigma$, making about 5% outliers - too many; and cutting at $\pm2IQR$ would be like cutting at roughly $\pm4\sigma$, thus turning even many quite extreme measurements into non-outliers. So $\pm 1.5IQR$ is also what Goldilocks would choose.


For a normal distribution, the 3rd quartile (Q3) sits 0.675 SD (standard deviations, sigma) above the mean. The IQR (Q3 - Q1) therefore spans 2 x 0.675 SD = 1.35 SD. The upper outlier fence is found by adding 1.5 x IQR to Q3, i.e., 0.675 SD + 1.5 x 1.35 SD = 2.7 SD, and the lower fence lies symmetrically at -2.7 SD. Cutting at this level would declare about 0.7% of the measurements to be outliers.
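
The same arithmetic can be checked numerically for the three candidate multipliers. The following is only a sketch, assuming an exactly normal population and using Python's standard-library `statistics.NormalDist`; the function name `tukey_fence_normal` is made up for illustration.

```python
from statistics import NormalDist

def tukey_fence_normal(k):
    """Return (upper fence in SD units, two-sided flagged fraction)
    for a standard normal population and fence multiplier k."""
    z = NormalDist()                    # standard normal: mean 0, SD 1
    q3 = z.inv_cdf(0.75)                # third quartile, about 0.675 SD
    iqr = 2 * q3                        # Q3 - Q1, using symmetry (Q1 = -Q3)
    fence = q3 + k * iqr                # upper fence; the lower fence is -fence
    fraction = 2 * (1 - z.cdf(fence))   # probability mass beyond both fences
    return fence, fraction

for k in (1.0, 1.5, 2.0):
    fence, fraction = tukey_fence_normal(k)
    print(f"k = {k}: fence at {fence:.2f} SD, {fraction:.2%} flagged")
```

Under these assumptions, k = 1 flags roughly 4% of a normal sample, k = 1.5 roughly 0.7%, and k = 2 well under 0.1%.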


We certainly CAN use whatever outlier bound we wish to use, but we will have to justify it eventually. In the not-so-recent past, it was typical to expect distributions to be Gaussian. With that assumption, ±1IQR is too exclusive, resulting in too MANY outliers, while ±2IQR is too inclusive, resulting in too FEW outliers. ±1.5IQR is easy to remember, and is a reasonable compromise under assumptions of Gaussianity.

However, for your distribution and expected outlier fraction, those assumptions may not be appropriate. Additionally, perhaps the definition of outlier is incorrect for your problem, and requires greater detail than just how it behaves within the bounds of a single metric?
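
To see how much the Gaussian assumption matters, here is a rough sketch that applies the same 1.5 IQR rule to a skewed population. The unit-rate exponential distribution is chosen purely as an illustrative assumption (it is not part of the original argument), and `tukey_fraction_exponential` is a hypothetical helper name.

```python
import math

def tukey_fraction_exponential(k=1.5):
    """Fraction of a unit-rate exponential population lying outside
    Tukey's fences with multiplier k (illustrative assumption only)."""
    q1 = math.log(4 / 3)    # solves 1 - exp(-x) = 0.25
    q3 = math.log(4)        # solves 1 - exp(-x) = 0.75
    iqr = q3 - q1           # equals ln 3
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # The distribution has no mass below zero, so a negative lower fence flags nothing.
    below = 1 - math.exp(-lower) if lower > 0 else 0.0
    above = math.exp(-upper)    # survival function at the upper fence
    return below + above

print(f"{tukey_fraction_exponential():.2%} flagged")
```

For this skewed case the rule flags close to 5% of the population, all of it in the upper tail, compared with about 0.7% under normality.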


As I recall, Prof. Michael Starbird, in one of his lectures in the recorded series, Joy of Thinking: The Beauty and Power of Classical Mathematical Ideas, answers this question. Dr. Starbird reports having attended the very conference presentation in which Tukey introduced this test, and during which Tukey himself was asked this very question. Tukey's answer: two seems like too much and one seems like not enough.