Why does standard usage of "percentile" vary from other -iles (quartiles, deciles, etc.)?
Solution 1:
The September 2018 OED definition of "percentile" continues past the point where you ended your quotation:
Statistics. A. n. Each of the 99 intermediate values of a variate which divide a frequency distribution into 100 groups each containing one per cent of the total population (so that, e.g., 40 per cent have values below the 40th percentile); each of the 100 groups so formed.
The usage of the word seems to be highly influenced by confusion between these senses.
The OED definitions of the other -ile words mention the same dual usages:
- decile:
  - Statistics. Each of the ten groups produced by dividing a frequency distribution into groups containing one tenth of the total population. Also occasionally: each of the nine values of a variate which divide a frequency distribution in this way.
- quartile:
  - Statistics. The first or third of the three values of a variate which divide a frequency distribution into four groups each containing one quarter of the total population (the second value of the three, the median, is sometimes also included). Also: each of the four groups so produced. Cf. interquartile adj., percentile n.
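The "cut point" sense in these definitions (99 values for percentiles, 9 for deciles, 3 for quartiles) is the one Python's standard-library `statistics.quantiles` follows: asking it for `n` equal groups returns `n - 1` dividing values. The 1,000 scores below are hypothetical data chosen only for illustration.

```python
import statistics

# Hypothetical data: 1,000 distinct scores.
scores = list(range(1, 1001))

# statistics.quantiles(data, n=k) returns the k - 1 cut points that divide
# the data into k equal-probability groups.
for name, groups in [("quartiles", 4), ("deciles", 10), ("percentiles", 100)]:
    cuts = statistics.quantiles(scores, n=groups)
    print(name, len(cuts))
# quartiles 3
# deciles 9
# percentiles 99
```

So in this sense there are 99 percentiles, just as there are 9 deciles and 3 quartiles; the "group" sense gives 100, 10, and 4 respectively.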
If I were writing a single text about percentiles and other -iles such as quartiles or quintiles, I would certainly prefer to use one consistent definition for all of them rather than one type of definition for "percentile" and another for the rest.
Unfortunately, there does seem to be an inconsistency of the type you mention: some sources oddly use "99th percentile" to refer to the highest of 100 groups, but "10th decile" to refer to the highest of 10 groups and "4th quartile" to refer to the highest of four groups. The following web page, for example, lays this out explicitly:
Percentile "ranks"
-scores of students are arranged in rank order from lowest to highest
-the scores are divided into 100 equally sized groups or bands
-the lowest score is "in the 1st percentile" (there is no 0 percentile rank)
-the highest score is "in the 99th percentile" (you can't score in the 100th percentile because you can't beat your own score)
If you "scored in the 66th percentile", you scored "as well as or better than" 66% of the group. [...]
Deciles ("Deca" means "ten")
-scores of students are arranged in rank order from lowest to highest
-the scores are divided into 10 equally sized groups or bands (instead of 100 as with percentiles) (Or divide the percentiles into bands of 10 percentile ranks)
-the lowest score is "in the 1st decile"
-the highest score is "in the 10th decile"
http://www.behavioradvisor.com/701Percentiles.html
(Tom McIntyre at www.BehaviorAdvisor.com)
However common this is (and it does appear to be common, while references to a "zeroth" or "100th" percentile are rare), I have difficulty considering it "standard" usage of a statistical term, because there's an obvious off-by-one problem in describing the first of 100 intervals as the "1st percentile" and the hundredth of 100 intervals as the "99th percentile". Under any straightforward, consistent definition of "percentile" as one of 100 consecutive intervals each containing one percent of the total population, if the highest interval is labeled the 99th percentile, then the interval containing the lowest one percent of the population must either not be a percentile at all or be the zeroth percentile. Calling the lowest 1% the first percentile leaves no way to distinguish it from the group made up of the second-lowest one percent of the population (the people who do better than 1% of the group).
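To make the off-by-one concrete, here is a minimal Python sketch that splits a sorted list of scores into 100 equal bands. The function name `band_labels` and the 1,000-score example are hypothetical illustrations, not taken from any of the sources above. Numbered from zero, the bands run from 0 to 99; numbered from one, they run from 1 to 100. Neither numbering produces exactly the labels "1st" through "99th".

```python
def band_labels(n_scores: int, groups: int = 100) -> list[int]:
    # Hypothetical helper: assign each position in a sorted list of n_scores
    # to one of `groups` equal-sized bands, numbered from 0 (lowest) upward.
    # Assumes n_scores is a multiple of `groups` so every band is the same size.
    return [pos * groups // n_scores for pos in range(n_scores)]

labels = band_labels(1000)
zero_based = sorted(set(labels))
one_based = [b + 1 for b in zero_based]

print(len(zero_based), zero_based[0], zero_based[-1])  # 100 0 99
print(one_based[0], one_based[-1])                     # 1 100
```

There are always 100 bands, so any labeling scheme that stops at "99th" while also starting at "1st" has to give two bands the same label or leave one out.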
I found a stats.stackexchange.com post about percentiles: https://stats.stackexchange.com/questions/430391/are-there-99-percentiles-or-100-percentiles-and-are-they-groups-of-numbers-or
A link to another useful article: "Three Meanings of “Percentile”", Dave Peterson, The Math Doctors
I think what may have happened is this: because percentile rank is so commonly used in discussing test results, the phrasing "in the nth percentile", whose "in" metaphor appears to be based on the "100 groups" definition of the word "percentile", came to be used to mean "having a percentile rank of n". Percentile rank, a way of specifying "what percentage of scores are less than the score of interest", is not based on the concept of dividing the population into 100 equal groups. There seem to be different formulas in use for percentile rank, but if it is calculated according to the formula in the linked Wikipedia article, a percentile rank of zero or 100 is impossible, while a percentile rank lower than one or higher than 99 is possible.
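As a minimal sketch, assume the formula is PR = 100 × (CF − 0.5 × F) / N, where CF is the number of scores less than or equal to the score of interest, F is the number of scores equal to it, and N is the total number of scores. I believe this is the formula the Wikipedia article gives, but treat that as an assumption. The function name `percentile_rank` and the data are hypothetical.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(scores: list[float], x: float) -> float:
    # Hypothetical helper using the assumed formula 100 * (CF - 0.5 * F) / N.
    # x is assumed to be one of the scores; otherwise F is 0 and the result
    # can reach 0 or 100.
    ordered = sorted(scores)
    below = bisect_left(ordered, x)         # scores strictly less than x
    at_or_below = bisect_right(ordered, x)  # CF: scores less than or equal to x
    f = at_or_below - below                 # F: scores exactly equal to x
    return 100 * (at_or_below - 0.5 * f) / len(ordered)

scores = list(range(1, 1001))  # hypothetical data: 1,000 distinct scores
print(percentile_rank(scores, 1))     # 0.05  (below 1, but not 0)
print(percentile_rank(scores, 1000))  # 99.95 (above 99, but not 100)
```

Because half of the tied scores are always counted, the lowest score gets a small positive rank and the highest falls just short of 100, matching the behavior described above.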
However, some sources I've seen treat percentile ranks as integer values from 1 to 99 (e.g., https://www.centralriversaea.org/wp-content/uploads/2017/03/C3_percentile_rank_score-Revised-5.22.17.pdf). I am not sure how percentile rank is defined in that system.