Excel quartile function doesn't work
The short explanation is that Excel is calculating the quartiles as percentiles. This is really quite different from the way we ordinarily think of quartiles (as medians of the upper/lower half of the data). Here's a quick explanation of how Excel does what it does, using your data as an example. I can't be 100% sure this is the exact algorithm Excel uses, but this will give the same results.
-
Excel assigns PERCENTILES to each value in the array.
P(4) = 0; P(6)=0.20; P(8)=0.40; ... ; P(16)=1 -
Excel then checks where the requested percentile falls in the array. For Q1, 0.25 falls between 6 and 8.
-
Excel then linearly interpolates between these values based on the percentile.
0.25 percentile is 0.05 percentile higher than 0.20 percentile.
0.05/(P(8)-P(6)) = 0.05/0.20 = 1/4
So, the 25th percentile is 1/4 of the way between 6 and 8. Thus, 6.5 is the returned value. (I realize you typed 5.5, but I checked your data in Excel, and 6.5 is the returned quartile. Likewise, 13.5 is returned for Q3 instead of 14.5.)
This of course is a strange way of calculating a quartile.
Now for finding a quartile the way you want to -- I have two suggestions.
-
Try the Statistics Package Add-in. I don't have it installed here on this computer, but it's worth a shot to see if it returns quartile values different than those returned by the worksheet function.
-
You can use a hacked-together stand-in formula. It's messy, but I think it captures what you're looking for.
For Q1, you can use:
=IF(ISEVEN(ROUNDDOWN(COUNT(A1:A8)/2,0)),AVERAGE(SMALL(A1:A8,ROUNDDOWN(COUNT(A1:A8)/2,0)/2),SMALL(A1:A8,ROUNDDOWN(COUNT(A1:A8)/2,0)/2+1)),SMALL(A1:A8,ROUNDUP(ROUNDDOWN(COUNT(A1:A8)/2,0)/2,0)))
For Q3, you can use:
=IF(ISEVEN(ROUNDDOWN(COUNT(A1:A8)/2,0)),AVERAGE(LARGE(A1:A8,ROUNDDOWN(COUNT(A1:A8)/2,0)/2),LARGE(A1:A8,ROUNDDOWN(COUNT(A1:A8)/2,0)/2+1)),LARGE(A1:A8,ROUNDUP(ROUNDDOWN(COUNT(A1:A8)/2,0)/2,0)))
Excel's built-in quartile function uses interpolation to calculate quartiles. Well, how does it find 5.5 and 14.5 in your example? Given your sample size (n) is 6, it calculates the first quantile as follows:
= (n + 1) / 4 = 7 / 4 = 1.75
As 1.75 falls between the values of 1 and 2, Excel interpolates the data to produce the result 5.5.
It calculates the third quantile as follows:
= 3 * (n + 1) / 4 = 21 / 4 = 5.25
As 5.25 falls between the values of 5 and 6, Excel interpolates the data to produce the result 14.5.
A simple macro can be written to achieve the results you want. Using the ROUND()
function for the above 1.75 and 5.25 values will generate Q1 and Q3 as 2nd and 5th elements of your data set, namely 6 and 14.
As to why Excel behaves so, there is no universal agreement on choosing the quartile values. Excel uses Method 2 whereas you use Method 1 in your example.
Excel 2010 introduced QUARTILE.INC and QUARTILE.EXC.
QUARTILE.INC is the same as Excel's old QUARTILE function and interpolates on an N-1 basis, while QUARTILE.EXC matches the function used in Minitab and some other statistics packages and interpolates on an N+1 basis.
Note that neither of these gives the values you expected. Interpolating on an N basis would do so, but you're probably thinking of the original Tukey method, which is one of several "Hinge" methods to determine quartiles.
If you want to read more, I wrote an extensive tutorial on calculating quartiles, Quartiles for Box Plots, with an emphasis on Excel usage. The Wikipedia article cited elsewhere in this thread is rather simplistic.