Derivation of Mode of grouped data

Solution 1:

The following is not a rigorous derivation (a derivation would require a lot of assumptions about what makes one estimator better than another), but is an attempt to "make sense" of the formula so that you can more easily remember and use it.

Consider a bar graph with a bar for each of the classes of data. Then $f_1$ is the height of the bar of the modal class, $f_0$ is the height of the bar on the left of it, and $f_2$ is the height of the bar on the right of it.

The quantity $f_1 - f_0$ measures how far the modal class's bar "sticks up" above the bar on its left. The quantity $f_1 - f_2$ measures how far the modal class's bar "sticks up" above the bar on its right.

Now, observe that $$ \frac{f_1 - f_0}{2f_1 - f_0 - f_2} + \frac{f_1 - f_2}{2f_1 - f_0 - f_2} = \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} + \frac{f_1 - f_2}{(f_1 - f_0) + (f_1 - f_2)} = 1 $$ So if we want to divide an interval of width $h$ into two pieces, where the ratio of sizes of those two pieces is $(f_1 - f_0) : (f_1 - f_2)$, the first piece will have width $\frac{f_1 - f_0}{2f_1 - f_0 - f_2} h$.

This is what the formula for estimating the mode does. It splits the width of the modal bar into two pieces whose ratio of widths is $(f_1 - f_0) : (f_1 - f_2)$, and it says the mode is at the line separating those two pieces, that is, at a distance $\frac{f_1 - f_0}{2f_1 - f_0 - f_2} h$ from the left edge of that bar, $l$.
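As a quick worked example, take some made-up values (they are assumptions chosen only for illustration, not data from the question): $l = 20$, $h = 10$, $f_0 = 8$, $f_1 = 12$, $f_2 = 6$. Then $f_1 - f_0 = 4$ and $f_1 - f_2 = 6$, so the modal bar is split in the ratio $4 : 6$:

$$ \frac{f_1 - f_0}{2f_1 - f_0 - f_2}\, h = \frac{4}{4 + 6} \cdot 10 = 4, \qquad \text{Mode} \approx l + 4 = 24. $$

The two pieces have widths $4$ and $6$, and the estimate sits slightly left of the class midpoint $25$ because the left neighbour is closer in height to the modal bar.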

If $f_1 - f_0 = f_1 - f_2,$ that is, the modal bar is equally far above the bars on both its left and right, then this formula estimates the mode right in the middle of the modal class: $$ l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} h = l + \frac12 h. $$ But if the height of the bar on the left is closer to the modal bar's height, then the estimated mode is to the left of the centerline of the modal class. In the extreme case where the bar on the left is exactly the height of the modal bar, and both are taller than the bar on the right, that is, when $f_1 - f_0 = 0$ but $f_1 - f_2 > 0$, the formula estimates the mode at $l$ exactly, that is, at the left edge of the modal bar. In the other extreme case, where the bar on the left is shorter but the bar on the right is the same height as the modal bar ($f_1 - f_0 > 0$ but $f_1 - f_2 = 0$), the formula estimates the mode at $l + h$, that is, at the right edge of the modal bar.
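The special cases above can be checked numerically. The following is a minimal Python sketch, not a standard library routine: the function name `grouped_mode` and all the frequencies are made up for illustration, and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction


def grouped_mode(l, h, f0, f1, f2):
    """Estimate the mode as l + (f1 - f0) / (2*f1 - f0 - f2) * h.

    l  : left edge of the modal class
    h  : width of the modal class
    f1 : frequency of the modal class
    f0 : frequency of the class on its left
    f2 : frequency of the class on its right
    """
    left_gap = f1 - f0    # how far the modal bar sticks up above its left neighbour
    right_gap = f1 - f2   # how far it sticks up above its right neighbour
    total = left_gap + right_gap
    if total == 0:
        # f0 == f1 == f2: the ratio is 0/0, so the formula gives no estimate
        raise ValueError("formula is undefined when f0 == f1 == f2")
    return l + Fraction(left_gap) / Fraction(total) * h


# Hypothetical modal class [20, 30): l = 20, h = 10
symmetric = grouped_mode(20, 10, f0=7, f1=12, f2=7)      # equal gaps on both sides
left_closer = grouped_mode(20, 10, f0=10, f1=12, f2=6)   # left bar nearly as tall
left_equal = grouped_mode(20, 10, f0=12, f1=12, f2=7)    # f1 - f0 = 0
right_equal = grouped_mode(20, 10, f0=7, f1=12, f2=12)   # f1 - f2 = 0

print(symmetric, left_closer, left_equal, right_equal)
```

With these inputs the symmetric case lands on the midpoint of the class, the "left bar closer" case lands to the left of the midpoint, and the two extreme cases land on the left and right edges of the modal class.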

Solution 2:

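We partition the continuous frequency distribution into class intervals. The maximum lies within the modal class. It is assumed that the frequency rises toward the mode on the left and falls away from it on the right at the same rate, that is, the slopes on the two sides of the mode (the maximum frequency) are equal in magnitude and opposite in sign.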

[Figure not reproduced: the diagram labels the points $A$, $B$, $C$ and the angles $a$, $b$ used below.]

$$
\begin{aligned}
&\text{slope: } m_{AB} = -m_{BC} \\
&\tan(90^\circ - a) = -\tan(90^\circ + b) \implies \cot a = \cot b \implies \tan a = \tan b \\
&\frac{x}{f_1-f_0} = \frac{h-x}{f_1-f_2} \implies x(f_1-f_2) = h(f_1-f_0) - x(f_1-f_0) \\
&x(2f_1-f_0-f_2) = h(f_1-f_0) \implies x = \frac{f_1-f_0}{2f_1-f_0-f_2}\, h \\
&\text{Mode} = l + x = l + \frac{f_1-f_0}{2f_1-f_0-f_2}\, h
\end{aligned}
$$
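As a sanity check on the algebra, one can plug numbers into the relation $\frac{x}{f_1-f_0}=\frac{h-x}{f_1-f_2}$ and confirm that the $x$ given by the final formula satisfies it. The sketch below uses made-up frequencies and a hypothetical helper name `split_point`; it only verifies the equation and does not reconstruct the figure.

```python
from fractions import Fraction


def split_point(h, f0, f1, f2):
    """Distance x of the estimated mode from the left edge of the modal class."""
    return Fraction(f1 - f0, 2 * f1 - f0 - f2) * h


# Hypothetical values: class width 10, frequencies 8 (left), 12 (modal), 6 (right)
h, f0, f1, f2 = 10, 8, 12, 6

x = split_point(h, f0, f1, f2)
tan_a = x / (f1 - f0)        # left-hand side of the relation
tan_b = (h - x) / (f1 - f2)  # right-hand side of the relation

print(x, tan_a, tan_b)
```

Both ratios come out equal, and in general each reduces to $h / (2f_1 - f_0 - f_2)$, which is why the single assumption of equal and opposite slopes is enough to pin down $x$.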