Can kurtosis measure peakedness?

From the definition of kurtosis:

The only data values (observed or observable) that contribute to kurtosis in any meaningful way are those outside the region of the peak; i.e., the outliers. Therefore kurtosis measures outliers only; it measures nothing about the "peak."

In the past it was believed that kurtosis also measured the peak of the distribution, but this has turned out to be false.


To elaborate on kubox's correct assertion that kurtosis measures nothing about the peak, one might also think that kurtosis measures probability concentration inside the $\mu \pm \sigma$ range. One "definition" of kurtosis is that it is "vaguely ... the location- and scale-free movement of probability mass from the shoulders of a distribution into its center and tails." Here "shoulders" refer to the values $\mu \pm \sigma$.

This interpretation suggests that longer tails correspond to more probability within the $\mu \pm \sigma$ range; and conversely, that more probability in the $\mu \pm \sigma$ range implies longer tails. Neither statement holds mathematically; simple counterexamples to both are given as counterexamples 1 and 2 below:

Counterexample 1: $X = \mu + \sigma Z$, where

$$Z^2 = \begin{cases} 0.5^2, & \text{wp } 0.50 \\ 1.2^2, & \text{wp } 0.50 - \theta \\ 0.155/\theta + 1.44, & \text{wp } \theta \end{cases}$$

("wp" abbreviates "with probability".)

Take $\pm \sqrt{Z^2}$ with equal probability splits to get the actual $Z$s.

As $\theta \rightarrow 0$, the tail and the kurtosis tend to infinity, but there is always 0.5 probability within the $\mu \pm \sigma$ range.

Further, as the kurtosis tends to infinity in this family, the "peak" of the distribution becomes more flat-topped, since the probabilities on the four central points all converge to 0.25.
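A quick numerical check (my own Python sketch, not part of the original argument) makes this concrete: as $\theta$ shrinks, the kurtosis blows up while the probability within $\mu \pm \sigma$ stays at 0.5 and the tail probability $\theta$ vanishes.

```python
import numpy as np

for theta in [0.1, 0.01, 0.001, 0.0001]:
    z2 = np.array([0.5**2, 1.2**2, 0.155 / theta + 1.44])    # possible values of Z^2
    p = np.array([0.50, 0.50 - theta, theta])                # their probabilities
    assert np.isclose(p.sum(), 1) and np.isclose(p @ z2, 1)  # valid pmf, E(Z^2) = 1
    kurtosis = p @ z2**2              # E(Z^4); the signs of Z do not matter
    p_within = p[z2 <= 1].sum()       # P(|Z| <= 1), i.e. within mu +/- sigma
    print(f"theta={theta:<7} kurtosis={kurtosis:9.1f}  "
          f"P(within)={p_within:.2f}  P(tail)={theta}")
```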

Counterexample 2: $X = \mu + \sigma Z$, where

$$Z^2 = \begin{cases} \theta, & \text{wp } \theta \\ 2\theta, & \text{wp } (1-\theta)/2 \\ 2, & \text{wp } (1-\theta)/2 \end{cases}$$

Again, take $\pm \sqrt{Z^2}$ with equal probability splits to get the actual $Z$s.

As $\theta \rightarrow 1$, the probability within the $\mu \pm \sigma$ range tends to 1, but the tail length stays fixed at $\mu + \sqrt{2} \sigma$, and the kurtosis decreases to its minimum, 1.0.
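The same kind of check (again my own sketch) for this family: as $\theta \rightarrow 1$ the central probability tends to 1, the largest value stays fixed at $\sqrt{2}\sigma$, and the kurtosis drops toward its minimum of 1.

```python
import numpy as np

for theta in [0.5, 0.9, 0.99, 0.999]:
    z2 = np.array([theta, 2 * theta, 2.0])                   # possible values of Z^2
    p = np.array([theta, (1 - theta) / 2, (1 - theta) / 2])  # their probabilities
    assert np.isclose(p.sum(), 1) and np.isclose(p @ z2, 1)  # valid pmf, E(Z^2) = 1
    kurtosis = p @ z2**2
    p_within = p[z2 <= 1].sum()
    print(f"theta={theta:<6} kurtosis={kurtosis:.3f}  "
          f"P(within)={p_within:.3f}  max|Z|={np.sqrt(z2.max()):.3f}")
```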

Edit, 9/21/2018: Yet another incorrect myth about kurtosis is that higher kurtosis implies "more probability in the tails." Counterexample 1 above debunks that myth: For that family of distributions, as kurtosis increases, there is less probability in the tails.

Increases in kurtosis imply greater extremity of the tails, not higher probability in the tails. A mathematically precise justification of this statement is given as follows: for any sequence of distributions of random variables $Z$ (wlog having mean 0.0 and variance 1.0) having kurtosis tending to infinity, $E(Z^4 I(|Z| >b))/\text{kurtosis} \rightarrow 1.0$, for every real $b$.
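To see this limit in action, here is a small numerical illustration (my own, reusing counterexample 1 above): for small $\theta$ the single extreme point carries essentially all of $E(Z^4)$, so the ratio approaches 1 for any fixed $b$, though larger $b$ requires smaller $\theta$.

```python
import numpy as np

for theta in [0.01, 0.001, 0.0001, 0.00001]:
    z2 = np.array([0.25, 1.44, 0.155 / theta + 1.44])  # counterexample 1 again
    p = np.array([0.50, 0.50 - theta, theta])
    kurtosis = p @ z2**2
    for b in [2.0, 10.0]:
        tail_part = p @ (z2**2 * (z2 > b**2))          # E(Z^4 I(|Z| > b))
        print(f"theta={theta:<8} b={b:<5} ratio={tail_part / kurtosis:.4f}")
```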


Kurtosis

Kurtosis is the value...

$$E\left[{\left( \frac{X-\mu}{\sigma} \right)^4}\right]$$

...the average/expectation of the fourth power of the standardized variable.

It is a measure of the tendency of values to lie far from the mean relative to the distance of one standard deviation.


View in terms of quantile function

The following graphic may illustrate this intuitively. In it, we express the expectation of a function $h(x)$ (for instance $h(x) = x^4$ when we compute the 4-th moment) as an integral over the quantiles, with $f(x)$ the density and $Q(p)$ the quantile function:

$$E_{X}[h(x)] = \int_{-\infty}^\infty h(x) f(x) dx = \int_0^1 h(Q(p)) dp $$

Let us write $R^2(p)$ for the quantile function of the squared standardized distance $\left( \frac{X-\mu}{\sigma} \right)^2$.

In the graphic we have plotted the quantile function $R^2(p)$ for the standard normal distribution (it matches the quantile function of a chi-squared distribution with 1 degree of freedom, which is the distribution of the square of a standard normal variable). This curve expresses the distribution of the squared distance from the mean.
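As a quick check of the identity above (my own sketch, using scipy's norm.ppf as $Q$), integrating $h(Q(p)) = Q(p)^4$ over $(0,1)$ recovers the normal fourth moment of 3; the endpoint singularities of the integrand are integrable.

```python
from scipy import integrate
from scipy.stats import norm

# E[h(X)] as an integral over quantiles, h(x) = x^4, X standard normal:
val, err = integrate.quad(lambda p: norm.ppf(p) ** 4, 0, 1)
print(val)  # ~3.0, the fourth moment (= kurtosis) of the standard normal
```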

Now, in this view:

  • The variance $\sigma^2$ is equal to the area under the curve.

    And if the variable $X$ is standardized, then this area should equal 1. In the image we have stressed this by marking the areas above and below the line $R^2(p) = 1$; these two areas must be equal in order to have $\sigma^2 = 1$.

  • The 4-th moment (and the kurtosis, if the variable is standardized) is equal to the integral of the square of this function $R^2(p)$; a numerical check follows the image below.

[Image: example plot of $R^2(p)$ for the standard normal distribution]
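Both bullet points can be verified numerically (my own sketch, using scipy's chi2.ppf for $R^2(p)$): the area under the curve comes out as $\sigma^2 = 1$ and the integral of its square as the normal kurtosis, 3.

```python
from scipy import integrate
from scipy.stats import chi2

R2 = lambda p: chi2.ppf(p, df=1)  # quantile function of Z^2 for a standard normal Z

area, _ = integrate.quad(R2, 0, 1)                       # should be sigma^2 = 1
fourth, _ = integrate.quad(lambda p: R2(p) ** 2, 0, 1)   # should be kurtosis = 3
print(area, fourth)  # ~1.0 and ~3.0
```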


Kurtosis dependency on tail

So we see that the deviation of the kurtosis from 1 follows the same principle as the difference between the mean of the square and the square of the mean: since $E(Z^2) = 1$, we have $\kappa - 1 = E[(Z^2)^2] - [E(Z^2)]^2 = \operatorname{Var}(Z^2)$. If the quantile function $R^2(p)$ varies a lot around its mean of 1, then you will have a larger kurtosis value.

The way in which the quantile function $R^2(p)$ varies has a large influence. The kurtosis does not depend so much on the amount of the red area in the image, but more on whether this area is spread out over a range that includes large values (a few large values count more strongly than many small values).

The image below shows the quantile function for a discrete variable with the mass function

$$f(x) = \begin{cases} \hphantom{-} 0.1/(a-1) & \text{if} & x = \pm \sqrt{a} \\ -0.1/(a-1)+0.4 & \text{if} & x = \pm 1 \\ \hphantom{-}0.2 & \text{if} & x = 0 \\ \hphantom{-}0 & \text{else} \end{cases}$$

[Image: quantile functions $R^2(p)$ of this discrete variable for different values of $a$, illustrating the kurtosis dependency on the tail]

In this image you can see that, depending on how the values above $\sigma$ are distributed, the kurtosis can be higher or lower. The surface area needs to stay the same, but you can spread it out over a few large values (large $a$) or over many small ones.

A high kurtosis means that the values above $1\sigma$ are spread out a lot.
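For this family one can work out analytically that the kurtosis equals $1 + 0.2a$; the sketch below (my own) verifies this and also shows that the probability mass beyond $1\sigma$, namely $0.2/(a-1)$, actually shrinks as the kurtosis grows.

```python
import numpy as np

for a in [2.0, 4.0, 16.0, 100.0]:
    x = np.array([-np.sqrt(a), -1.0, 0.0, 1.0, np.sqrt(a)])
    q = 0.1 / (a - 1)                         # mass on each of the points +/- sqrt(a)
    p = np.array([q, 0.4 - q, 0.2, 0.4 - q, q])
    assert np.isclose(p.sum(), 1) and np.isclose(p @ x**2, 1)  # variance 1 (mean 0 by symmetry)
    kurtosis = p @ x**4
    print(f"a={a:<6} kurtosis={kurtosis:6.2f} (= 1 + 0.2a)  "
          f"P(|X| > sigma)={p[np.abs(x) > 1].sum():.4f}")
```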


Relationship with Westfall's theorem

In this post on the statistics site I learned about some interesting theorems.

Main Theorem: Let $Z_X = (X - \mu_X)/\sigma_X$ and let $\kappa(X) = E(Z_X^4)$ denote the kurtosis of $X$. Then for any distribution (discrete, continuous or mixed, which includes actual data via their discrete empirical distribution), $E\{Z_X^4 I(|Z_X| > 1)\}\le\kappa(X)\le E\{Z_X^4 I(|Z_X| > 1)\} +1$.

and

Refined Theorem: Assume $X$ is continuous and that the density of $Z_X^2$ is decreasing on [0,1]. Then the “+1” of the main theorem can be sharpened to “+0.5”.

We can derive these theorems quickly and intuitively from the representation in terms of the quantile function (and we can also improve on the refined theorem).

In the image below we see how the computation of the kurtosis can be split into two parts: the values within $\sigma$ of the mean ($|Z_X| < 1$) and the values beyond it ($|Z_X| > 1$).

We see that the contribution to the kurtosis from the values within $\sigma$ is very low: since $\left(R^2(p)\right)^2 \le 1$ on that region, it can be at most 1, which is approached when the green area fills nearly the entire region below the line $R^2(p) = 1$.

With the refined theorem we assume that the density of $Z_X^2$ is decreasing on [0,1]. In that case the curve must necessarily lie below the straight line through the point $(0,0)$ and the point where the quantile function equals 1 (drawn in black in the image).

So, under the conditions of the refined theorem, the largest possible contribution from the values within $1\sigma$ is

$$\int_0^1 x^2 dx = 1/3$$

So we could propose the following:

New Refined Theorem: Assume $X$ is continuous and that the density of $Z_X^2$ is decreasing on [0,1]. Then the “+1” of the main theorem can be sharpened to “+$1/3$”.

[Image: illustration of the theorems, showing the split of the kurtosis integral at $1\sigma$]
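A numerical illustration (my own sketch) for the standard normal, whose $Z_X^2$ density is decreasing on [0,1]: the kurtosis is 3, the part contributed by $|Z_X| \le 1$ is only about 0.11, comfortably below the proposed bound of $1/3$.

```python
from scipy import integrate
from scipy.stats import norm

# gap = E(Z^4 I(|Z| <= 1)) for the standard normal (kurtosis = 3):
gap, _ = integrate.quad(lambda z: z**4 * norm.pdf(z), -1, 1)
print(gap)        # ~0.112, below the proposed bound of 1/3
print(3.0 - gap)  # the tail part E(Z^4 I(|Z| > 1)), ~2.888
```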


Wrap up

It is not a rule that higher kurtosis means more peakedness.

A high kurtosis stems from two 'sources':

  • Many values below $1\sigma$ and many above $1\sigma$ (this relates indirectly to peakedness; in order to have this discrepancy you need many values close to the mean)
  • The values above $1\sigma$ being spread out over a large range: instead of many values a little above $1\sigma$, a few values far above $1\sigma$

So kurtosis relates to a combination of two aspects. This is why in general (and in practice) we observe high peakedness and high kurtosis together, but it should not be regarded as a rule that high kurtosis means high peakedness or vice versa.


Peakedness

If kurtosis doesn't measure it, is there any statistic that can do the job? My Statistics textbook isn't clear about this part.

You could have a wide variety of measures, but broadly you could see peakedness as many values close to the mean (in the graphic, a large green area). So the mass of the distribution within $\mu \pm k\sigma$ might be a measure, where the value of $k$ depends on the particular application. (I actually cannot think of an application for peakedness; I believe people are more interested in the mass of the distribution outside $\mu \pm k\sigma$ for large $k$.)
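As one concrete (hypothetical) implementation of such a measure, the sketch below estimates the sample mass within $\mu \pm k\sigma$; the helper name mass_within is my own invention. The Laplace distribution, which looks more peaked than the normal, indeed scores higher at $k = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def mass_within(x, k=1.0):
    """Fraction of the sample lying within k standard deviations of the mean."""
    return np.mean(np.abs(x - x.mean()) <= k * x.std())

# The measure is location- and scale-free, so no standardization is needed:
print(mass_within(rng.normal(size=100_000)))   # ~0.683 for the normal
print(mass_within(rng.laplace(size=100_000)))  # ~0.757 for the more peaked Laplace
```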