Does exceptionalism persist as sample size gets large?
Which of the following is more surprising?
1. In a group of 100 people, the tallest person is one inch taller than the second tallest person.
2. In a group of one billion people, the tallest person is one inch taller than the second tallest person.
Put more precisely, suppose we have a normal distribution with given mean $\mu$ and standard deviation $\sigma$. If we sample from this distribution $N$ times, what is the expected difference between the largest and second largest values in our sample? In particular, does this expected difference go to zero as $N$ grows?
In another question, it is explained how to compute the distribution $MAX_N$ of the maximum, but I don't see how to extract an estimate for the expected value of the maximum from that answer. Though $E(MAX_N)-E(MAX_{N-1})$ isn't the number I'm looking for, it might be a good enough estimate to determine if the value goes to zero as $N$ gets large.
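As a quick numerical illustration of the question (a Monte Carlo sketch of my own, not part of the original post; the helper `mean_top_gap` is purely illustrative), one can estimate the expected gap between the two largest of $N$ standard normal draws for a few values of $N$:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_top_gap(n, trials=200):
    """Monte Carlo estimate of E[X_{n:n} - X_{n-1:n}] for n standard normal draws."""
    gaps = np.empty(trials)
    for t in range(trials):
        x = rng.standard_normal(n)
        second, largest = np.partition(x, -2)[-2:]  # the two largest values
        gaps[t] = largest - second
    return gaps.mean()

for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9}: estimated E[gap] ≈ {mean_top_gap(n):.3f}")
```

The estimates should shrink, though only slowly, as $N$ grows, consistent with the answers below.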
Solution 1:
The precise version of the question was answered in the affirmative in the paper "Extremes, Extreme Spacings, and Tail Lengths: An Investigation for Some Important Distributions," by Mudholkar, Chaubey, and Tian (Calcutta Statistical Association Bulletin 61, 2009, pp. 243-265). (Unfortunately, I haven't been able to find an online copy.)
Let $X_{i:n}$ denote the $i$th order statistic from a random sample of size $n$. Let $S_{n:n} = X_{n:n} - X_{n-1:n}$, the rightmost extreme spacing. The OP asks for $E[S_{n:n}]$ when sampling from a normal distribution.
The authors prove that, for an $N(0,1)$ distribution, $\sqrt{2 \log n}\, S_{n:n}$ converges in distribution to $\log Z - \log Y$, where $(Z,Y)$ has joint density $f_{Z,Y}(z,y) = e^{-z}$ for $0 \leq y \leq z$ and $0$ otherwise.
Thus $S_{n:n} = O_p(1/\sqrt{\log n})$ and therefore converges in probability to $0$ as $n \to \infty$. So $\lim_{n \to \infty} E[S_{n:n}] = 0$ as well. Moreover, since $E[\log Z - \log Y] = 1$, $E[S_{n:n}] \sim \frac{1}{\sqrt{2 \log n}}$. (For another argument in favor of this last statement, see my previous answer to this question.)
In other words, (2) is more surprising.
Added: This does, however, depend on the fact that the sampling is from the normal distribution. The authors classify a distribution by its extreme spacings as ES short if $S_{n:n}$ converges in probability to $0$ as $n \to \infty$; ES medium if $S_{n:n}$ is bounded but non-zero in probability; and ES long if $S_{n:n}$ diverges in probability. While the $N(0,1)$ distribution has ES short right tails, the authors show that the gamma family has ES medium right tails (see Shai Covo's answer for the special case of the exponential) and the Pareto family has ES long right tails.
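To see this contrast numerically, here is a small simulation sketch of my own (not from the paper; the sampler choices and the helper `mean_top_spacing` are only illustrative). The top spacing stays near the mean for an exponential sample and grows for a Pareto-type sample, unlike the shrinking normal spacing above:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_top_spacing(sampler, n, trials=500):
    """Average of X_{n:n} - X_{n-1:n} over repeated samples drawn by `sampler`."""
    gaps = np.empty(trials)
    for t in range(trials):
        x = sampler(n)
        second, largest = np.partition(x, -2)[-2:]
        gaps[t] = largest - second
    return gaps.mean()

for n in (1_000, 100_000):
    exp_gap = mean_top_spacing(lambda m: rng.exponential(scale=1.0, size=m), n)
    # numpy's pareto(a) draws a Lomax variable; adding 1 gives a Pareto(a) tail
    par_gap = mean_top_spacing(lambda m: rng.pareto(2.0, size=m) + 1.0, n)
    print(f"n = {n:>7}: exponential gap ≈ {exp_gap:.2f}, Pareto(2) gap ≈ {par_gap:.2f}")
```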
Solution 2:
A very accurate approximation for the case of the normal distribution can be found in this paper. Let $X_{1:n} \leq X_{2:n} \leq \cdots \leq X_{n:n}$ be the ordered statistics obtained from a random sample $X_1,X_2,\ldots,X_n$, where $X_i \sim {\rm Normal}(\mu,\sigma^2)$. According to Eq. (2), for $i \geq n/2$ and as $n \to \infty$,
$$ X_{i:n} \approx \mu + \sigma \bigg[\sqrt {2\ln n} - \frac{{\ln (\ln n) + \ln (4\pi ) - 2W_{i:n} }}{{2\sqrt {2\ln n} }}\bigg], $$
where $W_{i:n}$ has the density
$$ g_{i:n} (w) = \frac{1}{{(n - i)!}}\exp ( - (n - i + 1)w - \exp ( - w)), \;\; - \infty < w < \infty . $$
Thus, for example,
$$ g_{n:n} (w) = \exp ( - w - \exp ( - w)), \;\; - \infty < w < \infty $$
and
$$ g_{n-1:n} (w) = \exp ( - 2w - \exp ( - w)), \;\; - \infty < w < \infty . $$
According to Eqs. (3) and (4) of that paper,
$$ {\rm E}[X_{n:n} ] \approx \mu + \sigma \bigg[\sqrt {2\ln n} - \frac{{\ln (\ln n) + \ln (4\pi ) - 2 \cdot 0.5772}}{{2\sqrt {2\ln n} }}\bigg] $$
and
$$ {\rm Var}[X_{n:n} ] \approx \frac{{\sigma ^2 \cdot 1.64493}}{{2\ln n}}. $$
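As a rough check of the mean approximation (a sketch of my own, not from the paper), it can be compared with the exact value ${\rm E}[X_{n:n}] = n\int x\,\Phi(x)^{n-1}\phi(x)\,dx$ computed by numerical quadrature, taking $\mu=0$ and $\sigma=1$; the break points passed to `quad` are only there to help it resolve the narrow peak of the integrand.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def approx_mean_max(n):
    """Approximate E[X_{n:n}] for N(0,1): sqrt(2 ln n) minus the correction term."""
    s = np.sqrt(2 * np.log(n))
    return s - (np.log(np.log(n)) + np.log(4 * np.pi) - 2 * 0.5772) / (2 * s)

def exact_mean_max(n):
    """E[X_{n:n}] = n * integral of x * Phi(x)^(n-1) * phi(x) dx for N(0,1)."""
    integrand = lambda x: n * x * norm.cdf(x) ** (n - 1) * norm.pdf(x)
    peak = approx_mean_max(n)
    value, _ = quad(integrand, -10, 10, points=[peak - 1, peak, peak + 1], limit=200)
    return value

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9}: approximation {approx_mean_max(n):.4f}, quadrature {exact_mean_max(n):.4f}")
```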
Some general facts, which are somewhat useful in our context. If $X_{1:n} \leq X_{2:n} \leq \cdots \leq X_{n:n}$ are the ordered statistics obtained from a random sample $X_1,X_2,\ldots,X_n$, where the $X_i$ have cdf $F$ and pdf $f$, then
$$ {\rm E}[X_{i:n}] = \frac{{n!}}{{(i - 1)!(n - i)!}}\int_{ - \infty }^\infty {x [ F(x)] ^{i - 1} [ 1 - F(x)] ^{n - i} f(x)\,dx}. $$
By an exercise in a book on order statistics,
$$ {\rm E}[X_{r + 1:n} - X_{r:n} ] = {n \choose r}\int_{ - \infty }^\infty {[F(x)]^r [1 - F(x)]^{n - r}\, dx} ,\;\; r = 1, \ldots ,n - 1. $$
Letting $r=n-1$ thus gives
$$ {\rm E}[X_{n:n} - X_{n-1:n} ] = n \int_{ - \infty }^\infty {[F(x)]^{n-1} [1 - F(x)]\, dx}. $$
Applying this formula to the case of exponential with mean $\theta$ gives a constant difference:
$$ {\rm E}[X_{n:n} - X_{n-1:n} ] = n\int_0^\infty {(1 - e^{ - x/\theta } )^{n - 1} e^{ - x/\theta } \, dx} = \theta. $$
Nevertheless, the corresponding pdf, $\theta ^{ - 1} e^{ - x/\theta } \mathbf{1}(x \geq 0)$, goes to zero much faster than, say, $1/x^2$ as $x \to \infty$. In fact, $X_{n:n} - X_{n-1:n}$ is exponentially distributed with mean $\theta$ (see also leonbloy's answer). Indeed, substituting the exponential cdf $F(x)=(1-e^{-x/\theta})\mathbf{1}(x \geq 0)$ and pdf $f(x)=\theta^{-1} e^{-x/\theta}\mathbf{1}(x \geq 0)$ into the general formula
$$ f_{X_{n:n} - X_{n-1:n} } (w) = \frac{{n!}}{{(n - 2)!}}\int_{ - \infty }^\infty {[F(x)]^{n - 2} f(x)f(x + w)\,dx},\;\; 0 < w < \infty $$
for the density of $X_{n:n}-X_{n-1:n}$ (which is a special case of the formula for $X_{j:n}-X_{i:n}$, $1 \leq i < j \leq n$), gives
$$ f_{X_{n:n} - X_{n-1:n} } (w) = \theta^{-1}e^{-w/\theta}, \;\; 0 < w < \infty, $$
that is, $X_{n:n} - X_{n-1:n}$ is exponential with mean $\theta$.
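A quick numerical sanity check of the spacing formula above (my own sketch, not from the cited exercise; the integration windows and the break point passed to `quad` are ad hoc choices): evaluating $n \int [F(x)]^{n-1}[1-F(x)]\,dx$ recovers $\theta$ for the exponential, while the same integral for the standard normal shrinks as $n$ grows.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def expected_top_spacing(cdf, lower, upper, n, peak):
    """E[X_{n:n} - X_{n-1:n}] = n * integral of F(x)^(n-1) * (1 - F(x)) dx."""
    integrand = lambda x: n * cdf(x) ** (n - 1) * (1.0 - cdf(x))
    value, _ = quad(integrand, lower, upper, points=[peak], limit=200)
    return value

theta = 2.0
exp_cdf = lambda x: 1.0 - np.exp(-x / theta)

for n in (10, 1_000, 100_000):
    gap_exp = expected_top_spacing(exp_cdf, 0.0, theta * (np.log(n) + 30), n,
                                   peak=theta * np.log(n))
    gap_norm = expected_top_spacing(norm.cdf, -10.0, 10.0, n,
                                    peak=np.sqrt(2 * np.log(n)))
    print(f"n = {n:>7}: exponential spacing ≈ {gap_exp:.4f} (theta = {theta}),"
          f" normal spacing ≈ {gap_norm:.4f}")
```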
Solution 3:
A quick heuristic attempt at this: first, standard results on order statistics tell us that if we take $n$ samples from a distribution with CDF $F$, the $k$th shortest of the $n$ people will typically have height around $F^{-1}(k/(n+1))$.
So fix $\mu = 0$ and $\sigma = 1$. Then we expect the height of the tallest out of $n-1$ people to be around $\Phi^{-1}(1-1/n)$, and the height of the second tallest to be around $\Phi^{-1}(1-2/n)$, where $\Phi$ is the standard normal CDF. The question, then, is what happens to $\Phi^{-1}(1-1/n)-\Phi^{-1}(1-2/n)$ as $n$ gets large.
Now, it's a standard estimate that for large $z$, $1-\Phi(z) \approx \phi(z)/z$, where $\phi(z) = e^{-z^2/2}/\sqrt{2\pi}$ is the standard normal PDF. So let $\epsilon = 1-\Phi(z)$; then $\epsilon \approx \phi(z)/z$, i.e. $z e^{z^2/2} \approx 1/(\epsilon\sqrt{2\pi})$. Squaring both sides and solving $z^2 e^{z^2} \approx 1/(2\pi\epsilon^2)$ for $z^2$ gives the approximation
$$\Phi^{-1}(1-\epsilon) \approx W\left( {1 \over 2\epsilon^2 \pi} \right)^{1/2},$$
where $W$ is the Lambert $W$ function, the inverse of $x \rightarrow xe^x$. In particular, if $\epsilon = 1/n$, then we have
$$\Phi^{-1}(1-1/n) \approx W \left( {n^2 \over 2 \pi} \right)^{1/2}$$.
So finally the question becomes, what happens to
$$ W\left( {n^2 \over 2\pi} \right)^{1/2} - W\left( {n^2 \over 8\pi} \right)^{1/2} $$
as $n$ gets large? Since $W(x) \approx \ln x - \ln\ln x$ for large $x$, the two $W$ values above differ by roughly $\ln 4$ while each is close to $2\ln n$, so their square roots differ by about $\ln 4/(2\sqrt{2\ln n})$, which goes to zero as $n$ gets large; that is, smaller gaps are expected between the largest and second largest entries in larger samples from the normal distribution. So (2) is more surprising.
That being said, I've thrown out a lot here, but I'm guessing that this captures the correct asymptotics.
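For what it's worth, here is a quick numerical check of the heuristic (my own sketch; the helper `approx_quantile` just evaluates the Lambert-$W$ formula above) against the exact normal quantiles:

```python
import numpy as np
from scipy.special import lambertw
from scipy.stats import norm

def approx_quantile(eps):
    """Heuristic Phi^{-1}(1 - eps) ≈ sqrt(W(1 / (2 pi eps^2)))."""
    return np.sqrt(lambertw(1.0 / (2.0 * np.pi * eps ** 2)).real)

for n in (10**2, 10**4, 10**6, 10**9):
    heuristic_gap = approx_quantile(1.0 / n) - approx_quantile(2.0 / n)
    exact_gap = norm.isf(1.0 / n) - norm.isf(2.0 / n)  # Phi^{-1}(1-1/n) - Phi^{-1}(1-2/n)
    print(f"n = {n:>12}: heuristic gap ≈ {heuristic_gap:.4f}, exact quantile gap ≈ {exact_gap:.4f}")
```

Both gaps decrease slowly with $n$, in line with the $1/\sqrt{2\ln n}$ rate from Solution 1.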