Expectation of the maximum of i.i.d. geometric random variables

Given $n$ independent geometric random variables $X_1, \ldots, X_n$, each with probability parameter $p$ (and thus expectation $E\left(X_i\right) = \frac{1}{p}$), what is $$E_n = E\left(\max_{i \in 1 .. n}X_i\right)?$$


If we instead look at a continuous-time analogue, e.g. exponential random variables $Y_1, \ldots, Y_n$ with rate parameter $\lambda$, this is simple: $$E\left(\max_{i \in 1 .. n}Y_i\right) = \sum_{i=1}^n\frac{1}{i\lambda}$$

(I think this is right: by memorylessness, the expected time until the first of the $n$ variables hits is $\frac{1}{n\lambda}$, then the expected time until the next of the remaining $n-1$ is $\frac{1}{(n-1)\lambda}$, and so on, down to $\frac{1}{\lambda}$ for the last.)
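A quick Monte Carlo sanity check of this identity (just a sketch; the values of $n$, $\lambda$ and the sample size are arbitrary choices for illustration):

```python
import numpy as np

# Check E[max(Y_1, ..., Y_n)] = sum_{i=1}^n 1/(i*lam) for i.i.d.
# exponential Y_i with rate lam; n, lam and the sample size are arbitrary.
rng = np.random.default_rng(0)
n, lam, samples = 5, 2.0, 200_000

empirical = rng.exponential(scale=1 / lam, size=(samples, n)).max(axis=1).mean()
harmonic_sum = sum(1 / (i * lam) for i in range(1, n + 1))

print(empirical, harmonic_sum)  # the two values should agree to about two decimals
```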

However, I can't find something similarly nice for the discrete-time case.


What I have done is to construct a Markov chain modelling the number of the $X_i$ that haven't yet "hit". (i.e. at each time step, perform a binomial trial on the number of $X_i$ remaining to see which "hit", and then move to the number that didn't "hit".) This gives $$E_n = 1 + \sum_{i=0}^n \binom{n}{i}p^{n-i}(1-p)^iE_i$$ (note that $E_n$ also appears in the $i=n$ term on the right, so each equation still has to be solved for $E_n$), which gives the correct answer, but is a nightmare of recursion to calculate. I'm hoping for something in a shorter form.
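For concreteness, here is a minimal sketch of how that recursion can be evaluated, isolating $E_n$ at each step (the function name and the example parameters are my own, not part of the question):

```python
from math import comb

def expected_max_geometric(n: int, p: float) -> float:
    """E_n via the recursion E_n = 1 + sum_i C(n,i) p^(n-i) (1-p)^i E_i,
    rearranged to isolate E_n (the i = n term on the right contains E_n)."""
    q = 1 - p
    E = [0.0] * (n + 1)  # E_0 = 0: with no variables left there is nothing to wait for
    for m in range(1, n + 1):
        rhs = 1 + sum(comb(m, i) * p ** (m - i) * q ** i * E[i] for i in range(1, m))
        E[m] = rhs / (1 - q ** m)
    return E[n]

print(expected_max_geometric(1, 0.3))  # 1/p = 3.333...
print(expected_max_geometric(5, 0.3))
```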


Solution 1:

First principle:

To deal with maxima $M$ of independent random variables, use as much as possible events of the form $[M\leqslant x]$.

Second principle:

To compute the expectation of a nonnegative random variable $Z$, use as much as possible the complementary cumulative distribution function $\mathrm P(Z\geqslant z)$.

In the discrete case, $\mathrm E(M)=\displaystyle\sum_{k\ge0}\mathrm P(M>k)$, the event $[M>k]$ is the complement of $[M\leqslant k]$, and the event $[M\leqslant k]$ is the intersection of the independent events $[X_i\leqslant k]$, each of probability $F_X(k)$. Hence, $$ \mathrm E(M)=\sum_{k\geqslant0}(1-\mathrm P(M\leqslant k))=\sum_{k\geqslant0}(1-\mathrm P(X\leqslant k)^n)=\sum_{k\geqslant0}(1-F_X(k)^n). $$ The continuous case is even simpler. For i.i.d. nonnegative $X_1, X_2, \ldots, X_n$, $$ \mathrm E(M)=\int_0^{+\infty}(1-F_X(t)^n) \, \mathrm{d}t. $$
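For the geometric case with $F_X(k) = 1 - (1-p)^k$, this formula is easy to evaluate numerically; a minimal sketch (the truncation point and parameters are my own illustrative choices):

```python
def expected_max_via_survival(n: int, p: float, terms: int = 10_000) -> float:
    """E(M) = sum_{k>=0} (1 - F_X(k)^n) with F_X(k) = 1 - (1-p)^k,
    truncated after `terms` summands (the tail decays geometrically)."""
    q = 1 - p
    return sum(1 - (1 - q ** k) ** n for k in range(terms))

print(expected_max_via_survival(5, 0.3))  # matches the recursive computation above
```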

Solution 2:

There is no nice, closed-form expression for the expected maximum of IID geometric random variables. However, the expected maximum of the corresponding IID exponential random variables turns out to be a very good approximation. More specifically, we have the hard bounds

$$\frac{1}{\lambda} H_n \leq E_n \leq 1 + \frac{1}{\lambda} H_n,$$ and the close approximation $$E_n \approx \frac{1}{2} + \frac{1}{\lambda} H_n,$$ where $H_n$ is the $n$th harmonic number $H_n = \sum_{k=1}^n \frac{1}{k}$, and $\lambda = -\log (1-p)$, the parameter for the corresponding exponential distribution.

Here's the derivation. Let $q = 1-p$. Use Did's expression from Solution 1 with the fact that if $X$ is geometric with parameter $p$ then $P(X \leq k) = 1-q^k$ to get

$$E_n = \sum_{k=0}^{\infty} (1 - (1-q^k)^n).$$
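A small numerical check of the bounds and the half-shift approximation against this sum (a sketch with arbitrary parameter choices):

```python
from math import log

def harmonic(n: int) -> float:
    return sum(1 / k for k in range(1, n + 1))

n, p = 10, 0.2
q = 1 - p
lam = -log(q)

E_n = sum(1 - (1 - q ** k) ** n for k in range(10_000))  # the sum above, truncated
lower, upper = harmonic(n) / lam, 1 + harmonic(n) / lam
approx = 0.5 + harmonic(n) / lam

print(lower <= E_n <= upper)  # True
print(E_n, approx)            # the two should be close
```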

Since the summand $1 - (1 - q^x)^n$ is decreasing in $x$, viewing this infinite sum as right- and left-hand Riemann sum approximations of the corresponding integral gives

$$\int_0^{\infty} (1 - (1 - q^x)^n) dx \leq E_n \leq 1 + \int_0^{\infty} (1 - (1 - q^x)^n) dx.$$

The analysis now comes down to understanding the behavior of the integral. With the variable switch $u = 1 - q^x$ we have

$$\int_0^{\infty} (1 - (1 - q^x)^n) dx = -\frac{1}{\log q} \int_0^1 \frac{1 - u^n}{1-u} du = -\frac{1}{\log q} \int_0^1 \left(1 + u + \cdots + u^{n-1}\right) du $$ $$= -\frac{1}{\log q} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = -\frac{1}{\log q} H_n,$$ which is exactly the expression the OP has above for the expected maximum of $n$ corresponding IID exponential random variables, with $\lambda = - \log q$.
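One can also confirm this value of the integral numerically, e.g. (a sketch; the parameters are arbitrary):

```python
from math import log
import numpy as np
from scipy.integrate import quad

n, q = 10, 0.8
integral, _ = quad(lambda x: 1 - (1 - q ** x) ** n, 0, np.inf)
H_n = sum(1 / k for k in range(1, n + 1))

print(integral, -H_n / log(q))  # should agree to quadrature accuracy
```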

This proves the hard bounds, but what about the more precise approximation? The easiest way to see that is probably to use the Euler-Maclaurin summation formula for approximating a sum by an integral. Up to a first-order error term, it says exactly that

$$E_n = \sum_{k=0}^{\infty} (1 - (1-q^k)^n) \approx \int_0^{\infty} (1 - (1 - q^x)^n) dx + \frac{1}{2},$$ yielding the approximation $$E_n \approx -\frac{1}{\log q} H_n + \frac{1}{2},$$ with error term given by $$\int_0^{\infty} n (\log q) q^x (1 - q^x)^{n-1} \left(x - \lfloor x \rfloor - \frac{1}{2}\right) dx.$$ One can verify that this error is quite small unless $n$ is small or $q$ is extreme.
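To see concretely how small the error term is, one can evaluate that integral piecewise (so that $\lfloor x \rfloor$ is constant on each piece) and compare it with the actual gap $E_n - \left(\frac{1}{\lambda}H_n + \frac{1}{2}\right)$; a sketch under arbitrary parameter choices:

```python
from math import log
from scipy.integrate import quad

n, q = 10, 0.8
lam = -log(q)
H_n = sum(1 / k for k in range(1, n + 1))

E_n = sum(1 - (1 - q ** k) ** n for k in range(10_000))  # exact sum, truncated
gap = E_n - (H_n / lam + 0.5)                            # E_n minus the approximation

# Error-term integral, split at the integers so floor(x) is constant on each piece.
def piece(k):
    f = lambda x: n * log(q) * q ** x * (1 - q ** x) ** (n - 1) * (x - k - 0.5)
    return quad(f, k, k + 1)[0]

error_integral = sum(piece(k) for k in range(200))

print(gap, error_integral)  # both small, and they should match closely
```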

All of these results, including a more rigorous justification of the approximation, the OP's recursive formula, and the additional expression $$E_n = \sum_{i=1}^n \binom{n}{i} (-1)^{i+1} \frac{1}{1-q^i},$$ are in Bennett Eisenberg's paper "On the expectation of the maximum of IID geometric random variables" (Statistics and Probability Letters 78 (2008) 135-143).
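As a cross-check, that alternating-sum expression can be compared directly against the survival-function sum (a sketch; function names and parameters are mine):

```python
from math import comb

def e_max_alternating(n: int, p: float) -> float:
    """E_n = sum_{i=1}^n C(n,i) (-1)^(i+1) / (1 - q^i), with q = 1 - p."""
    q = 1 - p
    return sum(comb(n, i) * (-1) ** (i + 1) / (1 - q ** i) for i in range(1, n + 1))

def e_max_sum(n: int, p: float, terms: int = 10_000) -> float:
    q = 1 - p
    return sum(1 - (1 - q ** k) ** n for k in range(terms))

print(e_max_alternating(5, 0.3), e_max_sum(5, 0.3))  # the two values should agree
```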

Solution 3:

$$\begin{align} P(\max Y_i=k)&=P(\max Y_i\leq k)-P(\max Y_i<k)\\\\&=F(k)^n-(F(k)-f(k))^n. \end{align}$$ Thus $$\begin{align} E(\max Y_i) &= \sum_{k=0}^{\infty} k\left[F(k)^n-(F(k)-f(k))^n\right] \\\\ &=\sum_{k=1}^{\infty}k\left[\left(1-(1-p)^k\right)^n-\left(1-(1-p)^{k-1}\right)^n\right]. \end{align}$$

Not a closed form though.
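This sum is still straightforward to evaluate numerically; a minimal sketch with arbitrary parameters:

```python
n, p = 5, 0.3
q = 1 - p

# E(max Y_i) = sum_k k * [ (1 - q^k)^n - (1 - q^(k-1))^n ], truncated at a large k.
pmf_sum = sum(k * ((1 - q ** k) ** n - (1 - q ** (k - 1)) ** n) for k in range(1, 10_000))

print(pmf_sum)  # agrees with the other expressions above for the same n and p
```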

See also order statistics for both the continuous and discrete cases. The formula for the continuous case also appears in a post by Shai Covo.