Expectation of the maximum of i.i.d. geometric random variables

Given $n$ independent geometric random variables $X_1, \ldots, X_n$, each with probability parameter $p$ (and thus expectation $E\left(X_i\right) = \frac{1}{p}$), what is $$E_n = E\left(\max_{i \in 1 .. n}X_i\right)?$$


If we instead look at a continuous-time analogue, e.g. exponential random variables $Y_1, \ldots, Y_n$ with rate parameter $\lambda$, this is simple: $$E\left(\max_{i \in 1 .. n}Y_i\right) = \sum_{i=1}^n\frac{1}{i\lambda}$$

(I think this is right: by memorylessness, the expected time until the first of the $n$ variables hits is $\frac{1}{n\lambda}$, then the expected time until the next of the remaining $n-1$ is $\frac{1}{(n-1)\lambda}$, and so on, down to $\frac{1}{\lambda}$ for the last.)
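A quick Monte Carlo sanity check of this identity (just a sketch; the values of $n$, $\lambda$ and the sample size are arbitrary choices for illustration):

```python
import numpy as np

# Check E[max(Y_1, ..., Y_n)] = sum_{i=1}^n 1/(i*lam) for i.i.d.
# exponential Y_i with rate lam; n, lam and the sample size are arbitrary.
rng = np.random.default_rng(0)
n, lam, samples = 5, 2.0, 200_000

empirical = rng.exponential(scale=1 / lam, size=(samples, n)).max(axis=1).mean()
harmonic_sum = sum(1 / (i * lam) for i in range(1, n + 1))

print(empirical, harmonic_sum)  # the two values should agree to about two decimals
```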

However, I can't find something similarly nice for the discrete-time case.


What I have done is to construct a Markov chain modelling the number of the $X_i$ that haven't yet "hit". (i.e. at each time step, perform a binomial trial on the number of $X_i$ remaining to see which "hit", and then move to the number that didn't "hit".) This gives $$E_n = 1 + \sum_{i=0}^n \binom{n}{i}p^{n-i}(1-p)^iE_i$$ (note that $E_n$ also appears in the $i=n$ term on the right, so each equation still has to be solved for $E_n$), which gives the correct answer, but is a nightmare of recursion to calculate. I'm hoping for something in a shorter form.
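For concreteness, here is a minimal sketch of how that recursion can be evaluated, isolating $E_n$ at each step (the function name and the example parameters are my own, not part of the question):

```python
from math import comb

def expected_max_geometric(n: int, p: float) -> float:
    """E_n via the recursion E_n = 1 + sum_i C(n,i) p^(n-i) (1-p)^i E_i,
    rearranged to isolate E_n (the i = n term on the right contains E_n)."""
    q = 1 - p
    E = [0.0] * (n + 1)  # E_0 = 0: with no variables left there is nothing to wait for
    for m in range(1, n + 1):
        rhs = 1 + sum(comb(m, i) * p ** (m - i) * q ** i * E[i] for i in range(1, m))
        E[m] = rhs / (1 - q ** m)
    return E[n]

print(expected_max_geometric(1, 0.3))  # 1/p = 3.333...
print(expected_max_geometric(5, 0.3))
```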


Solution 1:

First principle:

To deal with maxima $M$ of independent random variables, use as much as possible events of the form $[M\leqslant x]$.

Second principle:

To compute the expectation of a nonnegative random variable $Z$, use as much as possible the complementary cumulative distribution function $\mathrm P(Z\geqslant z)$.

In the discrete case, $\mathrm E(M)=\displaystyle\sum_{k\ge0}\mathrm P(M>k)$, the event $[M>k]$ is the complement of $[M\leqslant k]$, and the event $[M\leqslant k]$ is the intersection of the independent events $[X_i\leqslant k]$, each of probability $F_X(k)$. Hence, $$ \mathrm E(M)=\sum_{k\geqslant0}(1-\mathrm P(M\leqslant k))=\sum_{k\geqslant0}(1-\mathrm P(X\leqslant k)^n)=\sum_{k\geqslant0}(1-F_X(k)^n). $$ The continuous case is even simpler. For i.i.d. nonnegative $X_1, X_2, \ldots, X_n$, $$ \mathrm E(M)=\int_0^{+\infty}(1-F_X(t)^n) \, \mathrm{d}t. $$
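For the geometric case with $F_X(k) = 1 - (1-p)^k$, this formula is easy to evaluate numerically; a minimal sketch (the truncation point and parameters are my own illustrative choices):

```python
def expected_max_via_survival(n: int, p: float, terms: int = 10_000) -> float:
    """E(M) = sum_{k>=0} (1 - F_X(k)^n) with F_X(k) = 1 - (1-p)^k,
    truncated after `terms` summands (the tail decays geometrically)."""
    q = 1 - p
    return sum(1 - (1 - q ** k) ** n for k in range(terms))

print(expected_max_via_survival(5, 0.3))  # matches the recursive computation above
```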

Solution 2:

There is no nice, closed-form expression for the expected maximum of IID geometric random variables. However, the expected maximum of the corresponding IID exponential random variables turns out to be a very good approximation. More specifically, we have the hard bounds

$$\frac{1}{\lambda} H_n \leq E_n \leq 1 + \frac{1}{\lambda} H_n,$$ and the close approximation $$E_n \approx \frac{1}{2} + \frac{1}{\lambda} H_n,$$ where $H_n$ is the $n$th harmonic number $H_n = \sum_{k=1}^n \frac{1}{k}$, and $\lambda = -\log (1-p)$, the parameter for the corresponding exponential distribution.

Here's the derivation. Let $q = 1-p$. Use Did's expression from Solution 1 with the fact that if $X$ is geometric with parameter $p$ then $P(X \leq k) = 1-q^k$ to get

$$E_n = \sum_{k=0}^{\infty} (1 - (1-q^k)^n).$$
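A small numerical check of the bounds and the half-shift approximation against this sum (a sketch with arbitrary parameter choices):

```python
from math import log

def harmonic(n: int) -> float:
    return sum(1 / k for k in range(1, n + 1))

n, p = 10, 0.2
q = 1 - p
lam = -log(q)

E_n = sum(1 - (1 - q ** k) ** n for k in range(10_000))  # the sum above, truncated
lower, upper = harmonic(n) / lam, 1 + harmonic(n) / lam
approx = 0.5 + harmonic(n) / lam

print(lower <= E_n <= upper)  # True
print(E_n, approx)            # the two should be close
```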

Since the summand $1 - (1 - q^x)^n$ is decreasing in $x$, viewing this infinite sum as right- and left-hand Riemann sum approximations of the corresponding integral gives

$$\int_0^{\infty} (1 - (1 - q^x)^n) dx \leq E_n \leq 1 + \int_0^{\infty} (1 - (1 - q^x)^n) dx.$$

The analysis now comes down to understanding the behavior of the integral. With the variable switch $u = 1 - q^x$ we have

$$\int_0^{\infty} (1 - (1 - q^x)^n) dx = -\frac{1}{\log q} \int_0^1 \frac{1 - u^n}{1-u} du = -\frac{1}{\log q} \int_0^1 \left(1 + u + \cdots + u^{n-1}\right) du $$ $$= -\frac{1}{\log q} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = -\frac{1}{\log q} H_n,$$ which is exactly the expression the OP has above for the expected maximum of $n$ corresponding IID exponential random variables, with $\lambda = - \log q$.
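One can also confirm this value of the integral numerically, e.g. (a sketch; the parameters are arbitrary):

```python
from math import log
import numpy as np
from scipy.integrate import quad

n, q = 10, 0.8
integral, _ = quad(lambda x: 1 - (1 - q ** x) ** n, 0, np.inf)
H_n = sum(1 / k for k in range(1, n + 1))

print(integral, -H_n / log(q))  # should agree to quadrature accuracy
```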

This proves the hard bounds, but what about the more precise approximation? The easiest way to see that is probably to use the Euler-Maclaurin summation formula for approximating a sum by an integral. Up to a first-order error term, it says exactly that

$$E_n = \sum_{k=0}^{\infty} (1 - (1-q^k)^n) \approx \int_0^{\infty} (1 - (1 - q^x)^n) dx + \frac{1}{2},$$ yielding the approximation $$E_n \approx -\frac{1}{\log q} H_n + \frac{1}{2},$$ with error term given by $$\int_0^{\infty} n (\log q) q^x (1 - q^x)^{n-1} \left(x - \lfloor x \rfloor - \frac{1}{2}\right) dx.$$ One can verify that this error is quite small unless $n$ is small or $q$ is extreme.
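To see concretely how small the error term is, one can evaluate that integral piecewise (so that $\lfloor x \rfloor$ is constant on each piece) and compare it with the actual gap $E_n - \left(\frac{1}{\lambda}H_n + \frac{1}{2}\right)$; a sketch under arbitrary parameter choices:

```python
from math import log
from scipy.integrate import quad

n, q = 10, 0.8
lam = -log(q)
H_n = sum(1 / k for k in range(1, n + 1))

E_n = sum(1 - (1 - q ** k) ** n for k in range(10_000))  # exact sum, truncated
gap = E_n - (H_n / lam + 0.5)                            # E_n minus the approximation

# Error-term integral, split at the integers so floor(x) is constant on each piece.
def piece(k):
    f = lambda x: n * log(q) * q ** x * (1 - q ** x) ** (n - 1) * (x - k - 0.5)
    return quad(f, k, k + 1)[0]

error_integral = sum(piece(k) for k in range(200))

print(gap, error_integral)  # both small, and they should match closely
```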

All of these results, including a more rigorous justification of the approximation, the OP's recursive formula, and the additional expression $$E_n = \sum_{i=1}^n \binom{n}{i} (-1)^{i+1} \frac{1}{1-q^i},$$ are in Bennett Eisenberg's paper "On the expectation of the maximum of IID geometric random variables" (Statistics and Probability Letters 78 (2008) 135-143).
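As a cross-check, that alternating-sum expression can be compared directly against the survival-function sum (a sketch; function names and parameters are mine):

```python
from math import comb

def e_max_alternating(n: int, p: float) -> float:
    """E_n = sum_{i=1}^n C(n,i) (-1)^(i+1) / (1 - q^i), with q = 1 - p."""
    q = 1 - p
    return sum(comb(n, i) * (-1) ** (i + 1) / (1 - q ** i) for i in range(1, n + 1))

def e_max_sum(n: int, p: float, terms: int = 10_000) -> float:
    q = 1 - p
    return sum(1 - (1 - q ** k) ** n for k in range(terms))

print(e_max_alternating(5, 0.3), e_max_sum(5, 0.3))  # the two values should agree
```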

Solution 3:

$$\begin{align} P(\max Y_i=k)&=P(\max Y_i\leq k)-P(\max Y_i<k)\\\\&=F(k)^n-(F(k)-f(k))^n. \end{align}$$ Thus $$\begin{align} E(\max Y_i) &= \sum_{k=0}^{\infty} k\left[F(k)^n-(F(k)-f(k))^n\right] \\\\ &=\sum_{k=1}^{\infty}k\left[\left(1-(1-p)^k\right)^n-\left(1-(1-p)^{k-1}\right)^n\right]. \end{align}$$

Not a closed form though.
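This sum is still straightforward to evaluate numerically; a minimal sketch with arbitrary parameters:

```python
n, p = 5, 0.3
q = 1 - p

# E(max Y_i) = sum_k k * [ (1 - q^k)^n - (1 - q^(k-1))^n ], truncated at a large k.
pmf_sum = sum(k * ((1 - q ** k) ** n - (1 - q ** (k - 1)) ** n) for k in range(1, 10_000))

print(pmf_sum)  # agrees with the other expressions above for the same n and p
```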

See also order statistics for both the continuous and discrete cases. The formula for the continuous case also appears in a post by Shai Covo.