Lately, I am doing an investigation on Stirling's formula and its applications. So I thought I could use it to prove that the mode of the Poisson model is approximately equal to the mean. Of course, you do that by considering the curve that is formed by connecting the points of the probabilities of occurrence and the different values of the discrete random variable. Then you differentiate the p.d.f. where for $x!$ you use Stirling's formula $x!\approx \sqrt{2\pi x}~x^xe^{-x}$. The result is $\lnλ-1/(2x)-\ln x$ whose roots cannot be found analytically, but by iterative methods we find that as λ is larger and larger, the mode~mean.

Problem is, I found the following paper online, which seems to be the solution from a Harvard's undergraduate problem set.

http://www.physics.harvard.edu/academics/undergrad/probweek/sol84.pdf

It reads "You can also show this by taking the derivative of eq. (2), with Stirling’s expression in place of the $x!$. Furthermore, you can show that $x = a[=λ ~\text{in my case}~]-1/2$ leads to a maximum $P(x)$ value of $P_\max\approx1/\sqrt{2\pi a}$."

Does this puzzle you as much as it puzzles me? My main concern is over the "=" sign: how does this hold? The derivative=0 equation cannot have such an exact solution. Furthermore at such x, how does $P(X=a-1/2)$ give $1/\sqrt{2\pi a}$?

Am I (and my professor) missing something rather obvious or is the solution wrong?

Discuss!

PS: This sort of question might have been asked before, but still, I am really curious that somebody reads the paper in the link above, so that I can figure out what's going on.


Solution 1:

To find the mode of the Poisson distribution, for $k > 0$, consider the ratio $$ \frac{P\{X = k\}}{P\{X = k-1\}} = \frac{e^{-\lambda}\frac{\lambda^k}{k!}}{e^{-\lambda}\frac{\lambda^{k-1}}{(k-1)!}} = \frac{\lambda}{k}$$ which is larger than $1$ for $k < \lambda$ and smaller than $1$ for $k > \lambda$.

  • If $\lambda < 1$, then $P\{X = 0\} > P\{X = 1\} > P\{X > 2\} \cdots$ and so the mode is $0$.

  • If $\lambda > 1$ is not an integer, then the mode is $\lfloor\lambda\rfloor$ since $P\{X = \lceil\lambda\rceil\} < P\{X = \lfloor\lambda\rfloor\}$.

  • If $\lambda$ is an integer $m$, then $P\{X = m\} = P\{X = m-1\}$ and so either $m$ or $m-1$ can be taken to be the mode.

In all cases, the mode and the mean differ by less than $1$. You do not need to use Stirling's approximation at all.

Solution 2:

The weight $w(n)$ of the Poisson distribution with positive parameter $\lambda$ at the integer $n\geqslant0$ is $w(n)=\mathrm e^{-\lambda}\lambda^n/n!$ hence $w(n+1)/w(n)=\lambda/(n+1)$. One sees that $w(n)\gt w(n-1)$ for every $n\lt\lambda$ while $w(n+1)\lt w(n)$ for every $n\gt\lambda-1$. Thus, the mode of the Poisson distribution with parameter $\lambda$ is the highest integer $n_\lambda$ such that $n\lt\lambda$.

The estimation of $w_\lambda=\max\limits_nw(n)$ when $\lambda\to\infty$ is direct through Stirling's equivalent since $\lambda-1\lt n_\lambda\lt \lambda$, and indeed yields $\lim\limits_{\lambda\to\infty}\sqrt{2\pi\lambda}\cdot w_\lambda=1$.

Edit: Extend the sequence $(w(n))_{n\geqslant0}$ to a function $W$ defined on $\mathbb R^+$ through the formula $W(x)=\mathrm e^{-\lambda}\lambda^x/\Gamma(x+1)$. Thus, $W(n)=w(n)$ for every nonnegative integer $n$, and the function $W$ is maximal at $x_\lambda$ such that $\log\lambda=\psi(x_\lambda+1)$, where $\psi$ denotes the polygamma function. The asymptotic expansion of $\psi$ at infinity is $\psi(z)=\log(z)-\frac1{2z}+o(\frac1z)$ hence $\mathrm e^{\psi(z)}=z-\frac12+o(1)$ and $\lambda=x_\lambda+\frac12+o(1)$, that is, $\lim\limits_{\lambda\to\infty}x_\lambda-\lambda=-\frac12$.

Solution 3:

If all you're trying to prove is that the mode of the Poisson distribution is approximately equal to the mean, then bringing in Stirling's formula is swatting a fly with a pile driver. You have $$ P(X=x) = \frac{\lambda^x e^{-\lambda}}{x!}. $$ The mean is $\lambda$. Now let us seek the mode.

Observe that $$ \frac{P(X=x+1)}{P(X=x)} = \frac{\lambda^{x+1}/(x+1)!}{\lambda^x/x!} = \frac{\lambda}{x+1}. $$ This is $\ge 1$ if $x\le\lambda-1$ and $\le1$ if $x\ge \lambda-1$. Thus $$P(X=x+1)\quad \left.\begin{cases} \ge \\ = \\ \le \end{cases}\right\}\quad P(X=x)$$ according as $$ x\quad \left.\begin{cases} \le \\ = \\ \ge \end{cases}\right\}\quad \lambda-1.$$ The mode is therefore the integer part of $\lambda-1$. Except when $\lambda$ is an integer, in which case two consecutive integers are both modes.

The linked item at Harvard was trying to do much more than just find the mode.

Later addendum: Although the mode of the distribution must be within the set that is the support of the distribution, which is $\{0,1,2,3,\ldots\}$, the linked paper seeks the value of $x$ that maximizes $\lambda^2 e^{-\lambda}/x!$ when non-integer values of $x$ are allowed. We could define $x!=\Gamma(x+1)$. However, the author uses Stirling's approximation, and that at least allows us to do something in closed form. Consider (at this point I'll call it $a$ instead of $\lambda$) $$ \frac{a^x e^{-a}}{x!} \approx \frac{a^x e^{-a}}{x^x e^{-x}\sqrt{2\pi x}}. $$ Since $e^{-a}$ and $\sqrt{2\pi}$ don't depend on $x$ we can lose those and look at $$ a^x e^x x^{-x} x^{-1/2} = \exp\left((x\log a) + x - x\log x - \frac12 \log x\right). $$ Since $\exp$ is an increasing function, we can seek the value of $x$ that maximizes the expression inside it, and that will be the value of $x$ that makes the derivative of that expression $0$: $$ \frac{d}{dx} \left( (x\log a) + x - x\log x - \frac12 \log x \right) = \log a - \log x -\frac{1}{2x} = \log\left(\frac a x\right) - \frac{1}{2a}\left(\frac a x\right) = 0. $$ I don't see a way to get this in closed form, unless you want to bring in Lambert's $W$. That being the case, I don't know why he didn't just use $x!=\Gamma(x+1)$. But I entered this command into Wolfram Alpha:

f(x) = log(6) - log(x) - 1/(2x); x from 4 to 6

Lo and behold: it crosses the $x$-axis very close to $5.5$.

So the paper is not very explicit, to say the least, about how this conclusion was arrived at.

Still later addendum: Now I've entered this command into Wolfram Alpha:

f(x) = 6^x*e^(-6) / Gamma(x+1); from 5.49 to 5.51

It looks as if the maximum is near $5.494$.