I've seen the formula most commonly derived as a limit of a binomial random variable with large $n$, small $p$, and fixed $\lambda = np$, yielding

$$ \lim_{n \to \infty} \binom{n}{x} p^x(1-p)^{n-x} = e^{-\lambda}\frac{\lambda ^ x}{x!}$$

It follows from this derivation that $$ \lim_{n \to \infty} (1-p)^{n-x} = e^{-\lambda}$$ is the limiting probability that all of the (eventually infinitely many) remaining trials fail when the success rate is $\lambda$.
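The exponential limit above is easy to check numerically. A small sketch (the values $\lambda = 3$ and $x = 2$ are my own illustrative choices):

```python
import math

# Check that (1 - lambda/n)^(n - x) approaches e^(-lambda) as n grows,
# holding lambda = n * p fixed.  lam = 3, x = 2 are arbitrary choices.
lam, x = 3.0, 2
for n in [10, 100, 10_000, 1_000_000]:
    p = lam / n
    print(n, (1 - p) ** (n - x))
print("e^-lam =", math.exp(-lam))  # the limiting value
```

The printed values approach $e^{-3} \approx 0.0498$ as $n$ increases.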

However, from this approach, I could not grok the remaining term

$$\frac { \lambda ^ x } {x!} $$


Question

What insightful derivations (perhaps from generalizations) of the Poisson random variable exist which leave an intuition for each of the terms?


My Answer:

My answer, https://math.stackexchange.com/a/2727388/338817, comes from a geometric approach to the Gamma-function intuition (https://math.stackexchange.com/a/1651961/338817), which I quote:

Note that $\frac{t^n}{n!}$ is the volume of the set $S_t=\{(t_1,t_2,\dots,t_n)\in\mathbb R^{n}\mid t_i\geq 0\text{ and } t_1+t_2+\cdots+t_n\leq t\}$


Solution 1:

Since you asked for an intuition, and there are many online derivations of the pmf of the Poisson distribution (e.g. here or here), which already follow a mathematically strict sequence, I'll take a shot at treating it almost as a mnemonic construction.

So the pmf is

$$P(X=k)=\frac{\lambda^k\mathrm e^{-\lambda}}{k!}$$

What about thinking of the Poisson parameter $\lambda$ as somewhat reflecting the odds of an event happening in any time period? After all, it is a rate (events/time period), and hence, the higher the rate, the more likely it is that a certain number of events takes place in a given time period. Further, you already mention how the pmf of the Poisson is derived from the binomial, allowing $n$ to go to infinity; and in the binomial distribution, the expectation is $np,$ equal to $\lambda$ in the Poisson: $p=\frac{\lambda}{n}.$

Notice, for instance, that in the derivation of the pmf of the Poisson, $\left(\frac{\lambda}{n}\right)^k$ is precisely introduced as the $p^k$ (the probability of $k$ successes) in the binomial pmf, $\binom{n}{k}p^k(1-p)^{n-k}.$ The denominator $n^k$ is later cancelled as we calculate the limit $n\to\infty,$ and indeed, $\lambda^k$ is "left over" from this initial probability formula.

Now, in the pmf you have the term raised to the $k$ power, i.e. $\lambda^k$, and it makes intuitive sense, because each occurrence is independent of the preceding and subsequent ones. So if we are calculating the probability of $k$ iid events happening in a time period, we shouldn't be surprised to end up with $\underbrace{\lambda\cdot\lambda \cdots\lambda}_k=\lambda^k$.

Since these events are indistinguishable from each other, it is not surprising either that we have to prevent over-counting by dividing by the number of permutations of these events, $k!.$ This, in fact, is exactly the role of that term in the combinations formula $\binom{n}{k}=\frac{n!}{(n-k)!\color{blue}{k!}}.$

And for the term $e^{-\lambda}$ we could bring into play the inter-arrival time following an exponential distribution: as the rate $\lambda$ increases, the inter-arrival time decreases. We can think of this factor as decreasing the probability of a low number $k$ of events when the rate $\lambda$ is high.
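The binomial-to-Poisson convergence that underlies this mnemonic can be verified directly. A sketch (the values $\lambda = 4$, $k = 3$, $n = 100{,}000$ are my own):

```python
import math

# For large n with p = lam/n, the binomial pmf at k should be close to
# the Poisson pmf lam^k e^{-lam} / k!.
lam, k, n = 4.0, 3, 100_000
p = lam / n
binom = math.comb(n, k) * p**k * (1 - p) ** (n - k)
poisson = lam**k * math.exp(-lam) / math.factorial(k)
print(binom, poisson)  # nearly identical for n this large
```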

Solution 2:

Suppose $n$ successes occur in the interval $[0, t)$, and let $x_i$ denote the waiting time between the $(i-1)$-th and $i$-th success, so that $x_1 + x_2 + \cdots + x_n \leq t$.

The set of events where exactly $n$ successes occur can be measured as $$ \int_0^{t} \int_0^{t - x_1} \cdots \int_0^{\, t - \sum_{i = 1}^{n-1} x_i } dx_n \, dx_{n-1} \cdots dx_2 \, dx_1 = \frac{ t^n } { n! }$$

Importantly, the size of the sample space of all events is measured by considering the size of all possible tuples of every length $k \geq 0$:

$$ \sum_{k = 0}^{\infty} \frac{ t^k }{ k! } = e^t$$
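The series identity is easy to confirm numerically; a quick sketch with my own choice $t = 2$:

```python
import math

# Partial sums of t^k / k! converge rapidly to e^t.
t = 2.0
partial = sum(t**k / math.factorial(k) for k in range(30))
print(partial, math.exp(t))  # agree to machine precision
```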

Taking the ratio of the sizes of these sets yields the probability that $n$ events occur in the interval $[0, t)$.

$$\boxed{ P \{ X = n \} = e^{-t} \frac{ t^n }{ n! } }$$
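The boxed formula can be checked by simulating a unit-rate process with exponential inter-arrival gaps and counting arrivals in $[0, t)$. A simulation sketch (the values $t = 2$, $n = 3$ are my own):

```python
import math
import random

# Simulate arrivals on [0, t) with unit-rate exponential gaps, count
# them, and compare the empirical frequency of {X = n} with e^-t t^n / n!.
random.seed(1)
t, n, trials = 2.0, 3, 100_000
count = 0
for _ in range(trials):
    s, arrivals = 0.0, 0
    while True:
        s += random.expovariate(1.0)  # unit event rate
        if s >= t:
            break
        arrivals += 1
    count += (arrivals == n)
print(count / trials, math.exp(-t) * t**n / math.factorial(n))
```

The empirical frequency matches the formula up to sampling noise.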

Note:

More generally, the event rate can be made non-homogeneous with a scalar function $\lambda(t)$. When the rate is constant for all time, i.e., $\lambda(t) = \lambda$, we write

$$P(X = n) = e^{-\lambda t}\frac{ (\lambda t)^n } { n! }$$

Letting $t = 1$ gives the process on a unit time interval, scaled by $\lambda$. Although we're interested in $[0, 1)$, it's as if we were looking at the interval $[0, \lambda)$ at unit rate.

Solution 3:

You're basically there with your limit. The ratio $\frac{n!}{(n-x)!}$ is the product of the $x$ numbers from $n-x+1$ to $n$, so for $n\gg x$ it is approximately $n^x$. Combined with $p^x = (\lambda/n)^x$, this gives $\binom{n}{x}p^x = \frac{n!}{(n-x)!\,x!}\cdot\frac{\lambda^x}{n^x} \approx \frac{\lambda^x}{x!}$.
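The approximation $\frac{n!}{(n-x)!} \approx n^x$ can be checked numerically; a sketch with my own choice $x = 5$:

```python
import math

# The falling factorial n! / (n-x)! divided by n^x tends to 1 as n grows.
x = 5
for n in [10, 100, 10_000]:
    ratio = math.perm(n, x) / n**x  # math.perm(n, x) = n! / (n-x)!
    print(n, ratio)
```

The ratio approaches 1, confirming that $n^x$ is the right leading-order replacement for the factorial ratio.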