How do you estimate the mean of a Poisson distribution from data?

I'll start by commenting on your second approach. Since the cars arrive according to a Poisson process, the time $\tau_1$ that you have to wait to observe the first car follows an exponential distribution, $\tau_1\sim\mathrm{Exp}(\lambda)$, where $\lambda$ is the intensity of the process.

Since $\tau_1\sim\mathrm{Exp}(\lambda)$, it indeed holds that

$$\mathbb{E}[\tau_1]=\frac{1}{\lambda}.$$

However, estimating $\lambda$ by $1/\tau_1$ leads to problems, since the estimator does not even have a finite expectation (let alone being unbiased). Indeed,

$$\mathbb{E}\left[\frac{1}{\tau_1}\right]=+\infty,$$

since the integral $\int_0^\infty\frac1x\,\lambda e^{-\lambda x}\,\mathrm dx$ diverges at $0$. This does not conform to your intuition that $\mathbb{E}[1/\tau_1]$ should equal $\lambda$.
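To see this concretely, here is a minimal simulation sketch (assuming Python with numpy; the intensity $\lambda=2$ and the sample sizes are illustrative choices, not taken from your post). The running average of $1/\tau_1$ over independent trials keeps drifting instead of settling near $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0  # true intensity (illustrative)

# Many independent first waiting times tau_1 ~ Exp(lam).
tau1 = rng.exponential(scale=1 / lam, size=1_000_000)

# Running average of 1/tau_1: it never stabilizes, since E[1/tau_1] = +inf.
running_mean = np.cumsum(1 / tau1) / np.arange(1, tau1.size + 1)
for n in (10**2, 10**4, 10**6):
    print(f"mean of 1/tau_1 over {n:>9,} trials: {running_mean[n - 1]:.2f}")
```

The printed averages grow (roughly logarithmically) with the number of trials rather than converging to $\lambda=2$.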

Now, the other estimator you propose is more natural, and is known in statistics as the maximum likelihood estimator (MLE). Your idea is to estimate $\lambda$ by

$$\widehat{\lambda}_1=\frac{N_t}{t},$$

where $N_t$ is the number of cars that you see in a time interval of length $t$. In this case,

$$\mathbb{E}[\widehat{\lambda}_1]=\frac{1}{t}\mathbb{E}[N_t]=\frac{1}{t}\lambda t=\lambda.$$

Lastly, note that your idea of making many measurements and taking their average can also be applied here. You may count the number of cars that arrive each day during $t$ hours, denote this number by $n_i$ for day $i$, and estimate $\lambda$ by

$$\widehat{\lambda}_2=\frac{1}{k}\sum_{i=1}^k\frac{n_i}{t},$$

and this estimator is indeed very natural. It is also unbiased, by the same computation as above.
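If you want to check both estimators numerically, here is a minimal Monte Carlo sketch (assuming Python with numpy; $\lambda$, $t$, $k$ and the number of replications are illustrative). It confirms that $\widehat\lambda_1$ and $\widehat\lambda_2$ have mean $\lambda$, and that averaging over $k$ days divides the variance by roughly $k$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, k, reps = 2.0, 5.0, 20, 100_000  # illustrative values

counts = rng.poisson(lam * t, size=(reps, k))  # n_i: cars counted on day i
lambda_hat_1 = counts[:, 0] / t                # single-interval estimator
lambda_hat_2 = counts.mean(axis=1) / t         # average over k days

print("E[lambda_hat_1] ~", lambda_hat_1.mean())  # ~ 2.0
print("E[lambda_hat_2] ~", lambda_hat_2.mean())  # ~ 2.0
print("variance ratio  ~", lambda_hat_1.var() / lambda_hat_2.var())  # ~ k
```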


Bounty section:

Let me just formalize your answer slightly. Assume that you start observing at time $T_0$ and that the cars arrive at times $T_1<T_2<T_3<\cdots$. Denote by $\tau_i$ the time you wait to see car $i$ after car $i-1$ goes by (note: as explained in the third section, this has the same distribution as the time you have to wait to see a car go by, starting at any time $t$). With these new notations, $\tau_1=T_1-T_0$ and, for $i>1$, $\tau_i=T_i-T_{i-1}$.

Since the arrival times $T_1<T_2<\cdots$ form a Poisson process of intensity $\lambda$, the following properties hold (all three are checked numerically in the sketch after this list):

  • $N_t\sim\mathrm{Poiss}(\lambda t)$, or in other words, the number of cars that arrive in an interval of length $t$ has a Poisson distribution of parameter $\lambda t$;
  • for any $i\in\mathbb N$, $\tau_i\sim\mathrm{Exp}(\lambda)$, i.e. the time between the arrival times of two consecutive cars is distributed as an exponential of parameter $\lambda$;
  • for any $n\in\mathbb N$, $T_n\sim\mathrm{Gamma}(n,\lambda)$, i.e. the arrival time of car number $n$ is distributed as a Gamma random variable of parameters $n$ and $\lambda$.
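Here is a minimal sketch verifying these three facts by simulation (assuming Python with numpy; $\lambda$, $t$, $n$ and the number of sample paths are illustrative choices). It builds the arrival times from i.i.d. exponential gaps and compares empirical moments with the theoretical ones:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, n, reps = 2.0, 3.0, 7, 200_000  # illustrative values

# Build arrival times from i.i.d. Exp(lam) inter-arrival gaps;
# 50 gaps is far more than lam*t = 6 expected arrivals before time t.
gaps = rng.exponential(scale=1 / lam, size=(reps, 50))
T = np.cumsum(gaps, axis=1)

# 1) N_t ~ Poiss(lam*t): mean and variance should both equal lam*t = 6.
N_t = (T <= t).sum(axis=1)
print("N_t   mean, var:", N_t.mean(), N_t.var())

# 2) tau_i ~ Exp(lam): each gap has mean 1/lam = 0.5.
print("tau_i mean     :", gaps[:, 0].mean())

# 3) T_n ~ Gamma(n, lam): mean n/lam = 3.5, variance n/lam^2 = 1.75.
print("T_n   mean, var:", T[:, n - 1].mean(), T[:, n - 1].var())
```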

So in fact, $\sum_{i=1}^n\tau_i=T_n-T_0$ represents the time it takes for $n$ cars to go by, when starting the observation at a time $T_0$. Now, there are two points I'd like to make. First, note that the $\tau_i$ are independent samples from an exponential distribution. Thus, by the strong law of large numbers,

$$ \frac1n\sum_{i=1}^n\tau_i\xrightarrow[n\rightarrow+\infty]{}\mathbb E[\tau_1]=\frac1\lambda. $$

Hence, since $\lambda>0$ and $x\mapsto1/x$ is continuous on $(0,+\infty)$, your estimator tends almost surely to $\lambda$ as $n$ goes to infinity:

$$ \widehat\lambda_3=\frac{n}{\sum_{i=1}^n\tau_i}\xrightarrow[n\rightarrow+\infty]{}\lambda. $$
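You can watch one sample path of this convergence in a short sketch (assuming Python with numpy; $\lambda$ and the values of $n$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0  # true intensity (illustrative)

# One long sample path of i.i.d. Exp(lam) waiting times.
taus = rng.exponential(scale=1 / lam, size=1_000_000)
T = np.cumsum(taus)  # T[n-1] = tau_1 + ... + tau_n

for n in (10, 1_000, 100_000, 1_000_000):
    print(f"lambda_hat_3 with n = {n:>9,}: {n / T[n - 1]:.4f}")
# The printed values approach lam = 2 as n grows.
```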

Second, since $T_n-T_0$ is a sum of $n$ independent exponential random variables, it holds that $T_n-T_0\sim\mathrm{Gamma}(n,\lambda)$. Taking $T_0=0$ for simplicity, the probability density function of $T_n$ is given by

$$ f_n(x)=\frac{x^{n-1}}{\Gamma(n)}\lambda^ne^{-\lambda x}\mathbb 1_{(0,+\infty)}(x). $$

Hence, you may calculate the expectation of your estimator:

$$ \mathbb E\left[\widehat\lambda_3\right]=\mathbb E\left[\frac n{T_n}\right]=n\int_0^\infty\frac1x\,f_n(x)\,\mathrm dx=n\int_0^\infty\frac{x^{n-2}}{\Gamma(n)}\lambda^ne^{-\lambda x}\,\mathrm dx. $$

As seen previously, the integral diverges for $n=1$. For $n\ge2$ however, you can compute the integral as

$$ \mathbb E\left[\widehat\lambda_3\right]=n\frac{\lambda\Gamma(n-1)}{\Gamma(n)}\underbrace{\int_0^\infty\frac{x^{n-2}}{\Gamma(n-1)}\lambda^{n-1}e^{-\lambda x}\,\mathrm dx}_{=1}=\frac n{n-1}\lambda. $$

Therefore, it seems wiser to define

$$ \widehat\lambda_4=\frac{n-1}{\sum_{i=1}^n\tau_i}, $$

for $n\ge2$. This estimator will still converge to $\lambda$ almost surely, but will additionally be such that $\mathbb E\left[\widehat\lambda_4\right]=\lambda$.

In other words, $\widehat\lambda_4$ is consistent and unbiased.
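The bias of $\widehat\lambda_3$ and the unbiasedness of $\widehat\lambda_4$ are easy to check by Monte Carlo as well. Here is a minimal sketch (assuming Python with numpy; $\lambda$, $n$ and the number of repetitions are illustrative), which samples $T_n\sim\mathrm{Gamma}(n,\lambda)$ directly:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 5, 500_000  # illustrative values

# T_n ~ Gamma(n, lam); numpy parametrizes Gamma by shape and scale = 1/rate.
T_n = rng.gamma(shape=n, scale=1 / lam, size=reps)

print("E[lambda_hat_3] ~", (n / T_n).mean())        # ~ n/(n-1)*lam = 2.5
print("E[lambda_hat_4] ~", ((n - 1) / T_n).mean())  # ~ lam = 2.0
```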


Clarifying some points:

In your edit, you say "otherwise, if you just stop your stop-watch and re-start it immediately, it's just the same as the original MLE estimator I was asking about". This is not true. If you do this $n$ times, then the total time you wait follows a $\mathrm{Gamma}$ distribution, as mentioned previously. The difference is that the original MLE estimator observes for a fixed period $t$ instead of counting $n$ cars. As you can see, the two methods yield very different results.

You also mention that you want to stop your stop-watch and restart it at a later time instead of straight away.

This does not change anything since the exponential distributions are memoryless. Indeed, let us assume that you observe the first car, and stop your stop-watch. Then, you enable it at a later time $t$. Let's say that $T_i\le t<T_{i+1}$, i.e. you enable your stop-watch between car $i$ and $i+1$.

Well, you can in fact compute the distribution of $T_{i+1}-t$ (i.e. the time you wait until the next car), and it is $\mathrm{Exp}(\lambda)$. This is related to the inspection paradox and may seem unintuitive at first sight; it is a consequence of the memoryless property of exponential random variables.

So to summarize: whenever you activate your stop-watch, the waiting time $\tau_i$ until the next car always follows an exponential distribution of parameter $\lambda$. Thus, since the $\tau_i$ are independent, $\sum_{i=1}^n\tau_i$ is indeed $\mathrm{Gamma}(n,\lambda)$-distributed.
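Here is a quick simulation of this memoryless restart (assuming Python with numpy; $\lambda$, the restart time and the number of paths are illustrative). Restarting the stop-watch at an arbitrary fixed time $t$, the residual wait until the next car still behaves like an $\mathrm{Exp}(\lambda)$ variable:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t_restart, reps = 2.0, 4.7, 200_000  # illustrative values

# Arrival times from i.i.d. Exp(lam) gaps; 60 gaps >> lam*t expected
# arrivals, so every path almost surely has an arrival after t_restart.
gaps = rng.exponential(scale=1 / lam, size=(reps, 60))
T = np.cumsum(gaps, axis=1)

# Residual time from t_restart to the first arrival after it.
residual = np.where(T > t_restart, T, np.inf).min(axis=1) - t_restart

print("residual mean    :", residual.mean())          # ~ 1/lam = 0.5
print("P(residual > 0.5):", (residual > 0.5).mean())  # ~ exp(-1) ~ 0.368
```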