What is the intuition behind the exponential distribution?

Let me give a quick derivation of the exponential process just to sketch out the general idea. Formalizing the ideas below presumably leads to a derivation based on the Poisson process as in Clarinetist's answer.

Suppose we're interested in the occurrence of births on the planet Earth. Basically, people are being born at random times all over the planet constantly, independently of one another, and presumably the birth rate doesn't vary that much over a short time period of, say, one week. Then the exponential distribution is the distribution of the time of the first birth after we begin observing.

Births can be replaced by any similar source of randomness - any situation in which we have a very large population of "timebombs" all set to go off at random times independently of one another. Popcorn kernels popping in a pan. Red cars driving past your window. Shooting stars. Atoms decaying in a sample of uranium. However, not just any "waiting time" can be modeled as an exponential distribution. It's very important that you have this large population of independent timebombs, because a characteristic property of the exponential distribution is memorylessness:

$$P(X<t+s\mid X > t) = P(X<s)$$

In other words, just because nobody's been born in the past minute doesn't make a birth more likely to occur in the next few seconds - a continuous version of the Gambler's Fallacy. But this is specifically because you have a large population of pregnant women whose due dates have absolutely nothing to do with each other or with when you started observing. If you're talking specifically about your wife, the birth date is almost certainly not memoryless, because it will be concentrated around the due date. The birth is more likely to occur a day after the due date than two days after, so every passing day increases the chance that the baby will be born soon.

The time you will spend waiting for a bus, which I think is sometimes used as an example, also really isn't memoryless, because the bus drivers are trying to arrive at a time close to the one on the schedule. Thus every minute that passes after the scheduled time does in fact increase the odds of the bus arriving in the next minute.
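The memoryless property above can be checked numerically. Here is a quick Monte Carlo sanity check (with an arbitrarily chosen rate $\lambda$ and times $t$, $s$) that $P(X<t+s\mid X>t) = P(X<s)$ holds for exponential samples:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0  # arbitrary rate for illustration
x = rng.exponential(scale=1 / lam, size=1_000_000)

t, s = 0.5, 0.3
# Left side: among samples that survive past t, the fraction landing before t+s
survived = x[x > t]
lhs = np.mean(survived < t + s)
# Right side: the unconditional probability P(X < s)
rhs = np.mean(x < s)
print(round(lhs, 3), round(rhs, 3))  # the two estimates agree closely
```

Running the same check on samples from a distribution concentrated around a "due date" (e.g. a normal distribution) would show the two sides disagree, which is exactly the point of the examples above.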

But if we do have a large population of independent "timebombs", then we can take the following axioms:

  1. The probability of a timebomb going off in a given interval depends only on the length of that interval (here we're assuming that birth rate doesn't change over time).
  2. If $I$ and $J$ are disjoint intervals, then the events "at least one timebomb goes off in $I$" and "at least one timebomb goes off in $J$" are independent.

The first assumption justifies the existence of a probability $p$ that a timebomb will go off in any given interval of length $1$. It turns out that the other probabilities, the probabilities that a timebomb will go off in an interval of some arbitrary length $x$, are fixed once we've chosen $p$, which itself is a parameter depending on the "birth rate". For instance, if $p_n$ is the probability that at least one timebomb goes off in an interval of length $\frac 1 n$, then our two axioms imply that:

$$1 - p = (1-p_n)^n$$

This is because we can chop up an interval of length $1$ into $n$ intervals of length $\frac 1 n$, and the event that no timebomb goes off in the big interval is the intersection of the independent events that no timebomb goes off in any of the little intervals.
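As a numerical sanity check of this identity (with an arbitrarily chosen $p$), we can solve $1 - p = (1-p_n)^n$ for $p_n$ and confirm both that the identity holds and that $n\,p_n$ approaches $\lambda = \ln\frac1{1-p}$ as $n$ grows:

```python
import math

p = 0.3  # arbitrary: prob. of at least one event in a unit interval
lam = math.log(1 / (1 - p))  # the rate parameter this p corresponds to

for n in (10, 100, 1000):
    # solve 1 - p = (1 - p_n)^n for p_n
    p_n = 1 - (1 - p) ** (1 / n)
    # (1 - p_n)^n recovers 1 - p exactly; n * p_n tends to lam
    print(n, round((1 - p_n) ** n, 6), round(p_n * n, 6))
```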

Now, let $T$ be the time of the first timebomb going off. To work out the distribution of $T$ is simply to compute $P(T<t)$ for any $t$, but that's just equal to the probability that at least one timebomb goes off in $[0, t]$! Well, split that interval into little intervals of length $\frac 1 n$. There will be roughly $nt$ of them. If we like we can be very very careful here, approximating $[0, t]$ with intervals of rational length, and then using continuity of probability measures or something, but let's just say there's roughly $nt$ of them and for large $n$ the probability of a timebomb going off in $[0, t]$ is pretty much:

$$P(T<t) \approx 1 - (1 - p_n)^{nt}=1 - ((1 - p_n)^n)^t=1 - (1-p)^t$$

Which is precisely the exponential distribution, although it's more traditional to write it as $P(T<t)=1-e^{-\ln(\frac1{1-p})t}$ and set $\lambda:=\ln\left(\frac1{1-p}\right)$.
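To see the "large population of timebombs" picture concretely, here is a small simulation (population size, horizon, and the resulting rate are all chosen arbitrarily): scatter $N$ independent uniform go-off times over a long horizon, record the time of the first one, and compare the empirical distribution of that first time against the exponential CDF:

```python
import numpy as np

rng = np.random.default_rng(1)
N, H = 150, 100.0  # 150 timebombs, uniform over [0, 100], so rate ~ N/H
lam = N / H
trials = 50_000

# Time of the first timebomb going off, in each of many independent trials
first = rng.uniform(0.0, H, size=(trials, N)).min(axis=1)

t = 1.0
empirical = np.mean(first < t)
theoretical = 1 - np.exp(-lam * t)  # exponential CDF at t
print(round(empirical, 3), round(theoretical, 3))
```

The agreement is only approximate, for the same reason as in the derivation: it becomes exact in the limit of a large population spread over a long horizon.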


This is from the webpage: http://www.milefoot.com/math/stat/pdfc-exponential.htm

Derivation of the Exponential distribution:

[Image: derivation of the exponential density, from the linked page.]


Some Background Information: We say that $N$ is a (Homogeneous) Poisson Process if it is a stochastic process [that is, a set of random variables varying by time] $\{N(t): t \geq 0\}$ satisfying:

  1. $N(0) = 0$
  2. $N(t) \in \mathbb{Z}_{\geq 0}$ for all $t \geq 0$.
  3. For $s \leq t$, $N(s) \leq N(t)$.
  4. For any set of times $\{t_1, t_2, \dots , t_n\}$ with $t_i < t_j$ for $i < j$, the increments $N(t_2) - N(t_1), N(t_3) - N(t_2), \dots, N(t_n) - N(t_{n-1})$ are independent.
  5. For $s \leq t$, $N(t) - N(s)$ has the same distribution as $N(t-s)$. In other words, the distribution depends only on the length of the interval.
  6. $N(t)$ follows a Poisson distribution with mean $\lambda t$ for all $t \geq 0$.

It follows immediately from 5 and 6 that for all $x, y \geq 0$, $N(x+y) - N(x)$ follows a Poisson distribution with mean $\lambda y$.

Using this information, it can be shown that $T$, the time between events of a Poisson process, follows an exponential distribution with mean $\dfrac{1}{\lambda}$, i.e. with density $$f_{T}(t) = \lambda e^{-\lambda t}\text{, } t > 0\text{.}$$

Proof. Let us find the distribution of $T$ via the method of cumulative distribution functions. Suppose $T$ is the time between Poisson process events occurring at times $t_1 < t_2$. If $T > t_2 - t_1$, then no events will have occurred in the time interval $(t_1, t_2)$, so $N(t_2) - N(t_1) = 0$. Hence $$F_{T}(t_2 - t_1) = \mathbb{P}\{T \leq t_2 - t_1\} = 1 - \mathbb{P}\{T > t_2 - t_1\} = 1 - \mathbb{P}\{N(t_2) - N(t_1) = 0\}\text{.}$$

Write $t = t_2 - t_1 > 0$. By property 5, $N(t_2) - N(t_1) = N(t_1 + t) - N(t_1)$ follows a Poisson distribution with mean $\lambda t$, so $$\mathbb{P}\{N(t_2) - N(t_1) = 0\} = \dfrac{e^{-\lambda t}(\lambda t)^{0}}{0!} = e^{-\lambda t}\text{.}$$

Substituting $t = t_2 - t_1$ back, we obtain $$F_{T}(t) = 1 - e^{-\lambda t}\text{, } t > 0\text{,}$$ which is indeed the cumulative distribution function of an exponential distribution, showing that $T$ follows an exponential distribution with mean $\dfrac{1}{\lambda}$. $\square$
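As a sanity check of this result, one can simulate a Poisson process (via the standard construction: a Poisson-distributed number of points placed uniformly on an interval) and confirm that the gaps between consecutive events behave like exponential variables with mean $1/\lambda$. The rate and horizon here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, H = 3.0, 10_000.0  # arbitrary rate and time horizon

# Standard construction of a Poisson process on [0, H]:
# draw a Poisson(lam*H) event count, then place the events uniformly.
n_events = rng.poisson(lam * H)
times = np.sort(rng.uniform(0.0, H, size=n_events))
gaps = np.diff(times)  # inter-arrival times

print(round(gaps.mean(), 3))          # close to 1/lam
print(round(np.mean(gaps > 0.5), 3))  # close to exp(-lam * 0.5)
```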

[I am new to this topic - so someone please correct me if I'm wrong anywhere.]