The probability that x birthdays lie within n days of each other
This is a question that has bugged me for quite some time: what is the chance that x people happen to have their birthdays within n days of each other?
A bit more specific, since this is how a colleague one phrased it: what is the probability that 5 people have their birthdays within 40 days?
Both the birthdays and the "distance" are supposed to be random: there is no fixed time span (e.g., April 1 to May 10) that the birthdays are to lie within. The birthdays should be such, that two birthdays are always within 40 days of each other.
The thing that bugs me, is that it seems to be some kind of recursive calculation, and that I can't find a way to put it into a straightforward mathematical formulation.
To explain that, consider 2 people: the first person is free to have his or her birthday $b_1$ any day of the year, and the second person has 81 days to pick a birthday date $b_2$ from (the 40 day timespan is inclusive, so up to 40 days before $b_1$, plus up to days after $b_1$, plus one on $b_1$ itself. This may be more logically phrased as 41 days for some; I don't know what is best, so please be clear about it in your answer).
Now, for the third person, the number of birthdays he or she can have, is limited by the second person's birthday: if $b_2 = b_1$, then $b_3$ can be among 81 days, but if $b_2 = b_1 + 1$ or $b_2 = b_1 - 1$, there are only 80 days for each option, and 79 for $\|b_1 - b_2\| = 2$, etc.
For the fourth person, the limitation is given by person 2 and 3, complicating things; the fifth person makes things even more complicated.
I've also tried to go the "exclusion" way (what is the chance that 5 people do not share their birthdays within 40 days of each other), but I didn't get anywhere that way.
But perhaps I'm going entirely the wrong way about this.
By now, I've computed it in various way, and I'm quite confident of the answer, but I'm still looking for the mathematical formulation of the general (x birthdays, n days) problem.
The answer I've got, btw, is $7.581428 \cdot 10^{-4}$, or $\frac{13456201}{365^4}$.
NB: this obviously assumes no leap years.
NB2: Extension of the Birthday Problem appears related, though I can't readily see if I can use any of that formulation here.
Solution 1:
Let $B_1,\dots,B_x$ be the birthdays of persons $1,\dots,x$ and consider for all $m \in [0,364]$ the event "all birthdays happen between days $m$ and $m+n$, and $m$ is one of the birthdays", that is $$ E_m = \{\forall i,\; B_i\in [m, m + n]\}\cap \{\exists i,\; B_i = m\}. $$ (interpret $m$ as some kind of "minimal" birthday)
If $n < 365/2$, the events $E_m$ are mutually exclusive: at most one of them can happen. On the other hand, the probability that all $x$ birthdays are contained in a block of $n+1$ consecutive days is the probability that at least one of these events $E_m$ happens. This probability is therefore $$ \sum_{m=0}^{364} \Pr(E_m) = \sum_{m=0}^{364} \left[\left(\frac{n+1}{365}\right)^x- \left(\frac{n}{365}\right)^x\right] = \frac{(n+1)^x - n^x}{365^{x-1}}. $$
Indeed, $\Pr(E_m)$ is obtained by a simple inclusion-exclusion counting: it is the probability that the birthdays are contained in the block $[m,m+n]$ of size $n+1$ but not all of them in the subblock $[m+1,m+n]$ of size $n$.
What can be said when $n \geq \frac{365}{2}$? The events $E_m$ are not mutually exclusive anymore so we should use the Inclusion–exclusion formula: $$ \sum_m \Pr(E_m) - \sum_{m_1 < m_2} \Pr(E_{m_1}\cap E_{m_2})+ \dots + (-1)^n \sum_{m_1 < \dots < m_n} \Pr(E_{m_1}\cap \dots\cap E_{m_n}). $$
Solution 2:
Consider the smallest interval containing all $x$ birthdays, and suppose its length is $m$ (with $2 \le m \le n$: we will account separately for the $m=1$ case). Suppose that $y$ of the birthdays (with $2\le y\le x$) are at endpoints of the interval, and the remaining $x-y$ birthdays are in the interior. Then there are ${{x}\choose{y}}$ ways to choose which $y$ people have their birthdays on the endpoints, for each of which there are $2^y-2$ ways to choose which endpoints hold those birthdays (omitting the two ways for which all $y$ are at a single endpoint). The probability that these $y$ birthdays are as specified is $\left(\frac{1}{365}\right)^y$, and the probability that the remaining $x-y$ birthdays are inside this interval is $\left(\frac{m-2}{365}\right)^{x-y}$. There are $365$ possible starting points for the interval. Summing over the possible values for $m$ and $y$, and adding the $m=1$ term (for which the probability is $365^{-(x-1)}$), gives the final result: $$ P(n,x)=\frac{1}{365^{x-1}}+365\sum_{m=2}^{n}\sum_{y=2}^{x}\left(\frac{m-2}{365}\right)^{x-y} \left(\frac{1}{365}\right)^y{{x}\choose{y}}\left(2^y-2\right) \\ =\frac{1}{365^{x-1}} + 365\sum_{m=2}^{n}\left[\sum_{y=2}^{x}\left(\frac{m-2}{365}\right)^{x-y} \left(\frac{2}{365}\right)^y{{x}\choose{y}} - 2 \sum_{y=2}^{x}\left(\frac{m-2}{365}\right)^{x-y} \left(\frac{1}{365}\right)^y{{x}\choose{y}}\right]. $$ The sums over $y$ can be simplified, since $$ \sum_{y=2}^{x}a^{x-y}b^{y}{{x}\choose{y}}=\sum_{y=0}^{x}a^{x-y}b^{y}{{x}\choose{y}}-a^{x-1}bx-a^x=(a+b)^x-a^{x-1}bx-a^x. $$ We find $$ P(n,x)=\frac{1}{365^{x-1}}\left(1+\sum_{m=2}^{n}\left[m^{x}-2\left(m-1\right)^{x}+\left(m-2\right)^{x}\right]\right) $$ after some algebra. The remaining sum telescopes, yielding the final, very elegant, result: $$ P(n,x)=\frac{n^x - (n-1)^x}{365^{x-1}}. $$ (This agrees with OP's answer after setting $n=41$ and $x=5$. The question is slightly ambiguous: the title says "within $n$ days of each other", but the first sentence says "within a timespan of $n$ days", which I interpreted to mean "within a block of $n$ consecutive days".)
Solution 3:
Here's is another elementary approach based on combinatorial enumeration.
Restriction: This is only a partial answer valid for $1\leq n\leq 183$. A complete answer is planned to follow.
We assume the year having $365$ days. We assume a situation with $x \geq 2$ people and ask for a birthday of them within an interval of length $1\leq n\leq 183$ days.
A starter: $n=2,x=3$
To motivate the approach I would like to give a small example, namely asking for the probability $P(n=2,x=3)$ of three people having their birthday within two consecutive days.
First we determine the number $T$ of all possible configurations. \begin{align*} T=365^x=365^3\tag{1} \end{align*}
Note: Please note, that if we agree that $(1)$ is correct, we observe that e.g. the tripel $(1,365,1)$ means, that the first person's birthday and the third person's birthday is the first day in the year. The second persons birthday is the last day in the year and the configurations $(1,365,1)$ and $(1,1,365)$ are different.
Now, we derive the number $X$ of proper configurations by simply enumerating them.
There are $X_1=365$ proper configurations of three people with birthday on exactly one day \begin{array}{lllr} (1,1,1)&\quad\dots\quad&(364,364,364),&(365,365,365)\tag{2}\\ \end{array} and there are $X_2=6\cdot365$ proper configurations of three people having their birthday on exactly two consecutive days \begin{array}{lllr} (1,1,2)&\quad\dots\quad&(364,364,365),&(365,365,1)\tag{3}\\ (1,2,1)&\quad\dots\quad&(364,365,364),&(365,1,365)\\ (2,1,1)&\quad\dots\quad&(365,364,364),&(1,365,365)\\ (1,2,2)&\quad\dots\quad&(364,365,365),&(365,1,1)\\ (2,1,2)&\quad\dots\quad&(365,364,365),&(1,365,1)\\ (2,2,1)&\quad\dots\quad&(365,365,364),&(1,1,365)\\ \end{array}
This gives a total of $X=X_1+X_2=7\cdot365$ proper configurations.
We conclude:
\begin{align*} P(2,3)=\frac{X}{T}=\frac{7}{365^2}\simeq5.3\cdot10^{-5} \end{align*}
Now we generalise this approach.
Generalisation: $P(n,x)$ for $x\geq 2$ and $1\leq n\leq 183$
First step: $n=1$
We start the easy way with $n=1$. If there are $x$ people and we ask for having their birthay within exactly one day, there are $365$ proper configurations corresponding to $(1)$, namely the $x$-tuples \begin{array}{lllr} \underbrace{(1,1,\dots,1)}_{x}&\quad\dots\quad&\underbrace{(364,364,\dots,364)}_{x},&\underbrace{(365,365,\dots,365)}_{x}\\ \end{array}
So we get for
$n=1$:
\begin{align*} X_1=365 \end{align*}
Second step: $2\leq n\leq 183$
Let's assume an interval of length $k$ with $2\leq k \leq n$ and let's ask for the number of proper configurations of people having their birthday within an interval of length exactly $k$.
So, we have to determine the number $Y_k$ of $x$-tuples which have values only within the interval and which also contain at least one left endpoint and at least one right endpoint of the interval. This number is \begin{align*} Y_k=k^x-2(k-1)^x+(k-2)^x\qquad(x \geq 2) \end{align*}
Observe that $k^x$ gives the number of all $x$-tuples with (the $k$ different) values from the interval. We subtract $2(k-1)^x$ which is the number of $x$-tuples having no left endpoint or having no right endpoint of the interval. Since we subtracted the number of $x$-tuples, which have neither left nor right endpoint twice, we have to add $(k-2)^x$ for compensation according to the Inclusion-Exclusion Principle.
There are $365$ different intervals of length $k$, namely:
\begin{align*} &[1,2,\dots,k]\\ &[2,3,\dots,k+1]\\ &\quad\dots\\ &[364,365]\cup[1,2,\dots,k-3,k-2]\\ &[365]\cup[1,2,\dots,k-2,k-1]\\ \end{align*}
And since we can vary $k$ between $2$ and $n$, we observe that $X_2$, the number of proper configurations of $x\geq 2$ people having their birthdays within $n$ days is after some simplifications for
$2 \leq n\leq 183$: \begin{align*} X_2&=365\sum_{k=2}^{n}Y_k\\ &=365\sum_{k=2}^{n}\left(k^x-2(k-1)^x+(k-2)^x\right)\\ &=365\left(n^x-(n-1)^{x}-1\right) \end{align*}
This gives a total of $X=X_1+X_2=365\left(n^x-(n-1)^x\right)$
We conclude:
\begin{align*} P(n,x)=\frac{X}{T}=\frac{n^x-(n-1)^x}{365^{x-1}}\qquad(1\leq n \leq 183), \quad (x\geq2) \end{align*}
in accordance with the answer of mjqxxxx if we additionally respect $n\leq 183$.
Note: Observe, that if we ask for the probability of two people having their birthday within $183$ days, the result should give $1$. Indeed we get \begin{align*} P(n,x)=P(183,2)=\frac{183^2-182^2}{365}=\frac{365}{365}=1 \end{align*}
Note: Observe, that due to some circular overlappings of intervals in case of $184 \leq n \leq 365$ some proper configurations are repeatedly counted, so that for some values of $x$ \begin{align*} P(n,x) \geq 1\qquad (184 \leq n \leq 365) \end{align*}
To correct the formula of $P(n,x)$ in this case should be the next activitiy :-)
Solution 4:
Using ideas similar to mjqxxxx, and following his notation I got $$P(n,x)=(N-n+1)\left({n\over N}\right)^x-(N-n)\left({n-1\over N}\right)^x,\quad 1\leq n\leq N.$$ Here $N$ is the length of the calendar; the OP takes $N=365$. With $n=40$ and $x=5$, this gives $${162381413\over 259133949125}\approx .0006266.$$
This solution is for a linear calendar where Dec. 31 and Jan. 1 are not considered consecutive. I am still working on the circular calendar.
Solution 5:
I consider a brute method in the first place by considering real numbers instead of integers.
Let us consider two persons and let us plot the permutations in a square (see illustration). These squares are periodic because the year is cyclic. We are interested in the brown area and that is given by
$$ P(z) = 1 - \Big( 1 - z \Big)^2 + z^2,\ z \le \frac{1}{2}. $$
And clearly we have $P(0)=0$ and $P(\tfrac{1}{2}) = 1$, as expected. Note that this is a brute method because we do not consider integers.
We can find the brown area more general by
$$ \int_0^1 dx \int_{x-z}^{x+z} dy = \int_0^1 dx \Big[y\Big]_{x-z}^{x+z} = \int_0^1 dx 2z = 2z, $$
and
$$ 2z = 1 - \Big( 1 - z \Big)^2 + z^2. $$
We expand this idea to $p$ persons, so we get a $p$-cube. We obtain
$$ P(z) = \int_0^1 dx_1 \int_{x_1-z}^{x_1+z} dx_2 \int_{x_1-z}^{x_1+z} dx_3 \cdots \int_{x_1-z}^{x_1+z} dx_p = \Big(2z\Big)^{p-1} $$
As $2z$ is the period $n$ divided by the number of years, we have $\displaystyle 2z=\frac{n}{N}$, whence
$$ P = \Big(\frac{n}{N}\Big)^{p-1}. $$
That is why I came with
$$ \Big( \frac{40}{365} \Big)^k $$
in the first place.
When we consider integers, we need to replace the integrals by a summations.
The total number of permutations for birthdays in the year for $p$ persons is given by
$$ P_\textrm{total} = \sum_{k_1=1}^N \sum_{k_2=1}^N \sum_{k_3=2}^N \cdots \sum_{k_p=1}^N = N^p. $$
The number of permutations for birthdays that lie in a period of $n$ days for $p$ persons is given by
$$ P_\textrm{period} = \sum_{k_1=1}^N \sum_{k_2=k_1-n/2}^{k_1+n/2} \sum_{k_3=k_1-n/2}^{k_1+n/2} \cdots \sum_{k_2=k_1-n/2}^{k_1+n/2} = N \Big( n + 1 \Big)^{p-1}. $$
The change is given by
$$ \frac{P_\textrm{period}}{P_\textrm{total}} = \left( \frac{n+1}{N} \right)^{p-1},\ n < N. $$
Small tests:
Consider $N=4$, $n=2$ and $p=2$. We get
$$ \left( \frac{2+1}{4} \right)^{2-1} = \frac{3}{4}. $$
We have the permutations
$$ \begin{array}{c|cccc} &1&2&3&4\\ \hline 1&\times&\times&&\times\\ 2&\times&\times&\times&\\ 3&&\times&\times&\times\\ 4&\times&&\times&\times\\ \end{array} $$
where the $\times$ denotes the permutations that satisfy the period condition. We have totaly $16$ permutation and $12$ permutations that satisfy the condition, thus
$$ P = \frac{12}{16} = \frac{3}{4}. $$
The case $n=40$, $N=365$ and $p=5$ gives
$$ \left( \frac{40+1}{365} \right)^{5-1} = \left( \frac{41}{365} \right)^{4} \rightarrow 0.02\% $$