Consecutive birthdays probability

Solution 1:

I believe we can give a formula, but I would not call it "closed form".

We have $\displaystyle n$ people, and $\displaystyle k$ possible birthdays to choose from (i.e. $\displaystyle k$ days in a year). Let $\displaystyle M$ be the minimum of the two.

We will try and count the number of birthday assignments in which no two people have consecutive birthdays.

To do this, we will try and count the assignments which use exactly $d$ distinct birthdays, for $\displaystyle d=1, 2 \dots, k$, and then add them up.

Now suppose we had a set of $\displaystyle d$ distinct birthdays (don't worry about the consecutive part, just yet). How many ways can we assign these $\displaystyle d$ birthdays to $\displaystyle n$ people, so that each birthday is used at least once?

This is basically the problem of finding the number of ways to partition a set of size $\displaystyle n$ in exactly $\displaystyle d$ non-empty parts and assign the $\displaystyle d$ birthdays to each part in the partition, exactly one to each.

The number of ways to partition a set of size $\displaystyle n$ into $\displaystyle d$ non-empty parts is given by a Stirling Number of Second Kind, $S(n,d)$. The number of ways to assign $\displaystyle d$ birthdays is $\displaystyle d!$.

Thus the number we are looking for is $\displaystyle S(n,d) \times d!$.

Now suppose we managed to count the number of subsets containing $\displaystyle d$ elements (from the $\displaystyle k$ birthdays) such that no two elements of the set are consecutive, then we could multiply that number by $\displaystyle S(n,d) \times d!$ to give the number of birthday assignments which use exactly $\displaystyle d$ birthdays such that no two are consecutive.

For the moment, ignore the fact that Jan 1 and Dec 31 are consecutive.

We need to select $\displaystyle d$ numbers from $\displaystyle 1,2, \dots, k$ such that no two are consecutive.

Now if $\displaystyle b_1 \lt b_2 \lt \dots \lt b_d$ were such numbers, then notice that

$\displaystyle 1 \le b_1 \lt b_2 - 1 \lt b_3 - 2 \dots \lt b_d - (d-1) \le k-(d-1)$ gives us a way to select numbers from $\displaystyle 1, 2, \dots, k-(d-1)$ without having to bother about the consecutive issue.

This can be done in $\displaystyle {k-d+1 \choose d}$ ways.

Now since Jan 1 and Dec 31 are consecutive, we need to subtract from this, the number of sets which contain both $\displaystyle 1$ and $\displaystyle k$.

This number is same as the number of $\displaystyle d-2$ non-consecutive subsets of $\displaystyle \{3, \dots, k-2\}$ which is $\displaystyle {k-d-1 \choose d-2}$.

Thus the number of ways of selecting $\displaystyle d$ non-consecutive birthdays (assuming $\displaystyle 1$ and $\displaystyle k$ are consecutive) is $\displaystyle {k-d+1 \choose d} - {k-d-1 \choose d-2}$, with the understanding that the term being subtracted is $\displaystyle 0$ for $\displaystyle d = 1$ and that $\displaystyle {a \choose b} = 0$ if $\displaystyle a \lt b$

Thus, for $\displaystyle M = \text{min}\{n,k\}$, the total number we are looking for is,

$$ \sum_{d=1}^{M} S(n,d)\times d! \times \left({k-d+1 \choose d} - {k-d-1 \choose d-2}\right)$$

Note that $\displaystyle S(n,d) \ d! = \sum_{j=0}^{d} (-1)^j {d \choose j} (d-j)^n$

So we could also write the formula as

$$ \sum_{d=1}^{M} \left(\sum_{j=0}^{d} (-1)^j {d \choose j} (d-j)^n\right) \left({k-d+1 \choose d} - {k-d-1 \choose d-2}\right)$$

which is a bit ugly.

Thus the probability you are looking for is

$$1 - \frac{ \sum_{d=1}^{M} S(n,d)\times d! \times \left({k-d+1 \choose d} - {k-d-1 \choose d-2}\right)}{k^n}$$

Solution 2:

NB: I worked earlier on this problem and came up with the following solution. The first answer that was posted made me think that I got something wrong, and I discarded my work. Since the new answer by Moron agrees (essentially) with my previous work here it is, a slightly different derivation of the same formula.

Let $k$ be the number of days in a year. Let $m$ be the distinct number of birthdays among the $n$ friends. Let's assume that $k$ is big enough to have a non-trivial problem (say, $k > n/2$)

We're interested in binary strings with these three conditions:

  1. Are of length $k$, with $m$ ones and $k-m$ zeros.
  2. There's at least one zero between any two ones.
  3. Condition 2 holds when "wrapping around" the string.

Let's count them by constructing them with the following algorithm:

  • Start with a string of $m$ ones: $11\dots 1$
  • There are now three distinct cases: there's a birthday on the first day of the year, there is a birthday on the last day of the year, there's a birthday on neither.
  • In the first case we have to distribute $k-m$ zeros in $m$ non-empty buckets. Those are called compositions, and one can show that there are $\binom{k-m-1}{m-1}$ such assignments.
  • The second case is analogous, giving another $\binom{k-m-1}{m-1}$ possible strings.
  • The third case is similar, giving instead $\binom{k-m-1}{m}$ strings.

Putting this together we have: $$2\cdot \binom{k-m-1}{m-1}+\binom{k-m-1}{m}$$

Since $n$ friends share $m$ birthdays we have to account for that, giving this expression:

$$p(n,k) =\frac{\sum_{m=1}^n{ m! \cdot S(n,m)\cdot \Bigg (2\cdot \binom{k-m-1}{m-1}+\binom{k-m-1}{m}}\Bigg )}{k^n}$$

where $S(n,m)$ is the Stirling number of the second kind.

@Moron: Why do we need to multiply by $m!$ here? Oh right! Stirling numbers of the second kind count the number of coalitions, but we care about their order, too!

Solution 3:

UPDATE - THIS IS WRONG -- SEE BELOW CORRECT ANSWER --- Let's call N1 the number of configurations that have at least one day in between birthdays (this excludes not only consecutive birthdays, but also coincident birthdays).

I get (counting weak compositions) :

$ \displaystyle N_1 = 365 \frac{(365-n-1)!}{(365-2n)!}$

If you want to include coincident birthdays, we have

$\displaystyle N_0 = \frac{365!}{(365-n)!}$

So the probability asked is

$\displaystyle P = \frac{N_0 - N_1}{365^n}$

Update: there might be is some error here, I think I'm failing to taking into account the configurations that have both coincident and consecutive birthdays, I'll revise this tonight. I suspect that N1 is correct, and that allows to compute the probability of having consecutive OR coincident birthdays. To count consecutive (exclusively) birthdays seems more difficult.


UPDATE: Here's the correct (I hope) answer.

The probability of having at least a pair of consecutive birthday for M (=365) days and n persons is

$\displaystyle P(M,n) = 1 - \sum_{k=1}^n Q(M,k) {M \choose k} \frac{S(n,k) k! }{M^n} = 1 - \frac{1}{M^{n-1}} \sum_{k=1}^n \frac{(M-k-1)!}{(M-2k)!} S(n,k) $

where $ \displaystyle Q(M,k) = \frac{{M -k - 1 \choose k -1} }{{M -1 \choose k -1} } $

and $S(n,k)$ are the Stirling numbers of the second kind

Some computed values follow

M=365 n=1 p=0.00000000
M=365 n=2 p=0.00547835
M=365 n=3 p=0.01634745
M=365 n=4 p=0.03242761
M=365 n=5 p=0.05345896
M=365 n=6 p=0.07910314
M=365 n=7 p=0.10895871
M=365 n=8 p=0.14256439
M=365 n=9 p=0.17941667
M=365 n=10 p=0.21897764
M=365 n=11 p=0.26069278
M=365 n=12 p=0.30400167
M=365 n=13 p=0.34834843
M=365 n=14 p=0.39319571
M=365 n=15 p=0.43803357
M=365 n=16 p=0.48239009
M=365 n=17 p=0.52583640

M=365 n=30 p=0.90729104

M=365 n=42 p=0.99074145 

Explanation follows: -------------------------------

Let $M$ (=365) number of days, and $n$ = number of persons and $k$ = number of distinct birthdays ($k$ is not fixed, it's a random variable in the range $1..n$). Let $P_{M,n}(S)$ be the probability of NOT having consecutive birthdays (S is the event: "all birthdays are separated")

Then $P_{M,n}(S) = \sum_k P_{M,n}(S \; k) = \sum_k P_{M,n}( S | k ) P_{M,n}(k )$

The following steps compute the two factors inside the summation:

STEP 1:

$P_{M,n}( S | k )$ is the probability of having the events separated, given that the number of distinct birthdays is $k$. If we think of birthdays as occupied boxes in a circular list, a little reflection shows that all configurations are equiprobable ($n$ does not matter now) and we can assume without altering the result that the first box of the list is occupied. Now, the total number of possible ocurrences are the number of selection of $k-1$ boxes among $M-1$ (combination).

And the number of "succesfull" arrangements are those that result in non-consecutive occupied boxes. But this is equivalent of specifying a list of $k$ numbers greater than 1 that sum up to M; and this is the same as specifying a list of $k$ numbers greater than 0 than sum up to $M-k$; and this can be counted, by a reasoning similar to this one, as the combination of $k -1$ taken from $M -k - 1$. So,

$\displaystyle P_{M,n}( S | k ) = \frac{{M -k - 1 \choose k -1} }{{M -1 \choose k -1} } = Q(M,k)$

STEP2:

$P_{M,n}(k)$ : if we place $n$ balls at random in $M$ boxes, what is the probability that exactly $k$ boxes will be non-empty?

The total counting is given by $M^n$. To count the "successful" cases, we multiply:

  • all the possible sets of occupied boxes (combinations of $k$ taken from $M$)

  • the total numbers of ways of putting $n$ balls in $k$ boxes leaving no empty box: that is given by the Stirling numbers of the second kind.

  • and we need to multiply by $k!$ (permutations of boxes) because the Stirling numbers consider nondistinct boxes.

    $\displaystyle P_{M,n}(k ) = {M \choose k} \frac{S(n,k) k! }{M^n}$

And from there follows the formula above.