The Probabilistic Pigeon Hole Principle

Many people are aware of the Pigeonhole Principle: If we distribute $n+1$ pigeons into $n$ pigeonholes, at least one hole will contain at least two pigeons. However, much fewer are aware of the $\bf{Probabilistic}$ Pigeonhole Principle, which answers the question, 'How many do we usually need?'

Those familiar with the famous birthday problem might have good intuition for this. Let's recall the birthday problem: How many (randomly chosen) people do you need in a room for the probability that some two share a birthday to be more than 1/2? Wikipedia has a nice article on this here. (If this is the first time you've heard the question, it's a great one to think about.) For most people, the answer is quite a surprise, as it is only 23.

Returning to the general problem, if we randomly distribute $n^{1/2}$ pigeons into $n$ pigeonholes, the probability of some overlap approaches 1, as $n \to \infty$.

Let's state that more precisely:

Take any $\epsilon > 0.$ Randomly place $\lfloor n^{1/2 + \epsilon} \rfloor$ pigeons into $n$ holes. (Each is placed independently of the others and with uniform distribution.) Let $\Pr{(n,\epsilon)}$ be the probability that at least two pigeons are placed in the same hole. Then we have, $$\lim_{n \to \infty} \Pr{(n,\epsilon)} = 1.$$ I'm having some trouble proving this, and I also haven't found a proof online. In any case, the proof itself should not be difficult. Here is what I have so far:

Let $m = \lfloor n^{1/2 + \epsilon} \rfloor$. Then we have, $$\Pr{(n,\epsilon)} = 1 - \frac{(n)_m}{n^m},$$ where $(n)_m$ is the falling factorial. i.e. $(n)_m = n(n-1)\times \ldots \times (n-(m-1))$. I tried applying Stirling's formula but after doing some algebraic manipulatoin, I couldn't quite finish the proof. (I also crunched some numbers with Python, providing some numerical evidence.) So, what would be a proof of this?

Here is an extra note on my background to the question. I first became acquainted with it at a math conference when two people, (Piotr Przytycki and another person, whose name I don't recall), gave a short introduction to random (finitely presented) groups. The above principle is used to prove part of a basic result. You can read it in the literature on the bottom of page 31 of the excellent 2005 survey article (on random groups) which Yann Ollivier wrote.

Solution 1:

Suggestion: Forget about the pigeons and holes and think about balls and bins instead! (an equivalent but more common model in the probabilistic literature).

The following is an outline of a proof you may find here:

Let $C(n,m)$ be the collision probability when randomly throwing $m < n$ balls into $n$ bins (or $m$ pigeons into $n$ holes?!). Then we have: $$1-C(n,m) = \prod_{i=1}^{m-1} \left(1-\frac i n\right)$$ Since $\frac i n <1$, we can apply the inequality $1-x\le e^{-x}$ to get: $$\prod_{i=1}^{m-1} \left(1-\frac i n\right) \le \exp\left(-\frac {m(m-1)}{2n}\right)$$

In your case, $m$ is asymptotically bigger than $\sqrt{n}$ which means $m(m-1)$ is asymptotically bigger than $n$; hence $\exp\left(-\frac {m(m-1)}{2n}\right)$ - or the probability of having no collision - tends to zero as $n$ goes to infinity.

Solution 2:

I have satisfied myself by an argument like the following. May be it isn't quite what you need, but it does avoid the use of Stirling's formula, which makes it easier for people like me to follow.

Assume you have placed $\sqrt n$ pigeons, and no collisions took place. Then a fraction of $1/\sqrt n$ of the holes are occupied. This means that the probability of the next $\sqrt n$ pigeons avoiding collisions is bounded from above by the number $(1-1/\sqrt n)^{\sqrt n}\approx 1/e$. Continuing, after $(k+1)\sqrt n$ pigeons the probability of no collisions is less than $(1-1/\sqrt n)^{k\sqrt n}\approx e^{-k}$.

An incorrect answer for an integral

Is "connected, simply connected" Redundant?

Understanding Hatcher's proof of Borsuk-Ulam theorem for $n=2$

When two projections in a C*-algebra are "almost" Murray-von Neumann equivalent, they are equivalent

Video Lessons in Complex Analysis

How to visualize $1$-forms and $p$-forms?

If my average speed is 50 mph, is it better to go 10% faster or 5 mph faster? (These are not the same)

When does Separation of Variables yield basis of solution set?

When does $(p-1)! + 1 = p^k $ hold? [duplicate]

How to use Chebyshev Polynomials to approximate $\sin(x)$ and $\cos(x)$ within the interval $[−π,π]$?

Conjecture $\int_0^1\frac{\ln^2\left(1+x+x^2\right)}x dx\stackrel?=\frac{2\pi}{9\sqrt3}\psi^{(1)}(\tfrac13)-\frac{4\pi^3}{27\sqrt3}-\frac23\zeta(3)$

Why do natural transformations express the fact that a vector space is canonically embedded in its double-dual but not in its dual?