Analysis of a calculation of expected number of collisions in hashing

For a formal problem statement, I quote from the text Introduction to Algorithms by Cormen et. al

Suppose we use a hash function $h$ to hash $n$ distinct keys into an array $T$ of length $m$. Assuming simple uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of $\{\{k, l\} : k\neq l \text{ and } h(k)= h(l)\}$ ?

Quite intuitively I proceeded as follows:

Let $X$ be the random variable indicating the number of collisions. Let us define an indicator random variable $X_{ij}$ which indicates whether the $i$th element ($e_i$) already in the table collides with the $j$ th element ($e_j$) when the $j$ th element is to be inserted.

$Pr\{h(e_i)=h(e_j)\}=\frac{1}{m}$. So $E[X_{ij}]=\frac{1}{m}$.

So based on that we have:

$$X= \sum_{i=1}^n \sum_{j=i+1}^n X_{ij}$$

Taking expectation on both sides we have: $$E[X]= E\left[\sum_{i=1}^n \sum_{j=i+1}^n X_{ij}\right]$$

Using linearity of expectation:

$$E[X]= \sum_{i=1}^n \sum_{j=i+1}^n E[X_{ij}]$$

But, $E[X_{ij}]=\frac{1}{m}$ , which is already found above.

So,

$$E[X]= \sum_{i=1}^n \sum_{j=i+1}^n \frac{1}{m}$$

$$=\frac{n(n-1)}{2m}$$

Below is how my peer approached:

Let $X$ be the random variable indicating the number of collisions. Let $X_i$ be the indicator random variable indicating collision during the $i$ th insertion of the element, $i=1,2,3,..,n$

So, $$X=\sum_{i=1}^n X_i$$

taking expectation on both sides, we have:

$$E[X]=E\left[\sum_{i=1}^n X_i\right]$$

Using linearity of expectation:

$$E[X]=\sum_{i=1}^n E[X_i]$$

Now let us find $Pr\{X_i=1\}$ then by the property of indicator random variable we shall have $E[X_i]=Pr\{X_i=1\}$.

Probability that there is collision during the first insertion =$0$ [First element is inserted without any collision.]

Probability that there is collision during the second insertion= $\frac{1}{m}$ [Assuming open addressing, $1$ slot is already occupied.]

Probability that there is collision during the third insertion= $\frac{2}{m}$ [Assuming open addressing, $2$ slots are already occupied.]

Probability that there is collision during the $i$th insertion= $\frac{i-1}{m}$ [Assuming open addressing, $i-1$ slots are already occupied.]

So,$$E[X]=\sum_{i=1}^n \frac{i-1}{m}=\frac{n(n-1)}{2m}$$

Though the final answer obtained by my peer is same as that of mine, but I guess there are quite a lot of issues with the approach. First it assumes open addressing, while my approach is a generalized one. Secondly, as per the formal problem statement given in the text, in the worst case if all elements hash to the same slot then we shall have $X=\binom{n}{2}$. So the spectrum of $X$ is , $X=0,1,2,..,\binom{n}{2}$

But the spectrum of $X$ as per my peer's method is $X=0,1,2..,n$. Is my peer's method actually correct?

Solution 1:

Your peer's method is incorrect. The probability the $i$th key collides with the previous keys is not $(i - 1) / m$. Furthermore, he fails to take into account that adding colliding with a pair of keys should add two collisions, not just one. By a happy coincidence, however, these errors cancel each other out.

Basic math explanation (related to estimating linear regression with no intercept)

Why does matrix multiplication need to follow the rule of joined corner marks? [closed]

Find equation of the plane passing through the line $x+y=2, y-z=3$ and is perpendicular to the plane $2x+3y+4z=5$

For an arbitrary continuous $\mathbb{R}\to\mathbb{R}$, how to show that a certain Borel set is contained in the set of finite differentiability?

Restriction of vector bundle to norm 1 is a covering map

Show that the distribution defined by $f_n(x)=\sin(nx)$ converges to $T_0$

Spurious solution to differential equation?

Why is the abelian group of points on an elliptic curve over a finite field isomorphic to the product of at most two cyclic groups?

Spectral decomposition of compact self-adjoint operator

Compressibility factor (Z) of the Redlich-Kwong Equation

One of inequalities in the proof of Martingale transforms convergence

C# Could not load file or assembly : NuGet