If we randomly select 25 integers between 1 and 100, how many consecutive integers should we expect?
Question: Suppose we have one hundred seats, numbered 1 through 100. We randomly select 25 of these seats. What is the expected number of selected pairs of seats that are consecutive? (To clarify: we would count two consecutive selected seats as a single pair.)
For example, if the selected seats are all consecutive (eg 1-25), then we have 24 consecutive pairs (eg 1&2, 2&3, 3&4, ..., 24&25). The probability of this happening is 75/($_{100}C_{25}$). So this contributes $24\cdot 75/(_{100}C_{25}$) to the expected number of consecutive pairs.
Motivation: I teach. Near the end of an exam, when most of the students have left, I notice that there are still many pairs of students next to each other. I want to know if the number that remain should be expected or not.
If you're just interested in the expectation, you can use the fact that expectation is additive to compute
- The expected number of consecutive integers among $\{1,2\}$, plus
- The expected number of consecutive integers among $\{2,3\}$, plus
- ....
- plus the expected number of consecutive integers among $\{99,100\}$.
Each of these 99 expectations is simply the probability that $n$ and $n+1$ are both chosen, which is $\frac{25}{100}\frac{24}{99}$.
So the expected number of pairs is $99\frac{25}{100}\frac{24}{99} = 6$.
Let me present the approach proposed by Henning in another way.
We have $99$ possible pairs: $\{(1,2),(2,3),\ldots,(99,100)\}$. Let's define $X_i$ as
$$ X_i = \left \{ \begin{array}{ll} 1 & i\text{-th pair is chosen}\\ 0 & \text{otherwise}\\ \end{array} \right . $$
The $i$-th pair is denoted as $(i, i+1)$. That pair is chosen when the integer $i$ is chosen, with probability $25/100$, and the integer $i+1$ is also chosen, with probability $24/99$. Then,
$$E[X_i] = P(X_i = 1) = \frac{25}{100}\frac{24}{99},$$
and this holds for $i = 1,2,\ldots,99$. The total number of chosen pairs is then given as
$$X = X_1 + X_2 + \ldots + X_{99},$$
and using the linearity of the expectation, we get
\begin{align} E[X] &= E[X_1] + E[X_2] + \cdots +E[X_{99}]\\ &= 99E[X_1]\\ &= 99\frac{25}{100}\frac{24}{99} = 6 \end{align}
Henning Malcolm has already answered the question about the expected value. But that says very little about how likely or unlikely it would be to get deviations from the expected value - what if you observed there were 20 pairs out of 25 students left, could that happen by chance or is there some other factor at work? To answer questions like this one would need more details about the distribution, like the variance. Unfortunately, the analytic formula for the distribution (if there is one) is likely to be very complicated. So, from a practical perspective it might be interesting to investigate things numerically. That is the purpose of this post.
Here's a histogram of the number of pairs, generated numerically from 10000 trials.
You can look at this and draw your own conclusions. I would conclude the following:
- anywhere from 4 to 8 pairs would be totally reasonable,
- 0 pairs or 13+ pairs would be very strong evidence of there being another factor
- anything inbetween would be unusual, but possible
Here's the Matlab code used to generate it:
n=100;
k=25;
num_trials = 1e4;
num_pairs = zeros(num_trials,1);
for ii=1:num_trials
disp(ii)
filled_seats = sort(randperm(n,k));
gaps = filled_seats(2:end) - filled_seats(1:end-1);
num_pairs(ii) = length(find(gaps == 1));
end
histogram(num_pairs)
title(sprintf('histogram of seating pairs (out of %d trials)',num_trials))
It is not difficult to construct the bivariate generating function $G(z,u)$ of $25$-tuples by the maximum element (variable $z$) and the number of adjacent elements (variable $u$.) To do this we must choose the first element:
$$\frac{z}{1-z}$$
and then choose $24$ gaps between elements with the gap of value one marked with $u:$
$$\left(uz + \frac{z^2}{1-z}\right)^{24}.$$
The product of these is $G(z,u)$ but we are interested in the maximum being some value at most $100$ so we get
$$[z^{100}] \frac{1}{1-z} G(z, u) = [z^{100}] \frac{1}{1-z} \frac{z}{1-z} \left(uz + \frac{z^2}{1-z}\right)^{24}.$$
We need the total count of the number of adjacent pairs so we compute $$[z^{100}] \left.\frac{\partial}{\partial u} \frac{1}{1-z} G(z, u) \right|_{u=1} =[z^{100}] \left. \frac{z}{(1-z)^2} \times 24 \left(uz + \frac{z^2}{1-z}\right)^{23} \times z\right|_{u=1} \\ = [z^{100}] \frac{z}{(1-z)^2} \times 24 \left(\frac{z}{1-z}\right)^{23} \times z \\ = 24 [z^{100}] \frac{z^{25}}{(1-z)^{25}} = 24 [z^{75}] \frac{1}{(1-z)^{25}} = 24 {75+24\choose 24}.$$
This yields a final answer of $$ 24 {99\choose 24} \times {100\choose 25}^{-1} = 6.$$
Suppose there are $N\gg 10$ seats (in a circle) and density of students is $p$ (not too large, not too small: $Np\gg r$). Then probability of $r\ll N$ seats next to each other being occupied is $\approx p^r$ - that is average density of such runs. However these runs clump together and a fresh run is followed by approximately $\frac p {1-p}$ more students for a total of $1+\frac p{1-p}=\frac {1}{1-p}$ runs clumped together. This yields density of maximal "clumps" of size at least $r$:
$$d={p^r}(1-p)$$
For $p=0.25, r=2$ we get $4.7\%$. Using exact value for density of runs of length $2$ ($6$) we'd get $4.5\%$ instead. Unlike runs - which clump together - clumps repel and hence their number has a smaller variance and is better for quick comparison with the mean value.
You can do the same computation for $r=1$ to get density of maximal clumps of size at least $1$ $p(1-p)=0.19$.