Probability of getting into my favorite PhD

Solution 1:

Using the same notation as OskarSzarowicz's answer, let $p_{k,j}$ denote the probability that $k$-th tallest student gets to their $j$-th most preferred university (both $k$ and $j$ start at $1$). This probability can be expressed in a closed form by realizing that:

  • The previous $(k-1)$ students have "used up" $(k-1)$ universities in total and thus in order for the current student to be only accepted by their $j$-th choice, each of the first $(j-1)$ universities had to come this pool of $(k-1)$ universities. That can happen in $\binom{k-1}{j-1}(j-1)!$ ways.
  • The next university in the list must be a non-eliminated one, which means $(n+1-k)$ choices.
  • The remaining $(n-j)$ on their list can be chosen arbitrarily; giving us $(n-j)!$ possibilities.

Putting it all together and dividing by the total number of possible preference lists ($n!$) yields:

$$p_{k,j}=\frac{1}{n!}\binom{k-1}{j-1}(j-1)!(n+1-k)(n-j)!$$

This rather unwieldy expression can be rearranged a bit by combining the factorials into another binomial coefficient and moving the extra $j$ into the other one which pulls out one extra $k$:

$$p_{k,j}=\frac{(n+1-k)}{k}\binom{k}{j}/\binom{n}{j}$$

While this expression might not look really simpler than the previous one, it will come handy when it comes to calculating the expected number of students who are going to fail. It will actually be easier to consider a slightly more general problem, where student's probability of succeeding at their $j$-th choice is equal to $x^{j-1}$ (so the original problem corresponds to $x=\frac{1}{2}$). The expected number of students succeeding is equal to the double sum $$S = \sum_{k=1}^n \sum_{j=1}^k p_{k,j}x^{j-1}$$

Exchanging the other of summation allows us to pull out some terms in front of the inner sum and keep only one binomial coefficient in it (one whose bottom part is independent of the summation variable): $$S=\sum_{j=1}^n \left(x^{j-1}/\binom{n}{j}\sum_{k=j}^n \frac{(n+1-k)}{k}\binom{k}{j}\right)$$

For the inner sum, we have $$I(n,j)=\sum_{k=j}^n \frac{(n+1-k)}{k}\binom{k}{j}= (n+1)\sum_{k=j}^n \frac{1}{k}\binom{k}{j} - \sum_{k=j}^n \binom{k}{j}$$

The first sum can be also written as $$\sum_{k=j}^n \frac{1}{k}\binom{k}{j} = \frac{1}{j}\sum_{k=j}^n \binom{k-1}{j-1}=\frac{1}{j}\sum_{k=j-1}^{n-1} \binom{k}{j-1}$$

Now we can use the combinatorial identity $\sum_{k=j}^n \binom{k}{j} = \binom{n+1}{j+1}$ to obtain $$I(n,j)=\frac{n+1}{j}\binom{n}{j} - \binom{n+1}{j+1} = (n+1)\left(\frac{1}{j} - \frac{1}{j+1}\right)\binom{n}{j}$$

Returning back to our expression for $S$ and cancelling out the binomial coefficients, we get

$$S=(n+1)\sum_{j=1}^n \left(\frac{1}{j} - \frac{1}{j+1}\right)x^{j-1}$$

or, moving the $(n+1)$ factor to the other side and rearranging the terms a bit:

$$\frac{S}{n+1}=\frac{1}{x} - \frac{x^{n-1}}{n+1} + \frac{x-1}{x^2}\sum_{j=1}^n \frac{x^{j}}{j}$$

Unfortunately, this is the point where we might have ran out of luck when it comes to exact calculations for fixed $n$; the latest sum does not have a closed-form expression (at least as far as I know). It is a truncated form of the Taylor expansion of $-\log(1-x)$, though, so if we are only interested in the asymptotic behaviour, we can get the following result:

$$S^* = \lim_{n\to\infty}\frac{S}{n+1} = \frac{1}{x} + \frac{1-x}{x^2}\ln(1-x)$$

Note that this limit would be the same if we divided by $n$ rather than $(n+1)$ since we are going to infinity. Thus, the fraction of students who fail their tests tends to $(1-S^*)$. For the specific choice of $x=\frac{1}{2}$, we get $1-S^*=\ln{4}-1\approx 0.386294$.

Solution 2:

This is a very sketch proof I will edit in the morning to make cleaner, ultimately we use and abuse the complete uniform indifference, that is at any point if we know there are $m$ colleges remaining we know that any college is equally likely to be taken or not.

Let $p_{k,j}$ denote the probability that the $k^{th}$ tallest gets into preference $j$. Notice that $p_{k,j} = 0 $ for all $j > k$

You correctly noted that $p_{k,1} = \frac{n+1-k}{n} = 1 -\frac{k-1}{n}$

We can then progress to $p_{k,2}$:

We need to have failed getting into our first choice and then have our second college amonst the $n - (k-1)$ colleges that have not been selected. However we must condition on the fact that we failed to get into our first college. We have only $n-1$ colleges to care about because we know our top choice is removed. We know that $k-1$ colleges have already been taken of which $1$ of them is our top choice so only $ k-2 $ other colleges that we care about have already been selected. Hence the probability our second choice has also been selected given our first one is gone is $\frac{k-2}{n-1}$

Hence $p_{k,2} = (1-p_{k_1})\cdot (1-\frac{k-2}{n-1})$

Going forward to $p_{k,3}$:

As before, we need to have failed getting into our first, failed getting into our second and then have our third preference not already selected.

Failling to get into our first and second is simply $(1-p_{k,1}) (1-p_{k,2})$

Given that we have failed to get into these colleges we know there are $n-2$ possible colleges remaining. We know the already selected colleges contain our top preference and our second top and then $k-1-2$ others. So the probability that our third college is amongst these is simply $\frac{k-3}{n-2}$.

Hence $p_{k,3} = (1-p_{k,1}) (1-p_{k,2}) (1-\frac{k-3}{n-2})$

And in general it is easy to see $p_{k,j} = (1-\frac{k-j}{n+1-j})\prod_{i=1}^{k-1} (1-p_{k,i})$

Someone with better real analysis might be able to express this product in a clean closed-form hopefully! I think the recurrence can be put into a nice sum. For now if you want to calculate $p_{k,j}$ you will have to start with your formula for $p_{k,1}$ and work upwards.