Probability that an exam will have a perfect predictor

Here is a just-for-fun question, inspired by this answer of Noam Elkies: Suppose an exam with $q$ questions is taken by $s$ students. Each student independently has probability $1/2$ of getting each question right or wrong, a passing grade is getting the majority of the questions right. (Let's take $q$ odd, for simplicity.)

What is the asymtotic probability that there will be at least one question which is gotten right precisely by the students who pass?

Of course, there are a variety of ways in which $q$ and $s$ can go to $\infty$, so the question is which relative growth rates make this situation likely or unlikely.


Solution 1:

Write $q=2n+1$. A student's result can be summarised as a binary string of length q, where a correct answer is 1, a wrong answer 0 and the overall result the bit that appears more often. The bits equivalent to the majority are predictors; there are at least $n+1$ of them, but there may be more.

Since each question is 50/50 in being answered right or wrong, the number of 1's (or 0's) for one student is $B(2n+1,\frac12)$ distributed. As n grows larger, this can be approximated as $N(n+\frac12, \frac n2+\frac14)$ – asymptotic to $N(n, \frac n2)$.

Samples from this last distribution can be interpreted as follows: a sample near $n$ means the student scored close to 50% on the exam, and had not many predictors beyond the $n+1$ that formed the majority, while a sample near 0 or $2n$ implies that the student was near 0% or 100% on the exam, and had close to $n$ extra predictors (0 or 1, respectively).

Thus, if we fold $N(n, \frac n2)$ along its mean, we get a distribution for the number of predictors the student had. This is the half-normal distribution associated with $N(n, \frac n2)$, and its mean is $$n+\sigma\sqrt{\frac2\pi}=n+\sqrt{\frac n\pi}$$ where $\sigma=\sqrt{\frac n2}$ is the standard deviation of $N(n, \frac n2)$. The expected proportion of predictors in the student's exam is then $$\frac{n+\sqrt{\frac n\pi}}q\sim\frac12+\frac1{\sqrt{2q\pi}}$$ To find the probability with multiple students, imagine a filter that lets a proportion p of the light through, corresponding to the expected proportion of predictors for one student derived above. Two filters will let $p^2$ of the light through, corresponding to the expected proportion of common predictors in the exam between two students. Three filters let $p^3$ of the light through, corresponding to three students, and so on. Hence the asymptotic probability that a q-question exam taken by s students has at least one perfect predictor is $$\left(\frac12+\frac1{\sqrt{2q\pi}}\right)^s$$

Please note: I am only 17 years old, and I am more well-known for My Little Pony: Friendship Is Magic fan art than mathematics. However, I do know all the components in the derivation above, having learned them in my school. Some of the steps may be an approximation too much.

Solution 2:

Represent the answer to a particular question by the random variable $A$ which can take on the the values $+1$ and $-1$, for correct and incorrect. Represent the result of the test by the random variable $T$ which can take on the values $+1$ or $-1$, for pass and fail. The number of questions is $q=2n+1$.

If the random variables $A$ and $T$ for a student were independent, which is the case in the limit that $q\rightarrow\infty$, then the situation is straightforward. If $g$ students passed the test, the probability that a question is a perfect predictor is

$\hat{P}_1=P(A=+1|T=+1)^g\cdot P(A=-1|T=-1)^{s-g} = P(A=+1)^g\cdot P(A=-1)^{s-g}$

$\hat{P}_1=\left({1\over2}\right)^g\cdot\left({1\over2}\right)^{s-g}=2^{-s}$

And therefore the probability that there be at least one such question in a test with $q$ questions is

$\hat{P}_q=1 - \left(1-\hat{P}_1\right)^q=1 - \left(1-2^{-s}\right)^q \approx q\,2^{-s}$ for large $s$. Provided $q$ grows like $2^s$, then this probability does not vanish.

The problem can be answered without making the assumption that $A$ and $T$ be independent. One must calculate the conditional probability:

$P(A=+1|T=+1) = {P(T=+1|A=+1)P(A=+1)\over P(T=+1)}=P(T=+1|A=+1)$

by noting that $P(A=+1)=P(T=+1)={1\over2}$. Passing the test given that a particular question is answered correctly means that $n$ or more of the remaining $2n$ questions must also be answered correctly. Use the binomial distribution with $2n$ trials and success probability of $1\over2$. The distribution is symmetric, with $p_{<n}+p_n+p_{>n}=1$ and $p_{<n}=p_{>n}$.

$P(A=+1|T=+1)=p_n+p_{>n}={1\over2}(1+p_n)$

The probability for exactly $n$ successes in $2n$ trials for this situation is

$p_n = {2n\choose n} p^n (1-p)^n={2n\choose n}\left({1\over2}\right)^{2n}={2n\choose n}\,4^{-n}$

Since, $P(A=+1|T=+1)=P(A=-1|T=-1)$ we have

$\hat{P}_1=P(A=+1|T=+1)^s=(1+p_n)^s\,2^{-s}$

As $p_n\rightarrow0$ this agrees with the previous result. In fact, the $p_n$ term has a simple asymptotic form as $n\rightarrow\infty$, since it involves the central binomial coefficient.

$p_n\rightarrow1/\sqrt{\pi n}$

Therefore,

$$\hat{P}_1\rightarrow\left(1+{1\over\sqrt{\pi n}}\right)^s\,2^{-s} = \left({1\over2}+{1\over\sqrt{4\pi n}}\right)^s $$

As before, the probability that there be at least one perfect predictor question in a test with $q$ questions is

$\hat{P}_q=1 - \left(1-\hat{P}_1\right)^q$.

The treatment without the independence assumption does not change the conclusion that provided $q$ grows like $2^s$, then the probability for a perfect predictor question does not vanish.