Determining information in minimum trials (combinatorics problem)

A student has to pass an exam with $k2^{k-1}$ questions to be answered by yes or no, on a subject he knows nothing about. The student is allowed to take mock exams that have the same questions as the real exam. After each mock exam the teacher tells the student how many answers he got right, and when the student feels ready, he can take the real exam. Show that if the student is good at combinatorics, he can guess all the answers after only $2^{k}$ mock exams.

For those who prefer a more formal presentation of the problem: if $E=\{1,-1\}^{N}$, we seek $n$ such that for some vectors $v_{1},v_{2}, \ldots, v_{n}\in E$ the map \begin{align*} \phi\colon E &\rightarrow \mathbb{Z}^{n},\\ v &\mapsto (\langle v,v_{1}\rangle, \langle v,v_{2}\rangle, \ldots, \langle v,v_{n}\rangle) \end{align*} is injective (note the inner products can be negative, so the codomain is $\mathbb{Z}^{n}$ rather than $\mathbb{N}^{n}$). Show that it is possible to find such vectors when $N=k2^{k-1}$ and $n=2^{k}$.

It is possible to use duality to transform the problem again. We seek an $n\times N$ matrix $M$ such that $v\mapsto Mv$ is injective on $E$. If $X$ is the formal polynomial vector $X=(X_{1},X_{2}, \ldots,X_{n})$, then $v\mapsto Mv$ is injective iff $v\mapsto \langle X,Mv \rangle$ is. But $\langle X,Mv \rangle = \langle M^{T}X, v \rangle$, and it is easily seen that $v\mapsto \langle M^{T}X,v \rangle$ is injective iff the $N$ column vectors $x_{i}$ of $M$ satisfy $\sum_{i\in I} x_{i}\neq \sum_{j\in J} x_{j}$ for any two distinct subsets $I$, $J$ of $\{1,\dots,N\}$.
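The equivalence between injectivity of $v\mapsto Mv$ on $E$ and distinctness of the subset sums of the columns (each $v\in E$ equals $2\chi_S-\mathbf{1}$ for a subset $S$, so $Mv = 2\sum_{i\in S}x_i - \sum_i x_i$) can be machine-checked by brute force on small matrices. A sketch, with helper names of my own choosing:

```python
from itertools import product

def injective_on_E(rows, N):
    """Is v -> Mv injective on E = {1,-1}^N?  `rows` are the rows of M."""
    images = set()
    for v in product((1, -1), repeat=N):
        img = tuple(sum(r[i] * v[i] for i in range(N)) for r in rows)
        if img in images:
            return False
        images.add(img)
    return True

def distinct_subset_sums(rows, N):
    """Are the 2^N subset sums of the N column vectors of M all distinct?"""
    sums = set()
    for mask in product((0, 1), repeat=N):
        s = tuple(sum(r[i] * mask[i] for i in range(N)) for r in rows)
        if s in sums:
            return False
        sums.add(s)
    return True

# The two conditions agree on every 2x3 matrix with entries in {-1, 0, 1}.
N = 3
for flat in product((-1, 0, 1), repeat=2 * N):
    rows = [flat[:N], flat[N:]]
    assert injective_on_E(rows, N) == distinct_subset_sums(rows, N)
```

The exhaustive loop over all $3^6$ small matrices is of course no proof, but it confirms the duality claim holds with no sign or indexing slip.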

This problem comes from an olympiad-style contest. The original problem was stated with 30 questions, and the aim was to prove that the student could guess all the answers after 24 trials. One of my teachers came up with the stronger result above (which gives 16 trials for 30 questions), but I can't remember his proof or reconstruct one myself.

I tagged this information theory because a probabilistic argument shows that it is impossible to do better than this asymptotically. More precisely, if we choose $v$ uniformly at random in $E$, then \begin{equation*} N = H(\phi(v)) \leqslant \sum_{i} H(\langle v,v_{i} \rangle) = nH(B(N,1/2)) \sim (n/2)\log_{2}(N) \end{equation*} where $H$ denotes entropy (the first equality uses the injectivity of $\phi$, and $\langle v,v_{i}\rangle$ is an affine function of a $B(N,1/2)$ variable). With $N=k2^{k-1}$ we get $n\geqslant c\frac{k2^{k}}{k-1+\log_{2}(k)}\geqslant c2^{k}$ for all $c<1$ and $k$ large enough.
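For concreteness, the exact binomial entropy and the resulting lower bound $n \geqslant N/H(B(N,1/2))$ can be evaluated numerically. A quick sketch (the helper `binomial_entropy` is mine); note that for small $k$ this bound is much weaker than $2^k$ and only becomes comparable asymptotically:

```python
from math import comb, log2

def binomial_entropy(N):
    """Entropy in bits of B(N, 1/2); this is also the entropy of
    <v, v_i> for uniform v, since that is an affine image of a binomial."""
    h = 0.0
    for i in range(N + 1):
        p = comb(N, i) / 2 ** N
        h -= p * log2(p)
    return h

for k in (3, 4, 5, 6):
    N = k * 2 ** (k - 1)
    h = binomial_entropy(N)
    # Any injective phi with n queries must satisfy N <= n * h.
    print(k, N, round(h, 3), round(N / h, 2))
```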


Thanks to DeepL Translate, I was able to read the Russian journal article posted in the comments (https://www.mccme.ru/free-books/matprosf.html), and I can now give the solution. In fact, you can determine the answers to a test with $k2^{k-1}+1$ questions in only $2^k$ non-adaptive attempts.

First, let us reformulate the problem in terms of weighings. Suppose without loss of generality your first submission is a test with all answers being "yes." For any subset $A$ of questions, let the weight of $A$ be the number of questions in $A$ whose answer is "yes." Given the result of this all-"yes" submission, for any subset $A$ of questions, each of these two pieces of information can be deduced from the other:

  • The score of a test when you answer "yes" for all questions in $A$, and "no" for the other questions

  • The weight of $A$.
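
To spell out the equivalence (the notation $Q$, $W$, $s(A)$ is mine): let $Q$ be the total number of questions and $W$ the score of the all-"yes" submission, i.e. the number of questions whose answer is "yes." Answering "yes" exactly on $A$ scores one point for each "yes" question inside $A$ and one point for each "no" question outside $A$, so \begin{align*} s(A) &= w(A) + \bigl[(Q-|A|) - (W - w(A))\bigr] \\ &= 2\,w(A) + Q - |A| - W, \end{align*} hence $w(A) = \tfrac{1}{2}\bigl(s(A) + W + |A| - Q\bigr)$. Since $Q$, $|A|$ and $W$ are known, each of $s(A)$ and $w(A)$ determines the other.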

Therefore, I can and will state the strategy in terms of weights.

The fact that you can solve a test with $k2^{k-1}+1$ questions in $2^k$ attempts follows from the lemma below.

Lemma: Suppose you can deduce the answers to a test with $M$ questions in $N$ attempts. Then it is possible to deduce the answers to a test with $2M+N-1$ questions in $2N$ attempts.

Proof: Given a test with $2M+N-1$ questions, divide the questions into three sets $A,B$ and $C$, where $|A|=|B|=M$, and $|C|=N-1$. By assumption, there exist $N-1$ sets $A_1,\dots,A_{N-1}$ such that you could solve a test whose question set is $A$ by submitting the all-"yes" test, and then weighing $A_1,\dots,A_{N-1}$. Similarly, there exist sets $B_1,\dots,B_{N-1}$ for solving a test whose question set is $B$. Finally, let $C=\{c_1,c_2,\dots,c_{N-1}\}$. Then the solution for the $(2M+N-1)$-question test is this:

  • Answer "yes" to all questions.
  • Find the weight of $B$.
  • For each $k\in \{1,\dots,N-1\}$: find the weights of these two sets:
    • $A_k\cup B_k\cup \{c_k\}$
    • $A_k\cup (B\setminus B_k)$.

Here is how you deduce the answers to the test. For each $k\in \{1,\dots,N-1\}$, the parity of the sum of the weights of $A_k\cup B_k\cup \{c_k\}$, $A_k\cup (B\setminus B_k)$, and $B$ determines the answer to $c_k$: every question other than $c_k$ is counted an even number of times across these three sets, so the parity of the sum equals the answer to $c_k$. We now know the weights of the three sets $A_k\cup B_k$, $A_k\cup (B\setminus B_k)$, and $B$, which gives a simple system of equations to solve for the weights of $A_k$ and $B_k$. This gives us the information we need to deduce the answers to the $A$ questions and the $B$ questions, using the assumed strategy.$\tag*{$\square$}$

Starting with the two-attempt solution to a two-question test, you can thus derive a four-attempt solution to a test with $2\cdot 2+(2-1)=5$ questions, then an eight-attempt solution to a test with $2\cdot 5+(4-1)=13$ questions, and so on, in general leading to $k2^{k-1}+1$ questions in $2^k$ attempts.
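The doubling construction is easy to machine-check. The following sketch (function names are mine) builds the weighing sets of the lemma recursively, encoding a strategy as the list of $N-1$ subsets weighed after the all-"yes" submission, and verifies by brute force that the resulting weights determine every possible answer sheet for the first few values of $k$:

```python
from itertools import product

def double(sets, M):
    """One application of the lemma: turn an N-attempt strategy for M
    questions (the all-'yes' submission plus the N-1 given sets) into
    a 2N-attempt strategy for 2M + N - 1 questions."""
    N = len(sets) + 1
    B = set(range(M, 2 * M))
    new_sets = [B]                        # weigh B
    for k, A_k in enumerate(sets):
        B_k = {M + i for i in A_k}        # copy of A_k inside B
        c_k = 2 * M + k                   # the k-th extra question
        new_sets.append(A_k | B_k | {c_k})
        new_sets.append(A_k | (B - B_k))
    return new_sets, 2 * M + N - 1

def determines_answers(sets, M):
    """Check that (total weight, weight of each set) is injective
    over all 2^M possible answer sheets."""
    seen = set()
    for a in product((0, 1), repeat=M):
        key = (sum(a),) + tuple(sum(a[i] for i in S) for S in sets)
        if key in seen:
            return False
        seen.add(key)
    return True

# Base case: 2 questions in 2 attempts (all-'yes', then weigh {0}).
sets, M = [{0}], 2
for k in (1, 2, 3):
    assert M == k * 2 ** (k - 1) + 1      # questions handled
    assert len(sets) + 1 == 2 ** k        # attempts used
    assert determines_answers(sets, M)
    sets, M = double(sets, M)
```

This confirms the counts $2$, $5$, $13$ and the injectivity of the weight map up to $k=3$ (checking $k=4$, i.e. $2^{33}$ answer sheets, is out of reach for brute force, but the induction carries the proof from there).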


The strategy can be modified to extend an $N$-attempt strategy for an $M$-question test to a $2N$-attempt strategy for a $(2M+r)$-question test for any $r\in \{0,1,\dots,N-1\}$, by ignoring some of the $C$ questions. This lets you handle the $k2^{k-1}$-question test in $2^k$ attempts, exactly as asked in the OP.

Furthermore, it is easy to extend an $N$-attempt strategy for $M$ questions to an $(N+1)$-attempt strategy for $M+1$ questions. Together with these two rules, you can inductively use the lemma to derive a strategy for any number of questions.