Guessing a subset of $\{1,...,N\}$

I pick a random subset $S$ of $\{1,\ldots,N\}$, and you have to guess what it is. After each guess $G$, I tell you the number of elements in $G \cap S$. How many guesses do you need?


Solution 1:

An obvious upper bound is $N$ queries, since you can test each element individually. On the other hand, it takes at least $\Omega(N/\log N)$ queries: $N$ bits of information are required to identify the target subset, and each query can yield at most $O(\log N)$ bits of information, since each query has only $O(N)$ possible answers. To see that the upper bound is not sharp, consider the following strategy for $N=5$, which takes at most $4$ queries:

  1. Guess $\{1,2,3,4,5\}$. If the result is $0$ or $5$, we have the answer. If the result is $1$ or $4$, bisection search (for the single member or the single missing element) gives the answer in three more queries. Suppose the result is $2$ (the strategy for $3$ is the same by symmetry).
  2. Guess $\{1,2\}$. If the result is $2$, we have the answer. If the result is $0$, then bisection search on $\{3,4,5\}$ (for the single missing element) gives the answer in two more queries. Suppose the result is $1$. Then we know the answer is $\{a,b\}$ for some $a \in \{1,2\}$ and $b \in \{3,4,5\}$.
  3. Guess $\{1,3\}$. If the result is $2$, we have the answer. If the result is $0$, then the answer is $\{2,b\}$ for some $b\in\{4,5\}$, and one more query gives the answer. Suppose the result is $1$. Then we know the answer is $\{1,4\}$, $\{1,5\}$, or $\{2,3\}$.
  4. Guess $\{1,4\}$. The answer is $\{1,4\}$ if the result is $2$, or $\{1,5\}$ if the result is $1$, or $\{2,3\}$ if the result is $0$.

This example gives an improved upper bound asymptotic to $4N/5$. It seems likely that the correct answer is strictly $o(N)$ (i.e., eventually less than $cN$ for any fixed $c$), but whether or not it's $\Theta(N/\log N)$, I can't say.

Solution 2:

This can be solved in $\Theta(N/\log N)$ queries. First, here is a lemma:

Lemma: If you can solve $N$ in $Q$ queries, where one of the queries is the entire set $\{1,\dots,N\}$, then you can solve $2N+Q-1$ in $2Q$ queries, where one of the queries is the entire set.

Proof: Divide $\{1,\dots,2N+Q-1\}$ into three sets, $A,B$ and $C$, where $|A|=|B|=N$ and $|C|=Q-1$. By assumption, there exist subsets $A_1,\dots,A_{Q-1}$ such that you could find the unknown subset of $A$ alone by first guessing $A$, then guessing $A_1,\dots,A_{Q-1}$. Similarly, there exist subsets $B_1,\dots,B_{Q-1}$ for solving $B$. Finally, write $C=\{c_1,\dots,c_{Q-1}\}$.

The winning strategy is:

  • Guess the entire set, $\{1,\dots,2N+Q-1\}$.

  • Guess $B$.

  • For each $i\in \{1,\dots,Q-1\}$, guess $A_i\cup B_i$.

  • For each $i\in \{1,\dots,Q-1\}$, guess $A_i\cup (B\setminus B_i)\cup \{c_i\}$.

Using the parity of the the sum of the guesses $A_i\cup B_i$ and $A_i\cup (B\setminus B_i)\cup \{c_i\}$, you can determine whether or not $c_i\in S$. Then, using these same guesses, you get a system of equations which lets you solve for $|A_i \cap S|$ and $|B_i\cap S|$ for all $i$. This gives you enough info to determine $A\cap S$ and $B\cap S$, using the assumed strategy.$\tag*{$\square$}$

Let $\def\Opt{\operatorname{Opt}}\Opt(N)$ be the fewest number of guesses you need for $\{1,\dots,N\}$. Using the lemma and induction, you can show that $$ \Opt(k2^{k-1}+1)\le 2^k\qquad \text{for all }k\in \{0,1,2,\dots\} $$ Note that when $N=k2^{k-1}+1$, we have $\Opt(N)\le 2^k$, and $$\frac N{\frac12\log_2 N}=\frac{k2^{k-1}+1}{\frac12\log_2(k2^{k-1}+1)}= 2^k(1+o(1))$$ It follows that $\Opt(N)\in O(N/\log N)$ when $N$ is of the form $k2^{k-1}+1$. Since $\Opt(N+1)\le \Opt(N)+1$, this extends to all $N$. Combined with the entropy argument, we get $\Opt(N)\in \Theta(N/\log N)$.