Alice chooses a binary vector $V$ of length $n$, which is unknown to Bob. In each round Bob chooses any set of indices $i$, and Alice tells him how many of the corresponding $V_i$ are set to $1$. The game terminates when Bob can say with certainty exactly which binary vector Alice has. Alice would like the game to go on as long as possible, and Bob would like to make it as short as possible.

What are Alice and Bob's best strategies and how long does the game take?

For $n=7$ the following strategy always works.

Bob chooses the index sets $\{3,5\}$, $\{2,3,4\}$, $\{2,4,5,6,7\}$, $\{1,2,3,5,6\}$, $\{1,4,5,6,7\}$. This takes $5$ rounds.

For $n=10$ the following strategy always works.

Bob chooses the index sets $\{1,3,4,5,6,7,8\}$, $\{3,5,9\}$, $\{1,2,3,8,10\}$, $\{1,2,4,5,9,10\}$, $\{1,2,5,7,8\}$, $\{1,2,3,4,5,8,10\}$, $\{2,4,7,8,10\}$. This takes $7$ rounds.
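Whether a fixed list of query sets like this "always works" can be checked by brute force: the answers determine $V$ uniquely exactly when no two distinct binary vectors produce the same tuple of counts. Here is a minimal sketch of that check (Python; the query sets are simply the ones listed above, transcribed as sets of indices). If both strategies do pin down every vector, both calls print `True`.

```python
from itertools import product

def answers(v, queries):
    """Alice's answers: for each query set, the number of 1-bits of v at those indices."""
    return tuple(sum(v[i - 1] for i in q) for q in queries)

def uniquely_determines(n, queries):
    """True iff no two distinct binary vectors of length n give the same answer tuple."""
    seen = set()
    for v in product((0, 1), repeat=n):
        a = answers(v, queries)
        if a in seen:
            return False  # two different vectors are indistinguishable to Bob
        seen.add(a)
    return True

# The n = 7 strategy above (5 fixed queries) and the n = 10 strategy (7 fixed queries).
q7 = [{3, 5}, {2, 3, 4}, {2, 4, 5, 6, 7}, {1, 2, 3, 5, 6}, {1, 4, 5, 6, 7}]
q10 = [{1, 3, 4, 5, 6, 7, 8}, {3, 5, 9}, {1, 2, 3, 8, 10}, {1, 2, 4, 5, 9, 10},
       {1, 2, 5, 7, 8}, {1, 2, 3, 4, 5, 8, 10}, {2, 4, 7, 8, 10}]

print(uniquely_determines(7, q7))   # expected True if the 5-round strategy works
print(uniquely_determines(10, q10)) # expected True if the 7-round strategy works
```

Note that these strategies are non-adaptive: the query sets are fixed in advance rather than chosen in response to Alice's answers.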


Solution 1:

Under the following two assumptions, this question comes down to experimental design (in the information-theoretic sense).

  1. Bob is allowed to use the information after moves $1, 2, \ldots, k-1$ to determine move $k$.
  2. Alice either chooses a binary vector $V$ and sticks to it, or is allowed to change bits before each of Bob's moves -- that is, before she knows which indices Bob will pick next -- as long as the changes are consistent with Bob's existing information. (Information-theoretically, the case where Alice is not allowed to change bits and the case where Alice is allowed to change bits without knowing what Bob will do next are equivalent.)

There is a true state of $V$ that Bob is trying to discover through measurements. Each experiment consists of Bob selecting a set of indices, based on the information he already has, and then learning how many of those indices hold a $1$ in $V$. Bob's experimental design question at any step is: at which indices should I measure next so as to maximize the expected information gain from the measurement?

This is equivalent to asking: which choice of indices maximizes the entropy of the probability distribution over possible outcomes at those indices? The base-2 entropy of a discrete random variable $X$ with $m$ possible outcomes $\{x_1,x_2,\ldots,x_m\}$, distributed according to the probability mass function $P(x_i)$, is defined as $$ H_2(X) = -\sum_{i=1}^m P(x_i)\log_2 P(x_i) $$ Here the possible outcomes are the different integer answers Bob can get when asking Alice how many entries of $V$ are $1$ on a given set of indices, and $P(x_i)$ is the probability of each of those answers.

For example, suppose $n=4$ and Bob already knows that the total number of $1$ bits is $3$. We write this as $$A(1,2,3,4) = 3,$$ meaning the answer Alice gives when asked how many of bits $1$, $2$, $3$ and $4$ are set. What is the expected information gain if Bob next asks for $A(1)$, $A(1,2)$, $A(1,2,3)$ or $A(1,2,3,4)$? (These are the only questions Bob has to consider, since all others follow by symmetry.)

$$ \begin{array}{l|l|l} & P(x_i) & H_2(X) \\ \hline A(1) & \left\{0: \frac{1}{4},\ 1: \frac{3}{4}\right\} & 2 - \frac{3\log_2 3}{4} \approx 0.81 \\ A(1,2) & \left\{1: \frac{1}{2},\ 2: \frac{1}{2}\right\} & 1 \\ A(1,2,3) & \left\{2: \frac{3}{4},\ 3: \frac{1}{4}\right\} & 2 - \frac{3\log_2 3}{4} \approx 0.81 \\ A(1,2,3,4) & \left\{3: 1\right\} & 0 \end{array} $$

So Bob's best move is to ask for $A(1,2)$ -- or, by symmetry, any other 2-index set -- since that maximizes the expected information gain. Note that the expected information gain from $A(1,2,3,4)$ is $0$, since Bob already knows the value of that measurement with certainty. After learning the value of $A(1,2)$, Bob has a new constraint on the possible values of different measurements; he can update the probability distributions associated with those measurements, recalculate their entropies, and determine his next move.
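As a sanity check on the table above, one can enumerate the vectors consistent with Bob's current knowledge, build the outcome distribution for each candidate query, and compute its entropy directly from the definition of $H_2$. A minimal sketch (Python; it assumes the $n=4$, $A(1,2,3,4)=3$ situation above and only the four representative queries from the table):

```python
from collections import Counter
from itertools import product
from math import log2

def entropy_bits(pmf):
    """H_2(X) = -sum_i P(x_i) * log2 P(x_i) for a {outcome: probability} dictionary."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

n = 4
# Vectors consistent with what Bob already knows: A(1,2,3,4) = 3, all equally likely.
consistent = [v for v in product((0, 1), repeat=n) if sum(v) == 3]

# The representative queries from the table; all other queries are equivalent by symmetry.
for query in [(1,), (1, 2), (1, 2, 3), (1, 2, 3, 4)]:
    counts = Counter(sum(v[i - 1] for i in query) for v in consistent)
    pmf = {answer: c / len(consistent) for answer, c in counts.items()}
    print(query, pmf, round(entropy_bits(pmf), 2))
```

Bob's greedy move is whichever query attains the largest entropy, here any 2-index set; running the same loop over all candidate index sets, restricted to the vectors consistent with the answers so far, gives his next move at every later step.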

Solution 2:

This problem was solved (up to a small multiplicative factor) by Erdős and Rényi in their work on the coin-weighing problem.