Twenty questions against a liar

Here's one that popped into my mind when I was thinking about binary search.

I'm thinking of an integer between 1 and n. You have to guess my number. You win as soon as you guess the correct number. If your guess is not correct, I'll give you a hint by saying "too high" or "too low". What's your best strategy?

This is an easy problem if I always tell the truth: by guessing the (rounded) mean of the lower bound and the upper bound, you can find my number in roughly log₂n guesses at most.

But what if I'm allowed to cheat once? What is your best strategy then? To clarify: if you guess my number, you always win instantly. But I'm allowed, at most once, to tell you "too high" when your guess is actually too low, or the opposite. I can also decide not to lie.

Here's a rough upper bound: you can ask each number twice to make sure I'm not cheating, and if I ever give two different answers, just ask a third time and from then on, play the regular game. In this way, you can win with about 2 log₂n guesses at most.

I'm pretty sure that bound can be improved. Any ideas?

There is a fairly simple strategy which requires (1 + ε)log(n) + 1/ε + O(1) queries, for any constant ε > 0. I illustrate this strategy below.

First, I ask you whether or not X, the secret number, is n/2. Without loss of generality, suppose you answer "less". I learn nothing at first, because you may be lying; but for the moment I give you the benefit of the doubt.
I next ask you whether X is n/4. If you say "less", I don't know whether or not 0 < X < n/4, but I do know that 0 < X < n/2, because you can't lie twice. Similarly, if I go about a normal binary search, so long as you continue to answer "less", I know that your answer to the preceding query was honest. So we may reduce to the case where you say "more" to my query of whether X is n/4.
If I continue to take you at your word, persue a binary search, and enquire about whether X is 3n/8, you may say either "more" or "less". If you say "more", then I don't know whether X > n/2, but I do know that X > n/4, again because you can't lie twice. So again so long as you continue to answer "more" in my normal binary search, I know that your answer to the preceding query was honest.

More generally: if I consistently guess under the hypothesis that you are being honest, in any "monotonic" sequence of responses, I know that all but (possibly) the last of them are honest. So it might seem as though the worst case scenario is where your responses alternate a lot, as would occur for instance if you had chosen something like X = _⌊ n/3 _⌋. But in the alternating case too, I can be confident of the honesty of some of your answers:

If you say (non-monotonically) that X is less than 3n/8, I don't know whether X is less than 3n/8 for sure; but I do know that X < n/2, because again you can't have lied about both.

More generally: not only do I know that monotonic subsequences of answers are mostly honest, I know that any time that you answer "more" or "less", your previous answer of the same kind was also honest. So in fact I should be most suspicious, when I encounter long monotonic subsequences in your answers, that the answer previous to that monotonic subsequence was a lie.

What I need is a strategy that will tell me when to revisit an old question, depending on how large n is. Ideally, revisiting old questions would require very little overhead. If I encounter a monotonic sequence of f(n) responses, I revisit the last question before that monotonic sequence started.

For instance, if your responses to queries about n/4, 3n/8, 7n/16, etc. are all monotonically "more", I eventually ask about the last time you said "less", which is for n/2, just in case you lied back then. This allows me to avoid the scenario where you lie about n/2, and keep me guessing at (2^t−1 − 1)/2^t until I eliminate all possibilities, catch you in your lie, and then redo all of my binary search for queries greater than n/2.

If I do a double-check like this every time I encounter a monotonic sequence of length r, I will in the worst case query you about (r+1)log(n)/r = (1 + 1/r)log(n) times. If I catch you in a lie, I will have only "wasted" r queries, and my strategy afterwards can be just a simple binary search without double-checks; so your optimal strategy for maximizing the number of queries I make is actually not to lie at all, or to save your lie for nearly the end of the game to cost me about r additional queries. Here r can be an arbitrarily large integer; thus for any ε, I can achieve a query rate of (1 + ε)log(n) + 1/ε + O(1) by setting r > 1/ε.

Bonus problem #1. Without too much extra work, I think you can improve this to a strategy requiring only log(n) + O(log log(n)) queries, but I'm too lazy to work out the details right now.

Bonus problem #2. Generalize this strategy to the regime where you are allowed to lie to me some fixed number of times L > 0.

Look at this answer on MathOverflow:

Yes, there is a way to guess a number asking 14 questions in worst case. To do it you need a linear code with length 14, dimension 10 and distance at least 3. One such code can be built based on Hamming code (see http://en.wikipedia.org/wiki/Hamming_code).

Here is the strategy.

Let us denote bits of first player's number as a_i, i ∈ [1..10]. We start with asking values of all those bits. That is we ask the following questions: "is it true that i-th bit of your number is zero?" Let us denote answers on those questions as b_i, i ∈ [1..10].

Now we ask 4 additional questions:

Is it true that a₁ ⊗ a₂ ⊗ a₄ ⊗ a₅ ⊗ a₇ ⊗ a₉ is equal to zero? (⊗ is sumation modulo 2).

Is it true that a₁ ⊗ a₃ ⊗ a₄ ⊗ a₆ ⊗ a₇ ⊗ a₁₀ is equal to zero?

Is it true that a₂ ⊗ a₃ ⊗ a₄ ⊗ a₈ ⊗ a₉ ⊗ a₁₀ is equal to zero?

Is it true that a₅ ⊗ a₆ ⊗ a₇ ⊗ a₈ ⊗ a₉ ⊗ a₁₀ is equal to zero?

Let q₁, q₂, q₃ and q₄ be answers on those additional questions. Now second player calculates t_i (i ∈ [1..4]) --- answers on those questions based on bits b_j which he previously got from first player.

Now there are 16 ways how bits q_i can differ from t_i. Let d_i = q_i ⊗ t_i (hence d_i = 1 iff q_i ≠ t_i ).

Let us make table of all possible errors and corresponding values of d_i: position of error -> (d₁, d₂, d₃, d₄)

no error -> (0, 0, 0, 0)

error in b₁ -> (1, 1, 0, 0)

error in b₂ -> (1, 0, 1, 0)

error in b₃ -> (0, 1, 1, 0)

error in b₄ -> (1, 1, 1, 0)

error in b₅ -> (1, 0, 0, 1)

error in b₆ -> (0, 1, 0, 1)

error in b₇ -> (1, 1, 0, 1)

error in b₈ -> (0, 0, 1, 1)

error in b₉ -> (1, 0, 1, 1)

error in b₁₀ -> (0, 1, 1, 1)

error in q₁ -> (1, 0, 0, 0)

error in q₂ -> (0, 1, 0, 0)

error in q₃ -> (0, 0, 1, 0)

error in q₄ -> (0, 0, 0, 1)

All the values of (d₁, d₂, d₃, d₄) are different. Hence we can find where were an error and hence find all a_i.

Answered by falagar.

Twenty questions against a liar

Related

Recent Posts