Twenty questions against a liar
Here's one that popped into my mind when I was thinking about binary search.
I'm thinking of an integer between 1 and n. You have to guess my number. You win as soon as you guess the correct number. If your guess is not correct, I'll give you a hint by saying "too high" or "too low". What's your best strategy?
This is an easy problem if I always tell the truth: by guessing the (rounded) mean of the lower bound and the upper bound, you can find my number in roughly log2n guesses at most.
But what if I'm allowed to cheat once? What is your best strategy then? To clarify: if you guess my number, you always win instantly. But I'm allowed, at most once, to tell you "too high" when your guess is actually too low, or the opposite. I can also decide not to lie.
Here's a rough upper bound: you can ask each number twice to make sure I'm not cheating, and if I ever give two different answers, just ask a third time and from then on, play the regular game. In this way, you can win with about 2 log2n guesses at most.
I'm pretty sure that bound can be improved. Any ideas?
There is a fairly simple strategy which requires (1 + ε)log(n) + 1/ε + O(1) queries, for any constant ε > 0. I illustrate this strategy below.
First, I ask you whether or not X, the secret number, is n/2. Without loss of generality, suppose you answer "less". I learn nothing at first, because you may be lying; but for the moment I give you the benefit of the doubt.
I next ask you whether X is n/4. If you say "less", I don't know whether or not 0 < X < n/4, but I do know that 0 < X < n/2, because you can't lie twice. Similarly, if I go about a normal binary search, so long as you continue to answer "less", I know that your answer to the preceding query was honest. So we may reduce to the case where you say "more" to my query of whether X is n/4.
If I continue to take you at your word, persue a binary search, and enquire about whether X is 3n/8, you may say either "more" or "less". If you say "more", then I don't know whether X > n/2, but I do know that X > n/4, again because you can't lie twice. So again so long as you continue to answer "more" in my normal binary search, I know that your answer to the preceding query was honest.
More generally: if I consistently guess under the hypothesis that you are being honest, in any "monotonic" sequence of responses, I know that all but (possibly) the last of them are honest. So it might seem as though the worst case scenario is where your responses alternate a lot, as would occur for instance if you had chosen something like X = ⌊ n/3 ⌋. But in the alternating case too, I can be confident of the honesty of some of your answers:
- If you say (non-monotonically) that X is less than 3n/8, I don't know whether X is less than 3n/8 for sure; but I do know that X < n/2, because again you can't have lied about both.
More generally: not only do I know that monotonic subsequences of answers are mostly honest, I know that any time that you answer "more" or "less", your previous answer of the same kind was also honest. So in fact I should be most suspicious, when I encounter long monotonic subsequences in your answers, that the answer previous to that monotonic subsequence was a lie.
What I need is a strategy that will tell me when to revisit an old question, depending on how large n is. Ideally, revisiting old questions would require very little overhead. If I encounter a monotonic sequence of f(n) responses, I revisit the last question before that monotonic sequence started.
- For instance, if your responses to queries about n/4, 3n/8, 7n/16, etc. are all monotonically "more", I eventually ask about the last time you said "less", which is for n/2, just in case you lied back then. This allows me to avoid the scenario where you lie about n/2, and keep me guessing at (2t−1 − 1)/2t until I eliminate all possibilities, catch you in your lie, and then redo all of my binary search for queries greater than n/2.
If I do a double-check like this every time I encounter a monotonic sequence of length r, I will in the worst case query you about (r+1)log(n)/r = (1 + 1/r)log(n) times. If I catch you in a lie, I will have only "wasted" r queries, and my strategy afterwards can be just a simple binary search without double-checks; so your optimal strategy for maximizing the number of queries I make is actually not to lie at all, or to save your lie for nearly the end of the game to cost me about r additional queries. Here r can be an arbitrarily large integer; thus for any ε, I can achieve a query rate of (1 + ε)log(n) + 1/ε + O(1) by setting r > 1/ε.
Bonus problem #1. Without too much extra work, I think you can improve this to a strategy requiring only log(n) + O(log log(n)) queries, but I'm too lazy to work out the details right now.
Bonus problem #2. Generalize this strategy to the regime where you are allowed to lie to me some fixed number of times L > 0.
Look at this answer on MathOverflow:
Yes, there is a way to guess a number asking 14 questions in worst case. To do it you need a linear code with length 14, dimension 10 and distance at least 3. One such code can be built based on Hamming code (see http://en.wikipedia.org/wiki/Hamming_code).
Here is the strategy.
Let us denote bits of first player's number as ai, i ∈ [1..10]. We start with asking values of all those bits. That is we ask the following questions: "is it true that i-th bit of your number is zero?" Let us denote answers on those questions as bi, i ∈ [1..10].
Now we ask 4 additional questions:
Is it true that a1 ⊗ a2 ⊗ a4 ⊗ a5 ⊗ a7 ⊗ a9 is equal to zero? (⊗ is sumation modulo 2).
Is it true that a1 ⊗ a3 ⊗ a4 ⊗ a6 ⊗ a7 ⊗ a10 is equal to zero?
Is it true that a2 ⊗ a3 ⊗ a4 ⊗ a8 ⊗ a9 ⊗ a10 is equal to zero?
Is it true that a5 ⊗ a6 ⊗ a7 ⊗ a8 ⊗ a9 ⊗ a10 is equal to zero?
Let q1, q2, q3 and q4 be answers on those additional questions. Now second player calculates ti (i ∈ [1..4]) --- answers on those questions based on bits bj which he previously got from first player.
Now there are 16 ways how bits qi can differ from ti. Let di = qi ⊗ ti (hence di = 1 iff qi ≠ ti ).
Let us make table of all possible errors and corresponding values of di: position of error -> (d1, d2, d3, d4)
no error -> (0, 0, 0, 0)
error in b1 -> (1, 1, 0, 0)
error in b2 -> (1, 0, 1, 0)
error in b3 -> (0, 1, 1, 0)
error in b4 -> (1, 1, 1, 0)
error in b5 -> (1, 0, 0, 1)
error in b6 -> (0, 1, 0, 1)
error in b7 -> (1, 1, 0, 1)
error in b8 -> (0, 0, 1, 1)
error in b9 -> (1, 0, 1, 1)
error in b10 -> (0, 1, 1, 1)
error in q1 -> (1, 0, 0, 0)
error in q2 -> (0, 1, 0, 0)
error in q3 -> (0, 0, 1, 0)
error in q4 -> (0, 0, 0, 1)
All the values of (d1, d2, d3, d4) are different. Hence we can find where were an error and hence find all ai.
Answered by falagar.