How would you go about learning the combination of coins the man has?

I am trying to identify the branch of math that would help to solve the following problem:

A man has picked $10$ coins out of a bag and has laid them in a row. You cannot see them for yourself, and the heads/tails ratio is unknown. Your goal is to identify the correct heads/tails sequence of the coins by repeatedly picking $10$ coins of your own from the bag, with the man telling you how many match the position of his each time. You cannot decide whether the coins are heads or tails (for the sake of the problem assume they are the same on both sides) and your sequence depends on the order you remove them from the bag. Note that the man only selects his coins once and his never change.

For example, if the man has $HTTHHHHTTT$ and you propose $TTTHTHTHHT$ he will let you know that $5$ are a match, however he will not tell you which ones. If on your second try you pull out $THHTTTHTHT$ he will inform you that $3$ match. Without trying every possible combination $(2^{10})$, how would you go about learning the combination of coins the man has?

Comparing the two sample results I see that they share $3$ in common, but I'm not sure if or how that is useful information since I don't know which ones actually match.

I suspect it might require statistics/combinatorics, and I was told it could be solved with "either high school or college math". This isn't a homework problem so it's possible I'm just being trolled by the person. Any information you can provide for approaching this type of problem is much appreciated. Thank you.


The problem described in the Question, to determine exactly the opponent's fixed sequence of heads and tails, is not fully formulated because it needs some probability distribution for the sample sequences drawn by the player.

Given that probability distribution, one can ask the expected number of trials until the hidden sequence is determined. For some (trivial) distributions it will be impossible to make that determination, e.g. if all the drawn sequences are tails only, then all that can be known is what the split of the ten coins is between heads and tails.

Of a more combinatorial nature is how to extract information from the samples that are drawn. One way to think about it is in terms of the set of possibilities for the hidden sequence. In this connection having zero matches (or ten matches) would be ideal, because the exact sequence is uniquely determined.

In general getting $k$ matches will imply that there are $\binom{10}{k}$ possible solutions. Once the intersection of those subsets of possible solutions has a single common sequence, the problem is solved.

Let's consider how much information can be extracted from the example given in the Question. The first sample:

TTTHTHTHHT

gives five matches, and the second sample:

THHTTTHTHT

has only three matches.

Therefore in looking at the positions where there was a change:

T***T***HT

we know that two matches were lost (two that matched in the first sample but did not match in the second sample). So at least two of the five matches in the first sample are in those six "*" (changed) positions. Similarly at most three of the five matches in the first sample are in the four unchanged positions.

But more can be deduced from the net change of $-2$ matches from the first sample to the second sample. If the first sample has $k$ matches in the six changed positions, then the second sample would have $6-k$ matches there (since all have changed, and there are only heads or tails). So that would give us a net change of $-k + (6-k)$, and in order for this to be $-2$, we require $k=4$.

This reduces the number of possible solutions accordingly. The first sample alone gives $\binom{10}{5} = 252$ solutions. The second sample alone gives $\binom{10}{3} = 120$ solutions. Combining them gives us just $\binom{6}{4} \cdot \binom{4}{1} = 60$ possible solutions (pick four of the six changed positions for matches in the first sample, and one of the four unchanged positions for the final fifth match in the first sample).


Here's a Python program that implements the algorithm mentioned by saulspatz in the question comments, and discussed in hardmath's answer. I assume that the probability of picking a heads is 0.5.

In each round of the game, we first create a pool containing all 1024 possible 10 bit strings. Next, the target bit string is generated. Then we enter the outer loop, where we generate a guess bit string, and count the number of "bulls" (positions where the target & guess bit strings match). We then loop over all the bit strings in the pool, retaining only the strings that have the same match count when tested against the current guess. We exit the outer loop when only one bit string remains in the pool, which must match the target.

""" Random binary string guessing game
    See https://math.stackexchange.com/q/4064108
    Written by PM 2Ring 2021-03-17
"""

from random import seed, getrandbits
seed(42)

def bits(n):
    " Convert int n to a bit string "
    return f"{n:010b}"

def mcount(a, b):
    " Count matches between sequences a & b "
    return sum(u==v for u,v in zip(a, b))

def test(verbose=True):
    " Play 1 round of the game "
    pool = [bits(u) for u in range(1<<10)]
    target = bits(getrandbits(10))
    if verbose:
        print("\n T", target)
    for i in range(1, 100):
        guess = bits(getrandbits(10))
        bulls = mcount(target, guess)
        pool = [u for u in pool if mcount(u, guess) == bulls]
        if verbose:
            print(f"{i:2}", guess, bulls, len(pool))
        if len(pool) == 1:
            break
    if verbose:
        print(" R", pool[0], pool[0] == target)
    return i

@interact
def main(runs=10, verbose=True, auto_update=False):
    k = 0
    for _ in range(runs):
        k += test(verbose)
    print("Mean:", float(k / runs))

Here's the output for 10 runs, using 42 as the randomizer seed.

In each run, the target bit string is printed, followed by a number of guesses. Each guess line consists of its index, the guess bit string, the number of bits that match the target, and the number of remaining bit strings in the pool that are consistent with this guess. Finally, the only string remaining in the pool is printed, and we verify that it's identical to the target.

At the end of all runs, we print the mean number of guesses required per run.

 T 1010001110
 1 0001110010 3 120
 2 0000011001 4 50
 3 1011110111 5 26
 4 0100011001 3 4
 5 0011111010 5 3
 6 0011100100 5 1
 R 1010001110 True

 T 0010001110
 1 1011110010 4 210
 2 0001101000 5 100
 3 1010110100 5 48
 4 1011110110 5 30
 5 1110010001 3 12
 6 1000101110 7 9
 7 0001011001 4 5
 8 1001011100 5 3
 9 0110110000 4 2
10 0000100000 5 1
 R 0010001110 True

 T 0000011110
 1 0001011111 8 45
 2 0011011111 7 36
 3 0011101110 6 18
 4 1000000101 5 10
 5 1001101000 4 5
 6 0000011011 8 2
 7 1000111110 8 2
 8 0011001011 5 2
 9 1011011101 5 2
10 1010011001 5 2
11 1011001110 6 1
 R 0000011110 True

 T 1000101110
 1 0110101101 5 252
 2 0011100001 3 60
 3 0111001011 3 18
 4 1001011011 5 10
 5 0100011100 5 3
 6 1100111100 7 3
 7 1101111010 6 3
 8 0000000110 7 1
 R 1000101110 True

 T 1100001001
 1 1100111001 8 45
 2 0010100011 4 24
 3 1011001010 5 12
 4 0110110000 4 6
 5 0101011100 5 6
 6 0100011100 6 3
 7 0010011111 4 1
 R 1100001001 True

 T 0011011100
 1 1111010100 7 120
 2 1100001101 4 50
 3 0101011000 7 15
 4 0001101000 6 9
 5 0001011110 8 2
 6 0110000101 5 2
 7 0001100011 3 1
 R 0011011100 True

 T 0101101111
 1 1101100011 7 120
 2 0101100000 6 63
 3 1001101010 6 30
 4 0100001110 7 6
 5 1100111010 5 3
 6 0000101100 6 1
 R 0101101111 True

 T 1011101011
 1 0111010110 3 120
 2 1000100101 5 56
 3 0001111111 6 24
 4 1111100100 5 9
 5 1110110000 4 6
 6 0110000011 5 4
 7 0001010000 3 2
 8 1000110101 4 2
 9 0100101100 3 1
 R 1011101011 True

 T 1101010001
 1 1010000011 5 252
 2 1001111001 7 60
 3 1110001010 4 30
 4 1101110010 7 8
 5 0101110010 6 8
 6 1001001111 5 4
 7 0011000100 4 2
 8 1011010001 8 1
 R 1101010001 True

 T 0001000111
 1 0000101110 6 210
 2 1010100101 5 100
 3 0011101001 5 46
 4 1100010111 6 18
 5 0100101000 3 5
 6 1111110000 2 3
 7 0001010001 7 1
 R 0001000111 True
Mean: 7.9

It appears that the mean number of guesses required is a little over 7.

Here is a live version of the program, running on the SageMathCell server. If you set the runs parameter to 10000, it will take about one minute in non-verbose mode. I haven't tested higher numbers, but if a computation takes too long the server will abort the job.