Probability that THHT occurs in a sequence of 10 coin tosses
Assume we have a fair coin, and we throw the coin $10$ times in a row.
I want to calculate the probability that the sequence 'tail, head, head, tail' occurs.
So I think I can interpret this event as a binary number with $10$ digits. So $1$ means tail, $0$ means head. Therefore we have $2^{10} = 1024$ different outcomes of the $10$ throws. The sequence 'tail, head, head, tail' can start at $7$ different positions and so there are $7\cdot2^6 = 448$ different outcomes of the $10$ throws with the sequence 'tail, head, head, tail'. So the probability would be $\frac{448}{1024} = 0.4375$.
But I have a feeling there's something wrong?
Solution 1:
You can use the Inclusion-exclusion principle to solve this. When I do this I get $$7\cdot 2^6 - 6\cdot 2^2 - 4\cdot 2^3+1$$ where the first term is what you get, where the second and third terms count the number of sequences with two non-overlapping instances of T H H T and of sequences with one overlap, like T H H T H H T, and finally the number of sequences with a triple overlap, T H H T H H T H H T.
Confession: I has earlier got $45\cdot2^2$ for the second term, by a mental blunder, as AnnaSaabel pointed out. There are 2 "gaps" to separate the two instances of THHT, which can occur before, between, or after the 2 instances; they can be distributed in any of the 6 ways 200,020,002,110,101, or 011.
Added: if the number of coin tosses were $n=100$ (say), and the pattern sought was still THHT, this method becomes clumsy. A different method is to construct a Markov chain with states representing how far a string matching algorithm has progressed in matching the given pattern. If $M$ is the transition matrix for this chain, the desired answer is the entry in the matrix $M^n$ corresponing to the pair $(\text{start state}, \text{accepting state})$.
Solution 2:
As @kimchilover states in the comments, you are counting some 10-digit binary numbers more than once in the number $7\cdot 2^6$. To make this more obvious, consider a different problem: to find the probability that the sequence 'heads' appears. By your counting logic, there are 10 places for it to begin, so there are $10\cdot 2^9$ different outcomes of the 10 throws with the sequence 'heads', so the probability would be $\frac{10\cdot 2^9}{2^{10}} = 5$. That can't be good. It's very clear now that the issue is overcounting -- you have five times as many sequences with 'heads' in them as the number of sequences total! The problem is that we have counted sequences with multiple heads many times. For example, the sequence of all heads is counted $10$ times, once for each of the places where the sequence 'heads' begins within it.
As I write this, I see that @kimchilover also just posted an answer to the question which directs you to the inclusion-exclusion principle, so I'll stop here with an answer which could just help you to try generalizing arguments which feel fishy to see where they go wrong. Good job detecting the fishiness!