You throw a fair coin one million times. What is the expected number of strings of 6 heads followed by 6 tails?

The answer given is:

There are $1,000,000 - 11$ possible slots for the sequence to occur. In each of these slots, the probability is $2^{-12}$. Due to linearity of expected value, the answer is therefore $(1,000,000 - 11)\times2^{-12}$.

I don't understand why this solution works. Shouldn't there be any consideration of the fact that at most only $\dfrac{1,000,000 - 11}{12} \approx 83332$ strings of 6 heads followed by 6 tails can occur. Can anyone please help me understand the solution?


Solution 1:

Maybe it helps to do a simpler example, so let's do $3$ coin flips, and let's look at the expected number of times you get $1$ head followed by $1$ tail.

Now, there are $8$ possible outcomes:

$TTT$

$TTH$

$THT$

$THH$

$HTT$

$HTH$

$HHT$

$HHH$

Note that $HT$ occurs $4$ times, so the expected number of $HT$'s to occur is $\frac{4}{8}=\frac{1}{2}$

Now, notice that there are two 'slots' for an $HT$ to appear: the first two flips, or the second and third flip.

Also note that $HT$ occurs two times for the first two slots, so the expected number of $HT$'s for the first slot is $\frac{2}{8}=\frac{1}{4}$

And the same is true for the second slot. That is, the expected number of $HT$'s for the second slot is also $\frac{1}{4}$

And now note that $\frac{1}{4} + \frac{1}{4} = \frac{1}{2}$. That is: the expected number of $HT$'s occuring anywhere in the sequence is the expected number of $HT$'s to occur in the first slot added to the expected number of $HT$'s in the second slot. ... And of course this should be as such: we have $4$ $HT$'s total out of the $8$ equally likely strings, and that $4$ is the sum of $2$ and $2$. This is what they mean by the 'linearity of expected value'

So, the fact that you can have at most $1$ $HT$ in a sequence of $3$ is irrelevant.

The same thing happens in your problem: out of all possible outcomes, some of the $HHHHHHTTTTTT$ strings will occur in the first slot (throws $1$ though $12$), some will occur in the second slot (throws $2$ through $13$), etc. And in the end, you just end up adding all those. Again, the fact that you cannot have a $HHHHHHTTTTTT$ in the first and in the second slot at the same time is again irrelevant.

Solution 2:

Let's work with smaller numbers so I can type all the possibilities. Say you flip a coin $7$ times and you're asked about the number of times you get a string of $(H,T,H)$. Well, the following outcomes are what you would be on the lookout for:

  1. $(H,T,H, *,*,*,*)$
  2. $(*,H,T,H, *,*,*)$
  3. $(*,*,H,T,H, *,*)$
  4. $(*,*,*,H,T,H, *)$
  5. $(*,*,*,*,H,T,H)$

where the $*$ indicates either of $H$ or $T$, but we don't really care for the moment.

These are the only ways you could get the specified string $(H,T,H)$ out of $7$ flips. Notice there are $7-3+1=5$ ways. So, here we're making the (not so deep) observation that an occurrence of $(H,T,H)$ within a flip of $7$ coins is fully characterized by when it begins.

Note that we don't want to do something like $7/3=2+\frac{1}{3}\approx 2$ and claim that there are only two positions where we can get $(H,T,H)$. When you do such a division, you're only taking into account occurrences of $(H,T,H)$ which are "disjoint" (i.e options (1), (5) above). For example, $(H,T,H,*,H,T,H)$ is an example of what (I think) you might be thinking of, but you're missing out on several others.

Just to be concrete, here is a possible outcome $\omega=(H,T,H,T,H,T,H)$. The number of occurrences of the string $(H,T,H)$ is 3 here (we have options (1), (3), (5) happening here).

Consider the random variable $N$ which describes how many times a string of $(H,T,H)$ is found in a flip of seven coins. Then \begin{align} N=\sum_{i=1}^5\mathbf{1}_{\{\omega\,: (\omega_i,\omega_{i+1},\omega_{i+2})=(H,T,H)\}} \end{align} Here $\mathbf{1}_A$ means the indicator function of the set $A$. The above sum just adds a $1$ if we find $(H,T,H)$ in the $i,i+1,i+2$ slots, and it does so for each $1\leq i\leq 5$, so it counts all the times it sees a string of $(H,T,H)$. By linearity of expectation (or integrals) it follows that \begin{align} \Bbb{E}(N)=\sum_{i=1}^5\Bbb{E}\left(\mathbf{1}_{\{\omega\,: (\omega_i,\omega_{i+1},\omega_{i+2})=(H,T,H)\}}\right)= 5\cdot 2^{-3} \end{align} In your problem, you just have to replace the number $7$ by $10^6$ (a million) and $3$ (the length of the string) by $12$.

Now, as a bonus, try to figure out the general formula for when you do $n$ coin flips, and you're looking for expected value of the number of times a certain string of length $k$ (where $1\leq k\leq n$) occurs.