Understanding The Math Behind Elchanan Mossel’s Dice Paradox

So earlier today I came across Elchanan Mossel's Dice Paradox, and I am having some trouble understanding the solution. The question is as follows:

You throw a fair six-sided die until you get 6. What is the expected number of throws (including the throw giving 6) conditioned on the event that all throws gave even numbers?

Quoted from Jimmy Jin in "Elchanan Mossel’s dice problem"

In the paper it goes on to state why a common wrong answer is $3$. Then afterwards explains that this problem has the same answer to,

"What is the expected number of times you can roll only $2$’s or $4$’s until you roll any other number?"

I don't understand why this is the case. If the original problem is asking for specifically a $6$, shouldn't that limit many of the possible sequences?

I also attempted to solve the problem using another method, but got an answer different from both $3$ and the correct answer of $1.5$.

I saw that possible sequences could have been something like:

$$\{6\}$$ $$\{2,6\}, \{4,6\}$$ $$\{2,2,6\}, \{2,4,6\}, \{4,2,6\}, \{4,4,6\}$$ $$\vdots$$

To which I set up the following summation and solved using Wolfram Alpha:

$$\text{Expected Value} =\sum_{n=1}^\infty n\left( {\frac{1}{6}} \right)^n 2^{n-1} = 0.375$$ Obviously this is different and probably incorrect, but I can't figure out where the error in the thought process is.

Any help on understanding this would be greatly appreciated.

A blog post discussing the problem can be found here.


When you roll a die until $6$ appears, you can represent the sample space as all possible finite sequences from the set $\{1, 2, 3, 4, 5, 6\}$ ending in $6$, with probability of any sequence of length $k$ being $(1/6)^k$. The original question is asking for

$(1)$ the expected length of a sequence conditional on all throws being even.

You've correctly enumerated all sequences from $\{2,4,6\}$ that end in $6$, and calculated the sum $$\sum_{n=1}^\infty n\left( {\frac{1}{6}} \right)^n 2^{n-1} = 0.375$$ properly, but you forgot to divide this by the probability of the event you are conditioning on, which is $$ \sum_{n=1}^\infty\left(\frac16\right)^n2^{n-1}=1/4. $$ So your approach does yield the correct answer, namely $4\times 0.375=1.5$.

The act of conditioning on all throws being even is tantamount to restricting the sample space to all possible finite sequences from the set $\{2, 4, 6\}$ that end in $6$, and rescaling the probability function (by a factor $4$) so that this new sample space has total mass $1$.

As for the Jin paper, he claims that the original question $(1)$ is equivalent to

$(2)$ the expected number of times you can roll only $2$'s or $4$'s until you roll a $6$.

I disagree with $(2)$; it is incorrect to compute an unconditional expectation, as he just explained in his previous paragraph. He still needs an expectation conditional on some event, and I would argue the original question $(1)$ is equivalent to computing

$(2')$ the expected number of times you can roll only $2$'s or $4$'s until you roll any other number, given that the other number is $6$.

The reason is that conditioning on the event "the other number is $6$" results in the same restricted sample space as before. In fact his subsequent argument that it suffices to compute the unconditional expectation

$(3)$ the expected number of times you can roll only $2$'s or $4$'s until you roll any other number.

(i.e., that $(2') = (3)$, which is what he actually proves) is relevant only if we intend $(2')$ instead of $(2)$.


EDIT: Here's a Python simulation of the experiment, based on code provided by @thecoder:

import random

times = 0 #number of times a successful (all-even) sequence was rolled
rolls = 0 #total of all number of rolls it took to get a 6, on successful sequences
curr = 0
alleven = True

for x in range(0, 100000):

  num = random.randint(1,6)
  if num % 2 != 0:
    alleven = False
  else:
    if num == 6:
      if alleven:
        times += 1
        rolls += curr + 1
      curr = 0
      alleven = True
    else:
      curr += 1

print(rolls * 1.0 / times)
#1.51506456241

Part of Jimmy Jin's explanation is right, but his account of "the subtraction fallacy" is unfortunate. Conditioning does consist of discarding part of the sample space.

Suppose one throws a die repeatedly and gets this sequence: $$ 2, 5, 3, 2, 1, 4, 3, 6 $$ Next suppose one excludes the odd numbers: $$ \require{cancel} 2, \xcancel5, \xcancel3, 2, \xcancel1, 4, \xcancel3, 6 $$ One then has this sequence: $$ 2,2,4,6 $$ The average length of this sequence will be $3.$

But now suppose you get this sequence: $$ 2, 5, 3, 2, 1, 4, 3, 6 $$ Then try again and get this: $$ 3,2,5,2,1,6 $$ Then try again and get this: $$ 2,4,4,2,6 $$ Then discard the first two since one got odd numbers before getting a $6:$ $$ \xcancel{2, 5, 3, 2, 1, 4, 3, 6} $$ $$ \xcancel{3,2,5,2,1,6} $$ But keep that last one: $$ 2,4,4,2,6 $$ Among the ones you keep the average length is $3/2,$ not $3.$

You are in fact excluding that part of the sample space in which at least one odd number appears before the first $6,$ and in what's left of the sample space the average position of the first $6$ is $3/2.$ Thus Jimmy Jin is wrong about the "subtraction fallacy."

However, he is right that the average position of the first $6,$ given that only even numbers appear, is $3/2.$

In sequences pared down by excluding all the odd numbers that appear, the average position of the first $6$ is $3.$