Why is this coin-flipping probability problem unsolved?

You play a game flipping a fair coin. You may stop after any trial, at which point you are paid in dollars the percentage of heads flipped. So if on the first trial you flip a head, you should stop and earn \$100 because you have 100% heads. If you flip a tail then a head, you could either stop and earn \$50, or continue on, hoping the ratio will exceed 1/2. This second strategy is superior.

A paper by Medina and Zeilberger (arXiv:0907.0032v2 [math.PR]) says that it is an unsolved problem to determine if it is better to continue or stop after you have flipped 5 heads in 8 trials: accept \$62.50 or hope for more. It is easy to simulate this problem and it is clear from even limited experimental data that it is better to continue (perhaps more than 70% chance you'll improve over \$62.50):
alt text
My question is basically: Why is this difficult to prove? Presumably it is not that difficult to write out an expression for the expectation of exceeding 5/8 in terms of the cumulative binomial distribution.


(5 Dec 2013). A paper on this topic was just published:

Olle Häggström, Johan Wästlund. "Rigorous computer analysis of the Chow-Robbins game." (pre-journal arXiv link). The American Mathematical Monthly, Vol. 120, No. 10, December 2013. (Jstor link). From the Abstract:

"In particular, we confirm that with 5 heads and 3 tails, stopping is optimal."


I accept Qiaochu's answer "Have you tried actually doing that?" I did try now, and now I can appreciate the challenge. :-) The paper I cited refers to another by Chow and Robbins from 1965 that has a beautiful formulation, much clearer than the cummulative binomials with which I struggled. Let me explain it, because it is really cool.

For the natural strategy I mentioned in the comments (and echoed by Raynos), let $f(n,h,t)$ be the expected payoff if you start with $h$ heads and $t$ tails, and let the game continue no more than $n$ further trials. Then there is an easy recursive formulation for $f$: $$f(n,h,t) = \max \left( \frac{1}{2} f(n,h+1,t) + \frac{1}{2} f(n,h,t+1) , h/(h+t) \right) $$ because you have an equal chance of increasing to $h+1$ heads or to $t+1$ tails on the next flip if you continue, and you get the current ratio if you stop. Now, when $h+t=n$, you need to make a "boundary" assumption. Assuming the law of large numbers for large $n$ leads to the reasonable value $\max ( 1/2, h/n )$ in this case. So now all you need to do is fill out the table for all $h$ and $t$ values up to $n=h+t$. The real answer is the limit when $n \rightarrow \infty$, but using large $n$ approximates this limit.

After the Medina and Zeilberger paper was released, in fact just about three weeks ago, a very careful calculation using the above recursive formulation was made by Julian Wiseman and reported on this web page. The conclusion is to me remarkable: "Choosing to continue lowers the expected payoff [from 0.625] to 0.62361957757." This is still not a proof, but the "answer" is now known. So my "it is clear from even limited experimental data that" was completely wrong! I am delighted to learn from my mistake.


This seems to be related to Gittins Indices. Gittins Indices are a way of solving these kind of optimal stopping problems for some classes of problems, and basically give you a way of balancing how much you are expected to gain given your current knowledge and how much more you could gain by risking obtaining more information about the process (or probability of flipping heads, etc).

Bruno