fair value of a hat-drawing game

I've been going through a problem-solving book, and I'm a little stumped on the following question:

At each round, you draw a number from 1 to 100 out of a hat (and replace the number after you draw). You can play as many rounds as you want, and the last number you draw is the number of dollars you win, but each round costs an extra $1. What is a fair value to charge for entering this game?

One thought I had was to suppose I have only N rounds, instead of an unlimited number. (I'd then let N approach infinity.) Then my expected payoff at the Nth round is (expected number I draw) - N = 50.5 - N. So if I draw a number d at the (N-1)th round, my current payoff would be d - (N-1), so I should redraw if d - (N-1) < 50.5 - N, i.e., if d < 49.5. So my expected payoff at the (N-1)th round is (49/100)(50.5 - N) + 1/100*[(50 - (N-1)) + (51 - (N-1)) + ... + (100 - (N-1))] = 63.505 - N (if I did my calculations correctly), and so on.
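To double-check that step numerically, here's a quick brute-force computation (the code and names are just for illustration):

```python
# Numerical check of the backward-induction step above: expected payoff at
# round N-1, where a draw d is kept if d - (N-1) >= 50.5 - N, i.e. d >= 50.
def payoff_second_to_last(N):
    continue_value = 50.5 - N  # expected payoff of redrawing at round N
    return sum(max(d - (N - 1), continue_value) for d in range(1, 101)) / 100

for N in (2, 5, 10):
    print(payoff_second_to_last(N), 63.505 - N)  # the two columns agree
```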

The problem is that this gets messy, so I think I'm doing something wrong. Any hints/suggestions on the right approach?


Solution 1:

Consider what happens in the first stage. You get a number out of the hat, and either stop or go on, and you do it in some "optimal" way (one that maximizes expectation). If you do go on, the rule that ensures the maximal expectation is the same rule you used before: your original optimal rule.

This shows that your stopping rule can be characterized by some threshold $X$, which is the minimal number at which you'd stop (alternatively we could have chosen $X-1$, the maximal number at which you'd go on). Denote the expected gain by $E$. We have $$ E = -1 + \frac{X-1}{100}E + \frac{\sum_{t=X}^{100} t}{100} = -1 + \frac{X-1}{100}E + \frac{(101-X)(100+X)}{200}. $$ Multiplying by $200$ and rearranging, we get $$ 2(101-X)E = (101-X)(100+X) - 200. $$ Therefore $$ E = \frac{(101-X)(100+X) - 200}{2(101-X)} = \frac{100+X}{2} - \frac{100}{101-X}. $$ Setting the derivative $\frac12 - \frac{100}{(101-X)^2}$ to zero, this is maximized at $X = 101 - 10\sqrt{2} \approx 86.86$. Thus the best integer value for $X$ is either $86$ or $87$. These values give $E \approx 86.33$ and $E \approx 86.36$, so $X = 87$ is better, and the value of the game is $1209/14$.
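As a sanity check, here is a short Python sketch (mine, not part of the argument) that evaluates $E$ exactly at every threshold:

```python
from fractions import Fraction

# Exact evaluation of E(X) = ((101-X)(100+X) - 200) / (2(101-X))
# for every threshold X in 1..100.
def E(X):
    return Fraction((101 - X) * (100 + X) - 200, 2 * (101 - X))

best = max(range(1, 101), key=E)
print(best, E(best))  # 87 1209/14
```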

My calculations don't agree with Ross's, so there's probably a mistake somewhere; this doesn't invalidate the method.

Solution 2:

Your expected return if you draw a number on the last round is 49.5 (because it costs a dollar to make the draw). On round N-1, you should keep what you have if it is greater than 49.5, or take your chances on another draw if it is less. The expected value if N=2 is then $\frac{51}{100}\cdot\frac{100+50}{2} - 1 + \frac{49}{100}\cdot 49.5 = 61.505$, where the first term is the chance that you keep the first draw times the expectation of that draw (given that you keep it), the second is the cost of the first draw, and the third is the chance that you decline the first draw times the expected return of the second draw (net of its cost).

Added: As Yuval makes more explicit, your strategy will be to quit when you draw a number that is at least $X$. The gain is then $\frac{100+X}{2}-\frac{100}{101-X}$, where the first term is the expected payoff of the draw you stop on and the second is the cost of the expected number of plays, $\frac{100}{101-X}$. As he says, this is maximized at $X=87$, with value $\frac{1209}{14}\approx 86.3571$. I'll have to think about where I was a bit off.
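A small Monte Carlo simulation of the stop-at-$87$ rule agrees (the trial count and names are my own choices):

```python
import random

# Simulate the stop-at-87 strategy: draw until seeing 87 or higher,
# paying $1 per draw.
def play(threshold=87):
    cost = 0
    while True:
        cost += 1                      # each draw costs $1
        d = random.randint(1, 100)
        if d >= threshold:
            return d - cost            # keep the draw, net of all costs

trials = 200_000
print(sum(play() for _ in range(trials)) / trials)  # ~ 86.36
```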

Solution 3:

Edit: I've added some information about the game with a restricted number $N$ of rounds, since the OP mentioned this.


Just a comment to complement the nice answers we have already.

There is an algorithm that calculates the optimal strategy and the value of each state for such an optimal stopping problem.

Let $f(x)$ be the payout at each state $x\in {\cal S}$ for the underlying Markov chain $(X_n)$, and let $g(x)$ be the cost of leaving state $x$.

Set $u_0=f$ and for $N\geq 1$ put $u_N=\max(Pu_{N-1}-g,f)$. Here $P$ is the transition matrix of the Markov chain. Then $u_N$ is the value function for the restricted game with at most $N$ rounds; that is, $$u_N(x)=\sup_{T\leq N}\ \mathbb{E}\left[ f(X_T)-\sum_{j=0}^{T-1} g(X_j)\, \Big| \, X_0=x\right],$$ where the supremum is over all stopping times $T$ that satisfy $T\leq N$.

As $N\to\infty$, we get $u_N\uparrow v$ where $v$ is the value function for the unrestricted game. The optimal strategy is to stop when the chain hits the set $\lbrace x: f(x)=v(x)\rbrace$. Here $v(x)$ means the value of state $x$, i.e., $v(x)$ is the maximum expected payout, net of costs, starting in state $x$ and using any finite stopping time $T$ as a strategy:

$$v(x)=\sup_T\ \mathbb{E}\left[ f(X_T)-\sum_{j=0}^{T-1} g(X_j)\, \Big| \, X_0=x\right].$$

This is all nicely explained in the section on Optimal Stopping in Gregory Lawler's book "Introduction to Stochastic Processes".

In your problem we have $f(x)=x$ for $1\leq x\leq 100$, and $g(x)\equiv 1$. Your Markov chain $(X_n)$ is a sequence of independent, uniform $\lbrace 1,2,\dots, 100\rbrace $ random variables. Thus the $P$ operator applied to any function $h$ gives the constant function whose value is the average value of $h$, that is, $Ph(x)\equiv \sum_{i=1}^{100} h(i)/100$. So we calculate $$u_0(x)=x,\quad u_1(x)=\max\left(x,{99\over 2}\right), \quad u_2(x)=\max\left(x,{12301\over 200}\right). $$
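In code, the iteration looks like this (a minimal Python sketch; the cap of 200 iterations is my own choice and is ample for convergence):

```python
# Value iteration u_N = max(P u_{N-1} - g, f) for this game,
# with f(x) = x, g = 1, and P averaging over a uniform draw from 1..100.
f = list(range(1, 101))                # f(x) = x
u = f[:]                               # u_0 = f
for n in range(1, 200):
    Pu = sum(u) / 100                  # P u_{n-1}: the average of u_{n-1}
    u = [max(Pu - 1, x) for x in f]    # subtract the cost g = 1
    if n in (1, 2):
        print(n, Pu - 1)               # 49.5, then 61.505 = 12301/200
print(sum(u) / 100 - 1)                # ~ 86.35714, the value of the game
```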

Taking very large $N$ gives $$v(x)\approx \max\left(x, 86.35714\right),$$ which shows that the optimal strategy (over all finite stopping times!) is to quit as soon as we draw $87$ or higher, and the value of the game is $1209/14 \approx 86.35714$ dollars.

In your problem it is pretty straightforward to calculate the exact answer, but this algorithm also gives the answer for more complicated games where exact calculations are not so easy.