Convergence of winning probability in a one-player dice-throwing game

In this (one-player) game, the player starts with a total of $n$ points. On each turn, they choose to throw either a four-, six-, or eight-sided die, and then subtract the number thrown from their point total. The game continues until the player's point total either reaches zero exactly, in which case they win, or falls below zero, in which case they lose.

The optimal strategy does not seem obvious. (Even for the simple case of $n=5$, one must do a calculation.) And dynamic programming shows that the strategy is genuinely irregular: for $n\le 20$, the four-sided die is optimal, except for $n=5, 6, 13, 20$, where the six-sided die is optimal, and $n=8, 14, 16$, where the eight-sided die is optimal.
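That table is easy to reproduce; here is a sketch in exact arithmetic (the function name `win_probs` and the tie-break toward the smaller die are my choices, not from the question):

```python
from fractions import Fraction

def win_probs(dice, nmax):
    # V[n] = best over dice of the mean of the previous d values; V[0] = 1.
    V, choice = {0: Fraction(1)}, {}
    for n in range(1, nmax + 1):
        best = None
        for d in dice:
            p = sum(V.get(n - r, Fraction(0)) for r in range(1, d + 1)) / d
            if best is None or p > best:    # strict >, so ties go to the smaller die
                best, choice[n] = p, d
        V[n] = best
    return V, choice

V, choice = win_probs((4, 6, 8), 20)
print([choice[n] for n in range(1, 21)])    # expect 6 at n=5,6,13,20 and 8 at n=8,14,16
print(float(V[20]))
```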

For large $n$, the probability of winning approaches $0.456615178766744$, which is not a number that I recognize. (The ISC does not recognize it either.)

It's not particularly surprising that the probability of winning converges as $n$ increases, because the probability of winning with $n$ points is the mean of the probabilities of winning with $n-i$ points, taken over a small range of $i$. Considering the probabilities as a sequence, each element lies inside the range of the previous few elements, and tends to be in the middle of that range. As one goes farther out in the sequence, variations away from the mean tend to be damped out.
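This damping is easy to see numerically; a small illustration of my own, for a single six-sided die (where, per the formula quoted below, the limit is $2/7$):

```python
# Spread of a sliding window of win probabilities for a single 6-sided die.
# probs starts as V(-5)..V(0) = 0,...,0,1; each new term is the mean of the last six.
probs = [0.0] * 5 + [1.0]
for _ in range(60):
    probs.append(sum(probs[-6:]) / 6)

spread = lambda window: max(window) - min(window)
print(spread(probs[6:12]), spread(probs[-6:]))  # early spread vs late (tiny) spread
print(probs[-1], 2 / 7)                         # close to the 2/(d+1) limit
```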

My questions are:

  • What's this $0.456615178766744$? (For a single $d$-sided die the probability of winning approaches $\frac 2{d+1}$, but for more dice the problem seems harder.)
  • Is there any regularity to the optimal move, as $n$ increases?
  • Is there any good way to estimate a good strategy, short of exhaustive computer calculations? For example, in the $n=7$ situation, the four-sided die is significantly better than the six-sided die. Is there some way to see this, or at least to guess that it is so?
  • Someone must have studied this before. Does the problem have a name? Can someone give me a reference to the literature?

Solution 1:

Others have computed lots of empirical results for this thing (as indeed have I), but it would be nice to have some theorems! So here's an analysis of three families of simple cases for which we can (1) determine the optimal sequence of die-choices, (2) explicitly compute the resulting win probabilities (in two of the cases) and their limit (in all three), and (3) show that the limit is rational. Unfortunately these families don't include the specific case the original question was about :-).


Obviously we can consider this question with any set of dice. (For that matter, we can consider it with any set of weighted-average operations of finite support, but the case with dice seems hard enough already.)

I make the following claims.

Zeroth: the sequence of win-probabilities tends to a limit; if the sequence of die choices is eventually periodic, then the limit of the sequence of win probabilities is rational.

First: in the case $\{k\}$ -- that is, just one $k$-sided die and nothing more -- there are no choices to make and the limit is $2/(k+1)$. I'm not sure whether there's a nice formula for the individual win probabilities.

Second: in the case $\{2,k\}$ -- that is, one $k$-sided die and one 2-sided die, with $k>2$ -- the sequence of optimal choices repeats with period $k$: when the number of points remaining is a multiple of $k$ one should use the $k$-sided die, and otherwise the 2-sided one. Below I give formulae for the actual win probabilities; their limit turns out to be

$$\frac6{9-\frac2k+\frac3{(-2)^k-1}}.$$

Third: in the case $\{k,k+1,\dots,2k-1\}$, the sequence of optimal choices consists of $k-1$ instances of $k$ followed by $k,k+1,\dots,2k-1$, after which you can just pick $k$ every time; the sequence of win probabilities is constant from index $k$ onward. The constant is $\frac{(k+1)^{k-1}}{k^k}$.
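Before the proofs, the second claim's formula can at least be checked against brute force; a sketch (the helper `dp_probs` is my naming):

```python
from fractions import Fraction

def dp_probs(dice, nmax):
    # Exact V(n): max over dice of the mean of the previous d values.
    V = {0: Fraction(1)}
    for n in range(1, nmax + 1):
        V[n] = max(sum(V.get(n - r, Fraction(0)) for r in range(1, d + 1)) / d
                   for d in dice)
    return V

# Compare V at a large index with the claimed {2,k} limit.
for k in (3, 4, 5, 6):
    V = dp_probs((2, k), 60 * k)
    claimed = 6 / (9 - Fraction(2, k) + Fraction(3, (-2) ** k - 1))
    print(k, float(claimed), abs(float(V[60 * k] - claimed)))
```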


As remarked by Rob Pratt, the neatest way to formulate this question is as a recurrence relation for the probabilities: writing $V(n)$ for the probability of success when starting with $n$ points, we have $V(n)=0$ for negative n, $V(n)=1$ for $n=0$, and otherwise $V(n)$ is the largest of the averages of the previous $d$ values, where $d$ ranges over the available dice. (The parameter $k$, the number of faces on the larger die, is left implicit.)

As a warm-up, let's deal with the easiest case (this is the first of the theorems promised above): just one die. In this case there are no choices, and we are just dealing with a linear recurrence relation: $V(n)=\frac1k\sum_{0<j\leq k}V(n-j)$, with $V(n)=0$ for negative $n$ and $V(0)=1$. Consider $U(n)=\sum_{1\leq j\leq k}jV(n+j)$. We have $U(-k)=k$ because the only nonzero term is $k\cdot1$. We also have

$$\begin{eqnarray} U(n)-U(n-1)&=&\sum_{1\leq j\leq k}jV(n+j)-\sum_{1\leq j\leq k}jV(n+j-1) \\ &=&\sum_{1\leq j\leq k}jV(n+j)-\sum_{0\leq j\leq k-1}(j+1)V(n+j) \\ &=&kV(n+k)-\sum_{0\leq j<k}V(n+j) \\ &=&0, \end{eqnarray}$$

so $U$ is a constant sequence. In particular its limit is $k$; but this limit is also $(1+\cdots+k)$ times the limit of the sequence $V$, which therefore equals $\frac{k}{1+\cdots+k}=\frac2{k+1}$.
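A quick numerical sanity check of this invariant (my own, not part of the proof), for $k=5$:

```python
from fractions import Fraction

# Exact win probabilities for a single k-sided die, then check that
# U(n) = sum_{1<=j<=k} j*V(n+j) is the constant k.
k = 5
V = {n: Fraction(0) for n in range(-k, 0)}
V[0] = Fraction(1)
for n in range(1, 40):
    V[n] = sum(V[n - j] for j in range(1, k + 1)) / k

U = lambda n: sum(j * V[n + j] for j in range(1, k + 1))
print([U(n) for n in (-k, 0, 10, 34)])  # all equal to k = 5
print(float(V[39]))                     # approaching 2/(k+1) = 1/3
```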

This is also a good place to prove the zeroth theorem above, because its proof reuses the (only) idea in the proof just given.

The first part of the zeroth theorem says that the sequence of win probabilities tends to a limit. This is kinda obvious, because we are performing a sequence of averaging operations, but proving it seems trickier than I expected. (But maybe I'm missing something simple.) Let $d$ be the number of faces on the largest die that we use infinitely often; once we have stopped using any larger dice (which we use only finitely often), to get from one vector of $d$ consecutive win-probabilities to the next we append a new component equal to the average of some of those $d$ win-probabilities (infinitely often, the average of all of them) and discard the first component. This means that the smallest of $d$ consecutive win-probabilities is nondecreasing, and the largest is nonincreasing; so both tend to limits. Can these limits be different? Suppose their limits are $L,U$ respectively, with $L<U$, and suppose we're far enough along the sequence that no value is outside the interval $[L-\epsilon,U+\epsilon]$. (We'll choose a suitable value of $\epsilon$ shortly.) At some point we must see a value no bigger than $L+\epsilon$. The value after that is no bigger than $\frac{(d-1)(U+\epsilon)+L+\epsilon}{d}=\frac{(d-1)U+L}d+\epsilon$; by induction subsequent values are also no bigger than this. If we choose $\epsilon<\frac{U-L}{2d}$ then this means that after seeing a value close to $L$ we never again see a value bigger than $U-\frac{U-L}{2d}$, which means that $U$ is not after all the limit of the largest win probability in a consecutive sequence of $d$ values. So the assumption $L<U$ leads to a contradiction; those limits are in fact the same, which means that the sequence as a whole tends to a limit.

The other half of the zeroth theorem: When the sequence of die-choices is periodic, the limiting win probability is always rational. Why? Well, letting $d$ be the number of faces on the largest die, once periodicity is attained (say, with period $p$) there is some $d$-by-$d$ matrix $M$ such that when $v$ is a (column) vector of $d$ consecutive win probabilities, $Mv$ is the vector of win probabilities for starting numbers $p$ higher; so we are interested in the limit of $M^nv$ where $v$ is the specific column vector once periodicity is reached. The components of $M$ and $v$ are obviously all rational. Now, we have $Mu=u$ where $u$ is a vector all of whose components are equal, because $M$ is built out of averaging operations. So $(M-1)$ is singular, where $1$ denotes the identity matrix; so there's at least one vector in its left nullspace: a (row) vector $w$ such that $wM=w$. We then have $wv=wM^nv\rightarrow wl$, where $l$ is the (column) vector all of whose elements are the limiting win-probability. In particular, if the components of $w$ are rational and their sum is nonzero then this implies that the limiting probability is rational.

There are familiar algorithms for finding the nullspace of a matrix, using only arithmetic operations; in particular, if the matrix's components are all rational then the nullspace has a basis of vectors all of whose components are rational. So we're done provided we can find a suitable $w$ whose components don't add up to 0.

If we can't, this means that everything in the left nullspace of $M-1$ is orthogonal to the column vector $c$ all of whose components are 1. The orthogonal complement of the left nullspace is precisely the image (i.e., the span of the columns), so this would mean that $c$ is in the image; that is, for some vector $u$ we have $Mu-u=c$. But $M$ is a product of matrices $M_i$ which have the following property: for any vector $u$, we have $\mathop{\textrm{min}} u\leq\mathop{\textrm{min}} Mu\leq\mathop{\textrm{max}}Mu\leq\mathop{\textrm{max}}u$. So the same is true of $M$ itself, and in particular $\mathop{\textrm{max}}Mu\leq\mathop{\textrm{max}}u$, which is incompatible with $Mu-u=c$ since the maximum component of $u+c$ is bigger than that of $u$.

Hence there is a left-nullspace vector $w$ whose sum isn't 0, so from $wv=wl$ and the fact that all components of $l$ are equal it follows that their common value is rational.
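This recipe can be carried out mechanically. Here is a sketch (all helper names mine) that builds the period matrix $M=M_kM_2^{k-1}$ for the $\{2,k\}$ choice pattern analysed below, finds a left-fixed row vector $w$ by exact Gaussian elimination, and recovers the rational limit as $wv/\sum_i w_i$ with starting window $v=(0,\dots,0,1)$:

```python
from fractions import Fraction

def averaging_matrix(k, d):
    # k x k one-step matrix: shift the window up, append the mean of the last d.
    M = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k - 1):
        M[i][i + 1] = Fraction(1)
    for j in range(k - d, k):
        M[k - 1][j] = Fraction(1, d)
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def left_fixed_vector(M):
    # Exact row-reduction of (M^T - I); return a row vector w with w M = w.
    k = len(M)
    A = [[M[j][i] - (1 if i == j else 0) for j in range(k)] for i in range(k)]
    pivots, row = [], 0
    for col in range(k):
        p = next((r for r in range(row, k) if A[r][col] != 0), None)
        if p is None:
            continue
        A[row], A[p] = A[p], A[row]
        A[row] = [x / A[row][col] for x in A[row]]
        for r in range(k):
            if r != row and A[r][col] != 0:
                A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[row])]
        pivots.append(col)
        row += 1
    free = next(c for c in range(k) if c not in pivots)
    w = [Fraction(0)] * k
    w[free] = Fraction(1)
    for r, c in enumerate(pivots):
        w[c] = -A[r][free]
    return w

# One period for {2,k}: k-1 steps with the 2-die, then one step with the k-die.
k = 5
M = averaging_matrix(k, k)
for _ in range(k - 1):
    M = matmul(M, averaging_matrix(k, 2))
w = left_fixed_vector(M)
# Starting window v = (0,...,0,1), so the limit is w.v / sum(w) = w[-1]/sum(w).
print(w[-1] / sum(w))
```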


OK, back to the second theorem promised above, the {2,k} case. This will be harder work.

(The following all feels very clumsy, and I suspect it is difficult to follow without pencil and paper or a symbolic-algebra system. I bet there are ways to streamline it a lot.)

We have a proposal for what the sequence of die-choices ought to be: the $k$-sided die when $k|n$, otherwise the 2-sided one. (So, e.g., when $k=5$ this goes 222252222522225 and so on.) Write $W(n)$ for the resulting sequence of winning probabilities. We will be done if we can show that for each $n$, applying the "wrong" die to the sequence $W$ at step $n$ makes things worse, or at least no better. That is, when $k|n$ we would like it to be true that $\frac{W(n-2)+W(n-1)}2\leq\frac{W(n-k)+\cdots+W(n-1)}k$; otherwise we would like the inequality to be reversed. This immediately yields a proof by induction that the sequences $V$ and $W$ are in fact the same.

The thing that makes this a promising line of attack is that now we are operating entirely with the known sequence $W$ and "just" need to prove some inequalities for it. (In other cases that empirically seem to have a periodic sequence of optimal choices, the same approach should be applicable, though possibly more painful.)

The first step is to figure out what the values $W(n)$ actually are. Each is determined by the previous $k$ values (in general, this holds where $k$ is the largest number of faces on any of the dice). So let's look at what happens to the last $k$ values in the sequence when we compute the next one. If we represent those last $k$ values by a column vector $X$, what happens is that we shift them up one (discarding the first = oldest) and replace the last one with the average of some of its predecessors; that is, we premultiply $X$ by a matrix that has $k-1$ rows with 1s just above the main diagonal, and a final row that's $(\frac1k \frac1k \dots \frac1k)$ if we're using the $k$-sided die or $(0 \dots 0 \frac12 \frac12)$ if we're using the 2-sided one. Call these matrices $M_k$ and $M_2$.

So, we begin with the vector $(0 \dots 0 1)^\top$ which gives the $k$ values ending with the 0th. And then we premultiply, successively, by $M_2$ $k-1$ times, then $M_k$ once, then $M_2$ $k-1$ times, etc. So advancing a full $k$ places in the sequence means premultiplying by $M_kM_2^{k-1}$.

It's not difficult to compute the powers of $M_2$. For $r<k$, in fact, $M_2^r$ has $k-r$ upper rows with 1s $r$ places above the main diagonal, followed by $r$ lower rows $(0\,\dots\,0\,a_i\,b_i)$ for $1\leq i\leq r$, where $a_n=\frac13-(-1)^n\frac1{3\cdot2^n}$ and $b_n=1-a_n$. (Easy proof by induction.) For instance, here's $M_2^4$ where $k=7$ (so $r=4,k-r=3$: 3 rows of diagonal on top, 4 rows of 2 columns below):

$$\begin{pmatrix} 0&0&0&0&1&0&0 \\ 0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&1 \\ 0&0&0&0&0&\frac12&\frac12 \\ 0&0&0&0&0&\frac14&\frac34 \\ 0&0&0&0&0&\frac38&\frac58 \\ 0&0&0&0&0&\frac5{16}&\frac{11}{16} \\ \end{pmatrix}$$

In particular, $M_2^{k-1}$ has a 1 at the top right corner, two nonzero columns below that, and everything else is 0s. And now we can compute $M:=M_kM_2^{k-1}$: all but the two rightmost columns are 0, and in those columns we have $k-1$ rows ending $(a_i\ b_i)$ for $i=1,\dots,k-1$, followed by one row ending $\left(\frac{a_1+\cdots+a_{k-1}}k\ \frac{1+b_1+\cdots+b_{k-1}}k\right)$. Calling those last two entries $c_k,d_k$ we have $c_k=\frac13+\frac2{9k}\left((-2)^{-k}-1\right)$ and $d_k=\frac23-\frac2{9k}\left((-2)^{-k}-1\right)$.

We can now say fairly explicitly (we will be able to be even more explicit later) what the sequence $W$ is. Write $X(0)$ for the column vector with $k-1$ 0s followed by a 1, and $X(m+1)=M_kM_2^{k-1}X(m)$; then concatenating the elements of all the $X(m)$ gives precisely the sequence $W$ starting from element $-(k-1)$.

At this point, we have enough to prove the inequality we're interested in in the case $k|n$. Doing a 2-average instead of a $k$-average at step $n$ means that the last (= leftmost) factor of $M$ we multiply by is replaced with $M_2^k$, which looks just like $M$ except that its last row has $a_k,b_k$ instead of $c_k,d_k$. So if the last two elements of $X(m-1)$ were $p,q$ then we need to verify that $a_kp+b_kq\leq c_kp+d_kq$. We may readily check that $c_k-a_k=-(d_k-b_k)<0$, so our condition will hold provided $p\leq q$. Obviously this is true the very first time, where $p=0,q=1$. After that, we have $\binom{p}{q}=\binom{a_{k-1}\ b_{k-1}}{c_k\ d_k}\binom{p'}{q'}$ where $p',q'$ are the previous values of $p,q$ -- and once again we have $c_k-a_{k-1}=-(d_k-b_{k-1})<0$, so $p\leq q$ follows from $p'\leq q'$, and is therefore always true by induction.

We also need to prove that the opposite inequality holds when $k\nmid n$. The machinery developed above doesn't make this so straightforward, because the $k$-average involves elements from two successive $X(m)$ vectors. But, as we'll see, it's not so bad.

We can make our calculations of the $W(n)$ more explicit. Those $M_kM_2^{k-1}$ matrices all have 0s except in their two rightmost columns, which means that each $X(m+1)$ can be computed from just the last two elements of its predecessor $X(m)$ (which is obvious when we think about what things we're averaging to get the elements of $X(m+1)$). So let's write $Y(m)$ for the last two elements of $X(m)$; we have $Y(m+1)=NY(m)$ where $N$ is the bottom-right 2x2 submatrix of $M_kM_2^{k-1}$, which is to say $\left(\begin{smallmatrix}a_{k-1}&b_{k-1}\\c_k&d_k\end{smallmatrix}\right)$. We have a fairly simple explicit formula for each entry in that 2x2 matrix, and the further nice fact that the rows add up to 1, and this suffices to give us a (rather ugly) explicit formula for the powers of the matrix. I confess that I used a computer-algebra system to find it; once found it's easy to verify by induction. The $m$th power is

$$N^m=\frac1{\beta-\alpha}\left[\begin{pmatrix} \beta&-\alpha \\ \beta&-\alpha \end{pmatrix}+\begin{pmatrix} -\alpha&\alpha \\ -\beta&\beta \end{pmatrix}t^m\right]$$

where

$$\begin{eqnarray} \alpha&=&6sk \\ \beta&=&-3sk+2s-3k \\ s&=&(-2)^k-1 \\ t&=&2(-2)^{-k}\frac{s+3k}{9k}. \end{eqnarray} $$

(Of course the computer algebra system didn't give it to me in that nice convenient form!) It's easy to see that $|t|<1$, so the limit as $m\rightarrow\infty$ is

$$\frac1{\beta-\alpha}\begin{pmatrix} \beta&-\alpha \\ \beta&-\alpha \end{pmatrix}$$

and so the limiting value of the sequence $W$ (which, as someone already pointed out, clearly has a limit) is just $\frac\alpha{\alpha-\beta}$, matching all the empirical values for the limit in the {2,k} case already calculated in comments above. The formula given near the start is equivalent to this.

And we can now be a bit more explicit about the values of $W$. We have $\binom{W(mk-1)}{W(mk)}=N^m\binom01$ ($N^m$ being the matrix we just computed), and then for $0\leq r<k$ we have $W(mk+r)=a_rW(mk-1)+b_rW(mk)$. (Note that $a_0=0,b_0=1$ so the case $r=0$ works.)

And, to simplify things just a little further, $N^m\binom01$ is

$$\frac1{\beta-\alpha}\begin{pmatrix} -\alpha+t^m\alpha \\ -\alpha+t^m\beta \end{pmatrix}$$

and so

$$\begin{eqnarray} W(mk+r)&=&[a_r(-\alpha+t^m\alpha)+b_r(-\alpha+t^m\beta)]/(\beta-\alpha) \\ &=&[-\alpha+(a_r\alpha+(1-a_r)\beta)t^m]/(\beta-\alpha) \\ &=&[-\alpha+(a_r(\alpha-\beta)+\beta)t^m]/(\beta-\alpha) \\ &=&[-\alpha+\beta t^m]/(\beta-\alpha)-a_rt^m \end{eqnarray}$$

where, again,

$$\begin{eqnarray} \alpha&=&6sk \\ \beta&=&-3sk+2s-3k \\ s&=&(-2)^k-1 \\ t&=&2(-2)^{-k}\frac{s+3k}{9k}. \end{eqnarray} $$

Now we have a reasonably simple formula for $W(mk+r)$, it's time to prove that other inequality. Suppose $0<r<k$ and let's compare the average of the $k$ terms preceding $W(mk+r)$ with $W(mk+r)$ itself. The difference, which we would like to be $\geq0$, is

$$\begin{eqnarray} &&\frac{-\alpha+\beta t^m}{\beta-\alpha}-a_rt^m -\frac1k\sum_{0\leq r'<k}\left(\frac{-\alpha+\beta t^{m-[r'\geq r]}}{\beta-\alpha}-a_{r'}t^{m-[r'\geq r]}\right)\\ &=&\frac{\beta t^m}{\beta-\alpha}-a_rt^m -\frac1k\sum_{0\leq r'<k}\left(\frac{\beta t^{m-[r'\geq r]}}{\beta-\alpha}-a_{r'}t^{m-[r'\geq r]}\right) \\ &=&\frac{t^m}k\left[\frac{\beta k}{\beta-\alpha}-k a_r -\sum_{0\leq r'<k}\left(\frac{\beta t^{-[r'\geq r]}}{\beta-\alpha}-a_{r'}t^{-[r'\geq r]}\right)\right] \\ &=&\frac{t^m}k\left[\frac{\beta k}{\beta-\alpha}-k a_r -\left(k\frac{\beta}{\beta-\alpha}+(k-r)\frac{\beta (t^{-1}-1)}{\beta-\alpha}\right)+\left(\sum_{0\leq r'<k}a_{r'}+\sum_{r\leq r'<k}a_{r'}(t^{-1}-1)\right)\right] \\ &=&\frac{t^m}k\left[-k a_r -(k-r)\frac{\beta (t^{-1}-1)}{\beta-\alpha}+\sum_{0\leq r'<k}a_{r'}+\sum_{r\leq r'<k}a_{r'}(t^{-1}-1)\right] \end{eqnarray}$$

where $[r'\geq r]$ is Iverson notation, denoting 1 if the proposition inside is true and 0 if it's false. Now we may easily check that $t^{-1}-1=\frac{\alpha-\beta}{2(s+3k)}$, so this equals $$\begin{eqnarray} &&\frac{t^m}k\left[-k a_r +(k-r)\frac{\beta}{2(s+3k)}+\sum_{0\leq r'<k}a_{r'}-\frac{\beta-\alpha}{2(s+3k)}\sum_{r\leq r'<k}a_{r'}\right]. \end{eqnarray}$$

Since we always have $t\geq0$, this is enough to let us prove the result for any fixed choice of $k$ just by checking $k-1$ cases -- the expression inside the brackets is independent of $m$. But let's try to do it in full generality.

Those sums are easily evaluated: the first is $\frac{k}3-\frac29(1-(-2)^{-k})$ and the second is $\frac{k-r}3-\frac29((-2)^{-r}-(-2)^{-k})$. We may readily verify that the first of these equals $-((-2)^{-k}\beta)/9$, and then (using this and the fact that $(-2)^{-r}=1-3a_r$) that the second equals $-\frac{r}3-\frac{(-2)^{-k}\beta}9+\frac23a_r$. So we want to prove that this is non-negative:

$$\begin{eqnarray} &&-k a_r +(k-r)\frac{\beta}{2(s+3k)}+\frac{(-2)^{-k}\beta}{9}+\frac{\beta-\alpha}{2(s+3k)}\left(\frac{2k-r}3-\frac{(-2)^{-k}\beta}9-\frac23a_r\right). \end{eqnarray}$$

With a bit of computer algebra (or a bit of patience) we find that this equals

$$\begin{eqnarray} &&\frac19\frac{-2s+3k}{s+3k}\left(1-(-2)^{-r}-3(1-(-2)^{-r})k+3r\right). \end{eqnarray}$$

The first factor is negative ($s$ always dominates). The second factor is also negative because $k>r$ and the terms with $(-2)^{-r}$ are small. Hence this is non-negative, so averaging the last $k$ terms rather than the last 2 is never better in cases where $W$ assumes we are averaging the last 2, hence by induction $V=W$ (since we already proved the corresponding inequality for the case where $W$ assumes we are averaging the last $k$). We're done!


I've kept 3/4 of my promise above, and dealt with the cases $\{k\}$ and $\{2,k\}$. What about the third theorem listed above, dealing with $\{k,k+1,\dots,2k-1\}$?

For the first $k$ steps we cannot possibly do better than using the $k$-sided die each time, because averaging further back just brings in zeros. So our sequence begins (after the initial infinity of zeros) $1, \frac1k, \frac{k+1}{k^2}, \frac{(k+1)^2}{k^3}, \dots, \frac{(k+1)^{k-1}}{k^k}$.

After this, note that obviously averaging $k$ numbers together with their average (i.e., what we get by choosing the $k+1$-sided die) gives that same average again; and that since those $k$ numbers are decreasing we do strictly worse by averaging fewer; and that since they are preceded by zeros we do strictly worse by averaging more. Hence we should average $k+1$ next, and the last number will repeat.

And after this a similar situation repeats itself and recommends that we average $k+2$, and then $k+3$, and so on until we reach $2k-1$. (Formal proof would be by induction. The details are easy.)

At this point we have $k$ consecutive instances of the exact same value, preceded by $k-1$ strictly smaller things. So we can either average $k$, giving the same average a $(k+1)$th time, or else get a worse result; we must do the former.

Again, a similar situation repeats itself with successively larger numbers of identical values, until eventually we have had the same value $2k-1$ times in a row. At this point we can choose any die we like and always get the same result.

So, our sequence of winning probabilities consists of $\frac{(k+1)^{j-1}}{k^j}$ for $j=1,2,\dots,k$, followed by infinitely many copies of that last figure.
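This is easy to confirm by exact computation; a sketch of my own (note that `max` here breaks ties toward the larger die, which only matters beyond index $2k-1$, where any die works):

```python
from fractions import Fraction

# Dice {k,...,2k-1}: check the claimed optimal choices and win probabilities.
k = 4
dice = range(k, 2 * k)
V = {0: Fraction(1)}
choices = []
for n in range(1, 4 * k):
    prob, die = max((sum(V.get(n - r, Fraction(0)) for r in range(1, d + 1)) / d, d)
                    for d in dice)
    V[n] = prob
    choices.append(die)

print(choices[:2 * k - 1])  # k-1 copies of k, then k, k+1, ..., 2k-1
print(V[k] == Fraction((k + 1) ** (k - 1), k ** k))
print(all(V[n] == V[k] for n in range(k, 4 * k)))
```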

(It is in fact easy to see that adding more dice with $\geq 2k$ faces to the set makes no difference either to the optimal choices or to the sequence of win probabilities.)


What are the prospects for extending any of this to other cases more like the one MJD asked about? Not good (at least if we want general theorems rather than answers in specific cases; the latter might be possible) until we have at least one new idea. The first and third theorems used special-case tricks that certainly won't generalize. The key things that make the analysis in the second theorem possible are:

  • We could guess in advance that the sequence will repeat, and what the repeating pattern will be.
    • This lets us compute what the "right" winning probabilities are, and then confirm our guess at the pattern by checking that choosing different dice never helps.
  • For that repeating pattern, we could get a not-too-horrible explicit expression for the winning probabilities that result, because the repeating pattern is nice and simple.
    • This means that the computation of the "right" winning probabilities yields something we can actually work with and prove things about.
  • In particular, it ends up coming down to computing powers of a 2x2 stochastic matrix.
    • This means that we don't have to do anything much more complicated than summing geometric progressions. We aren't e.g. computing eigenvalues and eigenvectors of large matrices.

So what happens in the {4,6,8} case that MJD originally asked about? Empirically it looks as if there is a repeating pattern: it has period 116 (with 73 choices of the d4, 19 of the d6, and 24 of the d8). The good news is that the periodicity seems to be immediate -- the first 116 choices repeat for ever. So in principle we do in this case have our ansatz for what the right answers ought to be, and we could compute the resulting win-probabilities in terms of powers of some 8x8 matrix and a ton of other linear algebra. But getting those into a form explicit enough to prove things about might be extremely painful, or even impossible without some further insights.

Solution 2:

Yet more experimental results (to be honest I hardly understand how it works, I just know enough math to spot the mistakes in the previous experiments), but this time I believe I've got the correct answer.
I hereby thank @ShreevatsaR (and indirectly @RobPratt) for providing the notebook (and code) that allowed me to figure this out. I would never have done it by myself.

The (experimental, but probably correct) answer is $$\lim_{n\to\infty} V(n)=\frac{53593778027393979383062834089689687375851958941331473223707343944117988244200}{117371871369111168311975072842802096714042245341293610168598354761946856284327}.$$(That's 77 digits over 78 digits. No wonder nobody found it before!)

What I did: I added a line to the notebook that tried to approximate the fraction with a denominator only up to 1/3 as long as the precision, to avoid overfitting.
(Overfitting would probably start happening at around $\frac{N}{2}$ digits, where $N$ is the precision - this being the point where there are enough possible fractions of this length that most $N$-digit decimals are covered at least once. I could probably have stopped at $\frac{N}{2+\varepsilon}$ digits, but I decided to use $\frac{N}{3}$ just in case.)
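The denominator-capping idea can be sketched with the standard library alone. As a toy example (mine, not from the notebook), here I recover the $\{2,5\}$ limit $55/78$, computed from the formula in Solution 1, from a 38-digit decimal truncation:

```python
from fractions import Fraction

# Recover a rational from a decimal approximation, capping the denominator
# at roughly a third of the available precision to avoid "overfitting".
digits = "0.70" + "512820" * 6           # 55/78 truncated to 38 decimal places
precision = len(digits) - 2              # number of decimal digits available
bound = 10 ** (precision // 3)           # allowed denominator size
guess = Fraction(digits).limit_denominator(bound)
print(guess)  # -> 55/78
```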

At a precision of 1005 digits this got me a fraction far shorter than the expected 300 or so digits, so I guessed this was the correct answer.
But just in case, I tried again with 2005 digits (this took about 15 minutes), which allowed 600+ digits for the denominator, and got the same answer.

Incidentally, I agree with @Michael Lugo's conjecture that those values are probably rational (if sometimes with ridiculous denominators) for any finite set of integer dice.

Solution 3:

Let $D$ be the set of dice, and let value function $V(n)$ be the maximum win probability in state $n$. Then $V$ satisfies Bellman's equations: $$V(n)=\begin{cases} 0 & \text{if $n<0$} \\ 1 & \text{if $n=0$} \\ \max\limits_{d\in D} \frac{1}{d} \sum_{r=1}^d V(n-r) & \text{if $n>0$} \\ \end{cases}$$

For $D=\{4,6,8\}$, solving Bellman's equations appears to yield limiting value $$\lim_{n\to\infty} V(n)=\frac{1311501709}{2872225388}.$$

EDIT: This conjecture was based on post-processing a floating-point calculation, apparently with insufficient precision.


Here is Python code to compute $V(n)$ with exact arithmetic:

import functools

from fractions import Fraction

D = [4,6,8]

@functools.lru_cache(maxsize=None)
def V(n):
    if n < 0:
        return 0
    if n == 0:
        return 1
    best = 0
    for d in D:
        expectation = 0
        for r in range(1,d+1):
            expectation += Fraction(V(n-r),d)
        best = max(best, expectation)
    return best
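To watch the convergence, here is a small self-contained driver of my own, restating the same recurrence with an unbounded cache:

```python
import functools
from fractions import Fraction

D = [4, 6, 8]

@functools.lru_cache(maxsize=None)
def V(n):
    # Same Bellman recurrence as above, written as a max over exact Fractions.
    if n < 0:
        return Fraction(0)
    if n == 0:
        return Fraction(1)
    return max(sum(V(n - r) for r in range(1, d + 1)) / d for d in D)

for n in (1, 2, 10, 50, 200):
    print(n, V(n), float(V(n)))   # drifts toward 0.456615... from the question
```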

Solution 4:

Here's some Python code that may be of interest for empirical exploration. There are various magic numbers in it that may well be ill-chosen.

You need to have the mpmath library installed.

import collections,itertools
import mpmath
def gend(dice):
    n = max(dice)
    t = (n-1)*[mpmath.mpf(0)] + [mpmath.mpf(1)]
    yield (1,0)
    while True:
        u = max((sum(t[-d:])/mpmath.mpf(d),-d) for d in dice)
        u = (u[0],-u[1])
        yield u
        del t[0]
        t.append(u[0])
def analyse(dice, maxbits=100000, maxiters=100000):
    miniters = 1000
    bits = 1000
    while bits <= maxbits:
        print(f"Trying {bits} bits, up to {maxiters} iterations.")
        mpmath.mp.prec = bits
        prev,nits = 0,0
        dseq = []
        tol = mpmath.mpf(0.5)**(bits-10)
        nconv = 0
        for (prob,die) in itertools.islice(gend(dice), maxiters):
            if abs(prob-prev) <= tol and nits >= miniters:
                nconv += 1
                if nconv > max(dice): break
            else: nconv = 0
            dseq.append(die)
            prev = prob
            nits += 1
        if nconv <= max(dice):
            print("Not converged; giving up; try again with larger maxiters.")
            return
        maxcoeff=2**(bits//5)
        tol=mpmath.mpf(0.5)**int(bits*0.45)
        print(f"Converged; invoking findpoly with maxcoeff~=2^{mpmath.mag(maxcoeff)}, tol~=2^{mpmath.mag(tol)}.")
        expr = mpmath.findpoly(prob, maxcoeff=maxcoeff, tol=tol, maxsteps=maxbits)
        if not expr:
            if bits >= maxbits:
                print("No rational found; try again with larger maxbits?")
                return
            print(f"No rational found with {bits} bits; trying with more.")
            bits = min(round(1.5*bits), maxbits)
            continue
        num = abs(expr[1])
        den = abs(expr[0])
        print(f"Likely rational value found: {num}/{den}")
        print(f"Error is about {mpmath.nstr(abs(mpmath.mpf(num)/mpmath.mpf(den)-prob),10)}.")
        break
    j = len(dseq)-2
    while j>10*max(dice) and dseq[j] == dseq[-1]: j -= 1
    j = int(0.8*j)
    dseq = dseq[:j]
    startpoint = len(dseq) // 10
    maxperiod = len(dseq) // 10
    period = None
    print(f"Attempting period-finding with {len(dseq)} values, start=maxperiod={maxperiod}.")
    for tl in range(1,maxperiod+1):
        tail = dseq[-tl:]
        reps = (len(dseq)-startpoint) // tl
        longtail = dseq[-(reps*tl):]
        if longtail == reps*tail:
            print("Period", tl)
            period = tl
            break
    else: print("No period found with this many iterations.")
    if period is not None:
        for i in range(len(dseq)-2*period):
            p = dseq[i:i+period]
            r = (len(dseq)-(i+period)) // period
            if dseq[i+period:i+period*(1+r)] == r*p:
                print("Repeating period starts at", i)
                print("Initial transient:", dseq[:i])
                print("Repeating seq:", ' '.join(map(str,p)))
                print("Counts:", collections.Counter(p))
                break

Invoke this with e.g. analyse((4,6,8)).

On the machine I'm using right now, for instance, that call takes about 0.4 seconds and outputs the following:

Trying 1000 bits, up to 100000 iterations.
Converged; invoking findpoly with maxcoeff~=2^201, tol~=2^-449.
No rational found with 1000 bits; trying with more.
Trying 1500 bits, up to 100000 iterations.
Converged; invoking findpoly with maxcoeff~=2^301, tol~=2^-674.
Likely rational value found: 53593778027393979383062834089689687375851958941331473223707343944117988244200/117371871369111168311975072842802096714042245341293610168598354761946856284327
Error is about 1.425530482e-452.
Attempting period-finding with 2032 values, start=maxperiod=203.
Period 116
Repeating period starts at 1
Initial transient: [0]
Repeating seq: 4 4 4 4 6 6 4 8 4 4 4 4 6 8 4 8 4 4 4 6 6 8 4 8 4 4 4 6 6 8 4 8 4 4 4 6 4 8 4 8 4 4 4 6 4 8 4 4 4 4 4 6 4 8 4 4 4 4 4 6 4 8 4 4 4 4 6 8 4 8 4 4 4 4 6 8 4 8 4 4 4 6 6 8 4 8 4 4 4 6 4 8 4 8 4 4 4 6 4 8 4 8 4 4 4 6 4 8 4 4 4 4 4 6 4 8
Counts: Counter({4: 73, 8: 24, 6: 19})

In more difficult cases it takes longer, of course. For instance, analyse((3,6)) takes about 42 seconds before finding the thousands-of-digits-long rational discussed elsewhere in comments; it ends up using about 38k bits of precision. (The repeating period is of length 743, it starts at place 3635, and it contains 555 threes and 188 sixes.)


[Important confession: in a previous version of this code the convergence check was more simple-minded than the one above. Many of the calculations reported below used that previous version. I don't think any of the results are invalid for that reason, but if you are going to rely on what follows for something important (how??) you might want to double-check using the code as it now stands above.]

Some empirical findings from this (some of which have already been reported in other answers, comments thereon, or comments on the original question, but it seems worth putting them together):

We already know (with proof!) what {k} and {2,k} do; see my other answer.

{3,k}:

  • k=4: 512/891, period 3 [344], starting at place 6
  • k=5: 9344/17145, period 5 [53333], starting at place 5
  • k=6: thousands of digits, period 743, starting at place 3635
  • k=7: 2653/5002, period 7 [3333337], starting at place 1
  • k=8: 136/261, period 8 [33333338], starting at place 1
  • k=9: 361/697, period 9 [333333339], starting at place 1
  • k=10: 410/791, period 10 [333333333A], starting at place 1
  • the pattern in the periods (and the existence of "reasonable" rational limits) continues at least to k=20

So that's interesting: for $k>6$ everything is lovely (maybe there's a theorem to be proved...), for $k=6$ it's horrible, and for $k<6$ things are fairly nice but it takes a little while to settle down rather than being immediately periodic. I guess maybe when you have {m,n} with n>2m you're forced to take the m-sided die for the first while, and maybe that helps get things into a nice state.
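(In case anyone wants to double-check the immediately-periodic cases without the mpmath machinery: a throwaway floating-point DP suffices. The play helper below is purely illustrative and isn't part of my code elsewhere in this answer. Note that plain floats only pin down the die-choices for the first hundred or so values of $n$ -- after that the options are too close together to distinguish at 53 bits, which is exactly why the real code works at high precision.)

```python
# Throwaway float DP to double-check {3,7}: the optimal choices should repeat
# [3,3,3,3,3,3,7] from the start, with win probability approaching 2653/5002.
def play(dice, n):
    """Return (probs, choices) for point totals 1..n under optimal play."""
    p = [1.0]              # P(win | 0 points) = 1; negative totals contribute 0
    choices = []
    for k in range(1, n + 1):
        best = max((sum(p[k - i] for i in range(1, d + 1) if i <= k) / d, d)
                   for d in dice)
        p.append(best[0])
        choices.append(best[1])
    return p, choices

probs, choices = play((3, 7), 700)
print(choices[:14])            # 3 3 3 3 3 3 7, twice
print(probs[-1], 2653 / 5002)  # the two numbers should agree closely
```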

For {4,k}:

  • k=5: 5041105/11010048, period 4 [4555] starting at 20
  • k=6: 2074375/4611072, period 6 [644444] starting at 6
  • k=7: 4779071465392056377836750715/11212131909551128478625038336, period 7 [7444444] starting at 49
  • k=8: 155 digits, period 199 starting at 33
  • k=9: 261077813956747/618161221140480, period 9 [944444444] starting at 18
  • k=10: 555831345/1323302912, period 10 [A444444444] starting at 10
  • k=11: 61 digits, period 98 starting at 1
  • k=12: 469 digits, period 71 starting at 627
  • k=13: 87922601/212701839, period 13 [444444444444D] starting at 1
  • k=14: 145186048563612283212299803/352208024470599386445905920, period 14 starting at 28
  • k=15: 83 digits, period 164 starting at 1

So much for things being nice when the ratio of dice is large. No very obvious patterns here.

Now for {5,k}:

  • k=6: some rather alarming things happen here (maybe there's a bug in my code); see below.
  • k=7: no rational found up to 100k bits (looking for numerators and denominators up to about 6000 decimal digits long)
  • k=8: no rational found up to 100k bits
  • k=9: no rational found up to 100k bits

That {5,6} case is scary. Depending on exactly what parameters I feed into my analyse function, it finds various different rational numbers of different sizes, with lots of factors of 10 in the denominator, all of them approximating the limits it finds well enough to convince mpmath that they're real. I suspect they may all be bogus. I would be extremely interested if someone else were to look into this case using different code.

(See below for more about this case.)

For other values of $k$, $\{5,k\}$ seems not to produce a rational limit of at-all-reasonable size. Perhaps these values genuinely aren't rational? Or perhaps it's just that some key thing grows rather rapidly with the number of faces on the smallest die.

[UPDATE added later: I've set my code running for quite a long time on {5,7}. On current evidence it seems there's no rational limit with fewer than about 76000 decimal digits. UPDATE later still: still none up to about 114000 digits. UPDATE even laterer: none up to about 171000 digits.]


Let's look at some larger sets of dice. We already know what $\{k,k+1,\dots,2k-1\}$ does, and that adding other larger dice to that set doesn't change anything. Other than avoiding these, there's no particular logic to which sets I've looked at below. There are a couple of patterns that seem to be present...

With {2,4,6} we get 86/123 and a repeating period of 6 [222426] starting at the beginning.

With {2,4,7} we get 623/884 and a repeating period of 7 [2224227] starting at the beginning.

With {2,4,8} we get 1400/2003 and a repeating period of 8 [22242228] starting at the beginning.

There seems to be a pattern here, in the repeating periods at least. But...

With {2,4,9} we get 709/1023 and a repeating period of 9 [222422249] starting at the beginning.

With {2,4,10} we get 14090/20813 and a repeating period of 10 [222422242A] starting at the beginning.

With {2,4,11} we get 31097/44552 and a repeating period of 11 [2224222422B] starting at the beginning.

With {2,4,12} we get 22580/32399 and a repeating period of 12 [22242224222C] starting at the beginning.

So the pattern is at least a little more complicated than it first appears.

With {3,4,6} we get 52544/90801 (quite a lot of 3s in the denominator) and a repeating period of 6 [644344] starting at index 6.

With {3,4,7} we get 196/339 and a repeating period of 7 [3343374] starting at index 2.

With {3,4,8} we get 40/69 and a repeating period of 8 [33343348] starting at the beginning.

With {3,6,9} we get 2235/4219 and a repeating period of 9 [333336339] starting at the beginning. The period here (333, 336, 339) is suggestive, and this sort of structure has appeared in other repeating periods -- e.g., 22242228 for {2,4,8} -- but this may just be the Law of Small Numbers at work.

With {4,5,6} we get 5675/11776 (that's a lot of 2s in the denominator!) and a repeating period of length 4 [4566] starting at index 8.

With {4,5,7} we get 164275/349696 (again lots of 2s in the denominator) and a repeating period of length 7 [7455455] starting at index 7.

{4,6,8} -- the original combination of dice in the question -- is described above; we get a period of length 116.

{2,4,6,8,10,12} has limit 46124/65305 and period 22 24 26 28 2A 2C, which is kinda nice and, along with what we saw above for {2,4,6}, suggests an obvious pattern in the periods -- which does indeed also hold for {2,4,6,8} and {2,4,6,8,10}, as well as {2,4}. It wouldn't surprise me if this one were provable without too much pain.
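(As a small independent check on the {2,4,6} member of this family: the same sort of throwaway float DP, with Fraction.limit_denominator playing the role of a poor man's findpoly to recover the limit. Again, none of this is part of the code elsewhere in this answer.)

```python
from fractions import Fraction

# Throwaway check of {2,4,6}: run the optimal-play DP in floats, then recover
# the (small) rational limit from the converged value.
def play(dice, n):
    """Return (probs, choices) for point totals 1..n under optimal play."""
    p = [1.0]              # P(win | 0 points) = 1; negative totals contribute 0
    choices = []
    for k in range(1, n + 1):
        best = max((sum(p[k - i] for i in range(1, d + 1) if i <= k) / d, d)
                   for d in dice)
        p.append(best[0])
        choices.append(best[1])
    return p, choices

probs, choices = play((2, 4, 6), 600)
print(choices[:12])                                  # 2 2 2 4 2 6 repeating
print(Fraction(probs[-1]).limit_denominator(10**6))  # 86/123
```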

Here's an interestingly suggestive set -- I report only the periodic patterns and not the limits; all periodic patterns start at the beginning except for {4,6,10} where the "x"s indicate where the repeating pattern actually begins.

{4,6,8}: horrible (see above)
{4,6,10}: x x 4 4 6 6 4 4 4 10 4 6
{4,6,12}: 4 4 4 4 6 6 4 4 4 4 4 12
{4,6,14}: 4 4 4 4 6 6 4 4 4 4 4 6 4 14
{4,6,16}: 4 4 4 4 6 6 4 4 4 4 4 6 4 4 4 16
{4,6,18}: 4 4 4 4 6 6 4 4 4 4 4 6 4 4 4 4 4 18

It sure looks like there's a regular pattern there starting at 14 and arguably at 12; even {4,6,10} kinda fits. But {4,6,8} is this horrible thing of length 116!
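(The regular members of this family are also cheap to spot-check with the throwaway float DP; here's {4,6,14} -- sorry, {4,6,12} -- as an illustration. Floats only settle the early choices reliably, but that's enough to see the claimed pattern get going.)

```python
# Illustrative double-check of the {4,6,12} pattern with a throwaway float DP.
def play(dice, n):
    """Return the optimal-die choices for point totals 1..n."""
    p = [1.0]              # P(win | 0 points) = 1; negative totals contribute 0
    choices = []
    for k in range(1, n + 1):
        best = max((sum(p[k - i] for i in range(1, d + 1) if i <= k) / d, d)
                   for d in dice)
        p.append(best[0])
        choices.append(best[1])
    return choices

print(play((4, 6, 12), 48))   # four copies of 4 4 4 4 6 6 4 4 4 4 4 12
```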


I've made some improvements to the code, including a function analyses that processes multiple sets of dice and then reports concisely on them all. Here's how it currently stands:

import collections
import itertools

import mpmath
from mpmath import mp

def analyse(dice, maxbits=100000, maxiters=100000, minbits=1000, miniters=1000):
    bits = minbits
    while bits <= maxbits:
        print(f"Trying {bits} bits, up to {maxiters} iterations.")
        mp.prec = bits
        prev,nits = 0,0
        dseq = []
        tol = mpmath.mpf(0.5)**(bits-10)
        nconv = 0
        for (prob,die) in itertools.islice(gend(dice), maxiters):
            if abs(prob-prev) <= tol and nits >= miniters:
                nconv += 1
                if nconv > 2*max(dice): break
            else: nconv = 0
            dseq.append(die)
            prev = prob
            nits += 1
        if nconv <= 2*max(dice):
            print("Not converged; giving up; try again with larger maxiters.")
            return None
        maxcoeff=2**(bits//6)
        tol=mpmath.mpf(0.5)**int(bits*0.85)
        print(f"Converged after {nits} iters; invoking findpoly with maxcoeff~=2^{mpmath.mag(maxcoeff)}, tol~=2^{mpmath.mag(tol)}.")
        expr = mpmath.findpoly(prob, maxcoeff=maxcoeff, tol=tol, maxsteps=maxbits)
        if not expr:
            if bits >= maxbits:
                print("No rational found; try again with larger maxbits?")
                return None
            print(f"No rational found with {bits} bits; trying with more.")
            bits = min(round(1.5*bits), maxbits)
            continue
        num = abs(expr[1])
        den = abs(expr[0])
        err = abs(mpmath.mpf(num)/mpmath.mpf(den)-prob)
        print(f"Likely rational value found: {num}/{den}")
        print(f"Error is about {mpmath.nstr(err,10)} (~{-mpmath.mag(err)} bits).")
        bits = round(1.5*bits)
        print(f"Double-checking with 50% more bits ({bits}).")
        mp.prec = bits
        miniters = nits
        prev,nits = 0,0
        tol = mpmath.mpf(0.5)**(bits-10)
        nconv = 0
        for (prob,die) in itertools.islice(gend(dice), round(1.25*maxiters)):
            if abs(prob-prev) <= tol and nits >= miniters:
                nconv += 1
                if nconv > 2*max(dice): break
            else: nconv = 0
            prev = prob
            nits += 1
        if nconv <= 2*max(dice):
            if bits >= maxbits:
                print("Failed to converge; giving up; try again with larger maxbits?")
                return None
            print("Failed to converge this time; continuing the search with more bits.")
            bits = min(round(1.2*bits), maxbits)
            continue
        err = abs(mpmath.mpf(num)/mpmath.mpf(den)-prob)
        print(f"Error is about {mpmath.nstr(err)}, 2^-bits~={mpmath.nstr(mpmath.mpf(0.5)**bits)}, num~={mpmath.nstr(mpmath.mpf(num))}.")
        if err >= mpmath.mpf(0.5)**int(bits*0.85):
            if bits >= maxbits:
                print("That's suspiciously large; giving up; try again with larger maxbits?")
                return None
            print("That's suspiciously large; continuing the search with more bits.")
            bits = min(round(1.2*bits), maxbits)
            continue
        else:
            print("Seems OK.")
            break
    j = len(dseq)-2
    while j>20*max(dice) and dseq[j] == dseq[-1]: j -= 1
    j = int(0.8*j)
    dseq = dseq[:j]
    startpoint = len(dseq) // 10
    maxperiod = len(dseq) // 10
    period = None
    print(f"Attempting period-finding with {len(dseq)} values, start=maxperiod={maxperiod}.")
    for tl in range(1,maxperiod+1):
        tail = dseq[-tl:]
        reps = (len(dseq)-startpoint) // tl
        longtail = dseq[-(reps*tl):]
        if longtail == reps*tail:
            print("Period", tl)
            period = tl
            break
    else:
        print("No period found with this many iterations.")
        return (num,den,None,None,[],[])
    if period is not None:
        for i in range(len(dseq)-2*period):
            p = dseq[i:i+period]
            r = (len(dseq)-(i+period)) // period
            if dseq[i+period:i+period*(1+r)] == r*p:
                print("Repeating period starts at", i)
                print("Initial transient:", (dseq[:i] if i<100 else "long"))
                print("Repeating seq:", (' '.join(map(str,p)) if period<100 else "long"))
                print("Counts:", collections.Counter(p))
                return (num,den,period,i,dseq[:i],p)
        return (num,den,period,None,[],tail)
def analyses(dices, **kwargs):
    results = []
    for dice in dices:
        print("Looking at", dice)
        results.append((dice, analyse(dice, **kwargs)))
    print("--------")
    for (dice, result) in results:
        if not result: print(f"{dice}: no rational found.")
        else:
            num, den, period, start, transient, p = result
            nlen = len(str(num))
            if nlen <= 80: rat = str(num) + "/" + str(den)
            else: rat = f"({len(str(num))} digits)"
            def h(n): return hex(n)[2:].upper()
            if p and max(dice)<16 and period<30: pstr = f"length {period} [{''.join(map(h,p))}]"
            elif p and period<15: pstr = f"length {period} [{' '.join(map(str,p))}]"
            else: pstr = f"length {period}"
            if not period: print(f"{dice}: {rat}, no periodicity found.")
            elif not start: print(f"{dice}: {rat}, {pstr}.")
            else: print(f"{dice}: {rat}, {pstr} starting {start}.")

(The gend function is unaltered.)
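(For readers who don't have the earlier part of this answer to hand: gend is an infinite generator yielding (win probability, optimal die) pairs for $n=0,1,2,\dots$ at the current mpmath working precision, with a dummy die of 0 at place 0, which is why the {4,6,8} transient above is [0]. Here's a minimal sketch consistent with how analyse consumes it; it's a reconstruction for self-containedness, not the authoritative version.)

```python
# Minimal sketch of gend, consistent with how analyse() consumes it: an
# infinite generator of (win probability, optimal die) pairs for n = 0, 1,
# 2, ... computed at the current mpmath working precision. (Reconstruction
# only; the authoritative version appears earlier in this answer.)
import mpmath

def gend(dice):
    probs = [mpmath.mpf(1)]      # P(win | 0 points) = 1
    yield probs[0], 0            # place 0: nothing to throw
    while True:
        best_p, best_d = None, None
        for d in dice:
            # mean of P(win) over the d reachable totals (negatives count 0)
            n = len(probs)
            p = sum(probs[n - i] for i in range(1, d + 1) if i <= n) / d
            if best_p is None or p > best_p:
                best_p, best_d = p, d
        probs.append(best_p)
        yield best_p, best_d
```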

Warning: the analyse function now returns some information about what it found, and the printed representation of that can be long; you may want to be careful when using it interactively.

The approximations accepted by this code generally have only a few more bits error than the upper bound imposed by the working precision. Does this mean that it catches all bogus rationals? Alas, I am not sure it does. In the {5,6} case it gives us a rational number with 2135 digits in the numerator, and die-choices with a repeating period of length 123. This was found with 69195 bits of precision, and confirmed with 103792 bits, with an error of about 5x10^-31244; 2^-103792 is about 3x10^-31245. I reran with still higher precision, and got the same rational number with an error of about 5x10^-67731. Sounds good, no?

Well, remember that this case produced an earlier dubious rational approximation. I don't remember exactly what it was, but there is another dubious rational approximation (found by looking at the continued fraction for the 2135-digit rational, which has one coefficient that's 625 digits long) whose numerator has 84 digits; this approximation has error about 8x10^-793. That also sounds pretty good, no? Also suspicious: the 2135-digit approximation has a ton of factors of 10 in its denominator, and I'm pretty sure that an earlier dubious approximation I found also does, and that seems a bit fishy. (The continued-fraction-based one doesn't have very many, though.)
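(If anyone wants to repeat the continued-fraction experiment: a helper along these lines does the job. The name contfrac is mine; a single enormous partial quotient, like the 625-digit one mentioned above, is the tell-tale sign that the convergent just before it is a suspiciously good approximation.)

```python
from fractions import Fraction

def contfrac(q):
    """Continued-fraction coefficients of a non-negative rational q."""
    coeffs = []
    num, den = q.numerator, q.denominator
    while den:
        a, r = divmod(num, den)
        coeffs.append(a)     # next partial quotient
        num, den = den, r
    return coeffs

# Example: a huge final coefficient shows 3 + 1/7 is already a very good
# approximation to 355/113.
print(contfrac(Fraction(355, 113)))   # [3, 7, 16]
```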

[UPDATE a bit later: I ran it again with more bits. Same result, good to about 10^-180000. And then more: good to about 10^-360000. And then even more: good to about 10^-900000. In each case it's out by only a modest number of units in the last place. Periodicity in the die-choices is consistent for at least about 3 million choices. I still think it's possible that all this is bogus, but if so then the precision needed to demonstrate that empirically may be excessive.]

All the other rationals and periods reported above have been checked with this version of the code, and I didn't spot any discrepancies.