Rolling $n$ $k$-sided dice and discarding the lowest $m$ of them.

In this question I will use the notation $\Bbb{E}(n,k,m)$ to refer to the expected average of rolling $n$ $k$-sided dice and discarding the lowest $m$ of them.


The most trivial response happens when $m = 0$, in which case we discard no dice and we arrive at the result:

$$\Bbb{E}(n,k,0) = \frac{k}{2} + \frac{1}{2}$$


When $m = 1$, I considered a sample case to give some intuition for this problem. Looking at the case of $\Bbb{E}(2, 6, 1)$, I found the following pattern.

enter image description here

There are one $1$s, three $2$s, five $3$s, etc.

The sum of these outcomes is:

$$\sum_{i=1}^{6} i(2i-1)$$

In general, the expected outcome for $\Bbb{E}(2,k,1)$ is:

$$\frac{\sum_{i=1}^{k} i(2i-1)}{k^2} = \frac{\sum_{i=1}^{k} 2i^2 - \sum_{i=1}^{k} i}{k^2} = \frac{\frac{2k(k+1)(2k+1)}{6} - \frac{k(k+1)}{2}}{k^2} = \frac{2}{3}k + \frac{1}{2} - \frac{1}{6k}$$


Now I want to consider $\Bbb{E}(3,k,2)$. In the previous case each value $i$ occurred $2i - 1$ number of times. Where did $2i - 1$ come from?

It looks like the frequency of occurrence is the difference of consecutive squares.

$$i^2 - (i-1)^2 = 2i - 1$$

This seems intuitive based on the image I provided. We can reason that in the case of $n = 3$, the frequency of occurrence will be the difference of consecutive cubes.

$$i^3 - (i-1)^3 = 3i^2 - 3i + 1$$

The expected outcome of $\Bbb{E}(3,k,2)$ is messy, so I'll just write the initial expression and the simplified expression.

$$\frac{\sum_{i=1}^{k} i(3i^2-3i+1)}{k^3} = \frac{3}{4}k + \frac{1}{2} - \frac{1}{4k}$$


Let's finally look at the case of $\Bbb{E}(n,k,n-1)$. Based on the previous results, I conjecture that it looks like:

$$\frac{\sum_{i=1}^{k} i(i^n - (i-1)^n))}{k^n} = \frac{n}{n+1}k + \frac{1}{2} - \mathcal{O}(\frac{1}{k})$$

My two questions are this:

  • Is my conjecture for $\Bbb{E}(n,k,n-1)$ correct, and if so, how can I prove this?
  • What happens when $m \ne n-1$? How can I adjust my analysis to account for discarding dice such that I leave not just the maximum value?

Solution 1:

If $M$ is the maximum value on the $n$ dice thrown, then $\mathbb{P}(M\leq x)=(x/k)^n$ so that $\mathbb{E}(M)=\sum_{x=0}^{k-1} \left(1-\left({x\over k}\right)^n\right)=k-{1\over k^n}\sum_{x=0}^{k-1}x^n.$ Using Faulhaber's formula you can show that $$\sum_{x=0}^{k-1}x^n ={1\over n+1}\left[k^{n+1}-{1\over 2}(n+1)k^n+O(k^{n-1})\right]$$ and putting these together gives, as you expected, $$\mathbb{E}(M)={n\over n+1}\,k+{1\over 2}+O(1/k).$$


Here is a more general argument. For any $1\leq j\leq n$ let $X_{(j)}$ be the $j$th order statistic of the $n$ dice rolls. For $0\leq x<k$, let $Y$ be the number of dice rolls that take values in $\{1,2,\dots, x\}$. We have $X_{(j)}>x$ if and only if $Y<j$ so that $$ \mathbb{P}(X_{(j)}>x)=\mathbb{P}(Y<j)=\sum_{y=0}^{j-1}{n\choose y} \left({x\over k}\right)^y \left(1-{x\over k}\right)^{n-y}.$$

Therefore we have the explicit formula $$\mathbb{E}(X_{(j)})=\sum_{x=0}^{k-1}\sum_{y=0}^{j-1}{n\choose y} \left({x\over k}\right)^y \left(1-{x\over k}\right)^{n-y}.\tag1$$

For large $k$ asymptotics, exchange the order of summation and rewrite as a Riemann sum to get $$\mathbb{E}(X_{(j)})=k\sum_{y=0}^{j-1}\sum_{x=0}^{k-1}{n\choose y} \left({x\over k}\right)^y \left(1-{x\over k}\right)^{n-y}{1\over k}\approx k\sum_{y=0}^{j-1} \int_0^1 {n\choose y} w^y(1-w)^{n-y}\,dw={k\,j\over n+1}.$$

Using the Euler-Maclaurin formula, you can get more terms in the asymptotic expansion, for example, $$\mathbb{E}(X_{(j)})={k\,j\over n+1}+{1\over 2}+O(1/k).$$


Asymptotics are nice, but the formula (1) also allows us to derive the bounds $${kj\over n+1}\leq \mathbb{E}(X_{(j)})\leq {kj\over n+1}+1.$$