How do you check if a die is fair using the posterior probability density function?

Setup:

  • We have a die with $S$ sides.
  • The probability of each side showing after rolling the die is $p_s$ for $s=1,2,\ldots,S$ with $\sum_{s=1}^S p_s = 1$.
  • We rolled the die and observed each side $n_s$ times for a total of $\sum_{s=1}^S n_s$ die rolls.

We want to estimate the probability that the die is fair. To do this, we first need to define what "fair" means. Requiring $p_s = 1/S$ exactly for all $s$ doesn't work: under any continuous posterior, the probability that each face has a probability of precisely $1/S$ of showing up (so, with infinite precision) is 0.

To get around this, we pick an acceptable precision, $\epsilon$, and call the die "fair" if $|p_s - 1/S| < \epsilon$ for all $s$. We compute the probability of the die being fair as $$P(|p_s - 1/S| < \epsilon \ \ \forall s).$$
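For example, for a standard 6-sided die with $\epsilon = 1/60$, we would call the die fair if every $p_s$ lies in $[1/6 - 1/60,\; 1/6 + 1/60] = [0.15,\; 0.18\overline{3}]$.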

To compute this probability, we still need the posterior distribution over the probability vector $\vec{p} = (p_1, p_2, \ldots, p_S)$ given the observed number of times each face showed, $\vec{n} = (n_1, n_2, \ldots, n_S)$.

It seems reasonable to assume a uniform prior over $\vec{p}$: $$\vec{p} \sim \text{Dirichlet}(\vec{\alpha} = \vec{1}),$$ whose density is constant over the simplex $\sum_{s=1}^S p_s = 1$. However, if we have background information about the die, we can encode it in the prior. For example, we could increase all the components of $\vec{\alpha}$ equally if we're fairly confident the die is fair.

It is convenient to write the uniform prior as a Dirichlet distribution since the posterior will also be Dirichlet distributed when the likelihood function is a Multinomial. The Dirichlet distribution is a conjugate prior for the Multinomial likelihood.

The likelihood function is $$P(\vec{n}\;|\;\vec{p}) = \text{Multinomial}(\vec{n}\;|\;\vec{p}) = \frac{n!}{n_1!\,n_2!\cdots n_S!}\,p_1^{n_1}p_2^{n_2}\cdots p_S^{n_S},$$ where $n = \sum_{s=1}^S n_s$ is the total number of rolls.
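To see the conjugacy explicitly, multiply the prior density by the likelihood and keep only the factors that depend on $\vec{p}$: $$P(\vec{p}\;|\;\vec{n}) \propto P(\vec{n}\;|\;\vec{p})\,P(\vec{p}) \propto \prod_{s=1}^S p_s^{n_s} \cdot \prod_{s=1}^S p_s^{\alpha_s - 1} = \prod_{s=1}^S p_s^{n_s + \alpha_s - 1},$$ which is the kernel of a $\text{Dirichlet}(\vec{\alpha} + \vec{n})$ distribution.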

The posterior distribution is $$P(\vec{p}\;|\;\vec{n}) = \text{Dirichlet}(\vec{p}\;|\;\vec{\alpha}=\vec{1} + \vec{n}).$$

It is with respect to this posterior distribution that we want to compute $$P(|p_s - 1/S| < \epsilon \ \ \forall s)$$ to find whether the die is fair.

2-sided die

A 2-sided die is just a coin. In this case, we can compute the probability of being fair analytically.

Given observations $(n_1, n_2)$ and $\epsilon < \frac{1}{2}$ (in general, we require $\epsilon < \frac{1}{S}$ so that the fairness region lies inside the simplex),

$$ \begin{align} & P(|p_s - 1/2| < \epsilon \ \ \forall s\in\{1,2\}) \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1)} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} p_1^{n_1}\,p_2^{n_2}\,\delta_{1 - p_1 - p_2} \;\text{d}p_2\;\text{d}p_1 \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1)} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} p_1^{n_1} (1-p_1)^{n_2} \;\text{d}p_1 \\ =\;& I_{\frac{1}{2}+\epsilon}(n_1+1, n_2+1) - I_{\frac{1}{2}-\epsilon}(n_1+1, n_2+1) \end{align} $$

Here,

  • $\text{B}(a,b)$ is the Beta function.
  • $\delta_{1 - p_1 - p_2}$ denotes a Dirac delta in $1 - p_1 - p_2$, which concentrates all mass where $1 - p_1 - p_2 = 0$. This ensures that the probabilities of the faces of the die sum to 1 in the integral.
  • $I_x(a,b)$ is the regularized incomplete beta function.
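Both $I_x(a,b)$ terms are available in scipy as scipy.special.betainc, so this result is a one-liner to evaluate. A minimal sketch, where the coin-flip counts are hypothetical values chosen only for illustration:

from scipy.special import betainc

# hypothetical data: 48 heads and 52 tails in 100 flips
n1, n2 = 48, 52
epsilon = 0.05

# P(fair) = I_{1/2 + eps}(n1 + 1, n2 + 1) - I_{1/2 - eps}(n1 + 1, n2 + 1)
p_fair = (betainc(n1 + 1, n2 + 1, 0.5 + epsilon)
          - betainc(n1 + 1, n2 + 1, 0.5 - epsilon))
print('P(fair) ≈', p_fair)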

3-sided die

For a 3-sided die, we can still compute the probability of being fair analytically, but things get more complicated than in the 2-sided case. Computing the probability with a numeric integral remains straightforward, though.

Given observations $(n_1, n_2, n_3)$ and $\epsilon < \frac{1}{3}$,

$$ \begin{align} & P(|p_s - 1/3| < \epsilon \ \ \forall s\in\{1,2,3\}) \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1, n_3+1)} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} p_1^{n_1}\,p_2^{n_2}\,p_3^{n_3}\,\delta_{1 - p_1 - p_2 - p_3} \;\text{d}p_3\;\text{d}p_2\;\text{d}p_1 \\ \end{align} $$

where $\text{B}(a_1, a_2, a_3) = \frac{\Gamma(a_1)\,\Gamma(a_2)\,\Gamma(a_3)}{\Gamma(a_1 + a_2 + a_3)}$ is the multivariate generalization of the Beta function.

Here, we have to be careful when substituting $p_3 = 1 - p_1 - p_2$ and removing the inner integral since $\frac{1}{3} - \epsilon \le p_3 \le \frac{1}{3} + \epsilon$ places further constraints on the allowable values of $p_2$, affecting the limits of integration on $p_2$.

$$ \begin{align} =\;& \frac{1}{\text{B}(n_1+1, n_2+1, n_3+1)} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} p_1^{n_1} \int_{\max(\frac{1}{3}-\epsilon, \frac{2}{3}-\epsilon-p_1)}^{\min(\frac{1}{3}+\epsilon, \frac{2}{3}+\epsilon-p_1)} p_2^{n_2} (1-p_1-p_2)^{n_3} \;\text{d}p_2 \;\text{d}p_1 \\ \end{align} $$
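Before deriving the closed form, here's a minimal sketch of the numeric route: evaluating this double integral with scipy.integrate.dblquad. The counts $(2, 4, 3)$ and $\epsilon = 1/30$ are the same toy values used in the sampling section below, so the output can be checked against the Monte Carlo estimate there.

import numpy as np
from scipy.integrate import dblquad
from scipy.special import gammaln

n1, n2, n3 = 2, 4, 3
eps = 1 / 30

# log of the multivariate Beta function B(n1+1, n2+1, n3+1)
log_B = (gammaln(n1 + 1) + gammaln(n2 + 1) + gammaln(n3 + 1)
         - gammaln(n1 + n2 + n3 + 3))

# unnormalized posterior density with p3 = 1 - p1 - p2 substituted in
def integrand(p2, p1):
    return p1**n1 * p2**n2 * (1 - p1 - p2)**n3

# inner p2 limits depend on p1, matching the max/min bounds above
prob, _ = dblquad(
    integrand,
    1/3 - eps, 1/3 + eps,                       # outer p1 limits
    lambda p1: max(1/3 - eps, 2/3 - eps - p1),  # p2 lower limit
    lambda p1: min(1/3 + eps, 2/3 + eps - p1),  # p2 upper limit
)
print('P(fair) ≈', prob * np.exp(-log_B))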

To obtain a closed form, we split the integral into two parts to eliminate the max and min functions in the limits of integration. After applying some properties of the incomplete beta function, the integral simplifies to

$$ \begin{align} =\;& \sum_{j=n_2+1}^{n_2+n_3+1}\Bigg( \\ &\qquad \frac{1}{j\,\text{B}(j, n_1+n_2+n_3+3-j)} \Bigg(\\ &\qquad\qquad + (1/3+\epsilon)^j (2/3 - \epsilon)^{n_1+n_2+n_3+2-j} \left( I_{\frac{1}{2 - 3\epsilon}}(n_1+1, n_2+n_3+2-j) - I_{\frac{1 - 3\epsilon}{2 - 3\epsilon}}(n_1+1, n_2+n_3+2-j) \right) \\ &\qquad\qquad - (1/3-\epsilon)^j (2/3 + \epsilon)^{n_1+n_2+n_3+2-j} \left( I_{\frac{1 + 3\epsilon}{2 + 3\epsilon}}(n_1+1, n_2+n_3+2-j) - I_{\frac{1}{2 + 3\epsilon}}(n_1+1, n_2+n_3+2-j) \right) \Bigg) \ +\\ &\qquad \frac{1}{(n_1+1+j)\,\text{B}(n_1+1+j, n_2+n_3+2-j)} \Bigg(\\ &\qquad\qquad - (1/3+\epsilon)^{n_2+n_3+1-j} (2/3-\epsilon)^{n_1+j+1} \left( I_{\frac{1}{2-3\epsilon}}(n_1+1, j+1) - I_{\frac{1-3\epsilon}{2-3\epsilon}}(n_1+1, j+1) \right) \\ &\qquad\qquad + (1/3-\epsilon)^{n_2+n_3+1-j} (2/3 + \epsilon)^{n_1+j+1} \left( I_{\frac{1 + 3\epsilon}{2 + 3\epsilon}}(n_1+1, j+1) - I_{\frac{1}{2 + 3\epsilon}}(n_1+1, j+1) \right) \Bigg) \\ \end{align} $$

It might be possible to simplify this further, but it is already an analytical solution with no integrals remaining.

Estimating the probability by sampling

For the general case of a die with $S$ sides, I don't have an analytical solution, and computing the numeric integral becomes intractable due to the curse of dimensionality.

However, we can still estimate the probability through Monte Carlo integration: generate samples from the Dirichlet posterior and count the fraction of them in which every face probability falls within $1/S \pm \epsilon$.

In the code below, we estimate the probability that the 3-sided die is fair by drawing 1,000,000 dice from the posterior distribution over dice and counting how many of them are fair. It is straightforward to generalize this code to a die with more sides. We use $\epsilon = 1/30$ in this example.

import numpy as np
import scipy.stats as sts

# observed counts for each face
n = np.array([2, 4, 3])
S = len(n)
epsilon = 0.1 / S
# Dirichlet posterior under a uniform Dirichlet(1, ..., 1) prior
posterior = sts.dirichlet(n + np.ones(S))

# draw dice from the posterior and check which are fair on every face
samples = posterior.rvs(size=1000000)
samples_within_epsilon = np.all(np.abs(samples - 1/S) < epsilon, axis=1)
print('P(fair) ≈', np.mean(samples_within_epsilon))

This prints out P(fair) ≈ 0.022945. Because the data contain so few rolls, many unfair dice are consistent with the observations, and the probability of being fair is low.
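As a sanity check on the precision of this estimate, the standard error of a Monte Carlo proportion is $\sqrt{\hat{p}(1-\hat{p})/M}$ for $M$ samples. With $\hat{p} \approx 0.023$ and $M = 10^6$, that is about $1.5 \times 10^{-4}$, so only the first two or three digits of the printed value are meaningful.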

With more data, we would become more certain. Setting n = np.array([332, 334, 333]) yields the output P(fair) ≈ 0.934626. Note that even with 999 rolls of the die, there is still a substantial probability (about 6.5%) that the die is unfair.

To investigate how the posterior probability of being fair scales with the number of die rolls in the data, let's assume all faces were observed with the same frequency and let the total number of die rolls range from 30 to 3,000 in steps of 30.

all_N = np.arange(0, 3001, 30)[1:]  # 30, 60, ..., 3000
p_fair = []
for N in all_N:
    # split the N rolls evenly across the 3 faces
    n = np.array([N // 3] * 3)
    S = len(n)
    epsilon = 0.1 / S
    posterior = sts.dirichlet(n + np.ones(S))

    samples = posterior.rvs(size=1000000)
    samples_within_epsilon = np.all(np.abs(samples - 1/S) < epsilon, axis=1)
    p_fair.append(np.mean(samples_within_epsilon))

import matplotlib.pyplot as plt
plt.figure(figsize=(8, 6))
plt.title('How P(fair) varies with the total number of die rolls observed')
plt.xlabel('Observed die rolls')
plt.ylabel('P(fair)')
plt.plot(all_N, p_fair)
plt.ylim(0, 1)
plt.show()

How P(fair) varies with the total number of die rolls observed

We can also investigate how the posterior probability of being fair scales with $\epsilon$, keeping the number of die rolls observed fixed at 999.

# 999 rolls split evenly across the 3 faces
n = np.array([999 // 3] * 3)
S = len(n)
posterior = sts.dirichlet(n + np.ones(S))
# draw once and reuse the same samples for every epsilon
samples = posterior.rvs(size=1000000)

all_epsilon = np.linspace(0.01 / S, 0.1 / S, 100)
p_fair = []
for epsilon in all_epsilon:
    samples_within_epsilon = np.all(np.abs(samples - 1/S) < epsilon, axis=1)
    p_fair.append(np.mean(samples_within_epsilon))

import matplotlib.pyplot as plt
plt.figure(figsize=(8, 6))
plt.title(r'How P(fair) varies with $\epsilon$')
plt.xlabel(r'$\epsilon$')
plt.ylabel('P(fair)')
plt.plot(all_epsilon, p_fair)
plt.ylim(0, 1)
plt.show()

How P(fair) varies with epsilon

Comparing the plots above to the analytic result shows that they agree almost perfectly, up to Monte Carlo noise.

Comparison while varying N

Comparison while varying epsilon