How many rolls do I need to determine if my dice are fair?

Solution 1:

A chi-square test is the first thing that comes to mind: $$ \sum\frac{(\text{observed} - \text{expected})^2}{\text{expected}} $$ where the sum runs over the six faces. If you roll the die $n$ times, the "expected" number of times you would see any particular outcome is $n/6$. If $n$ is large and the die is fair, this statistic has approximately a chi-square distribution with 5 degrees of freedom. You reject the null hypothesis of fairness if the test statistic is large.
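
As a minimal sketch of the test described above, the statistic can be computed directly from the face tallies. The function names and the tallies are hypothetical, chosen only for illustration; the critical value 11.07 is the standard table value for 5 degrees of freedom at the 5% level.

```python
# Critical value of chi-square with 5 degrees of freedom at the 5% level
# (standard table value, rounded).
CHI2_CRIT_5DF_05 = 11.07

def chi_square_statistic(counts):
    """Pearson chi-square statistic for the observed face counts of a die."""
    n = sum(counts)
    expected = n / len(counts)  # fair die: every face equally likely
    return sum((obs - expected) ** 2 / expected for obs in counts)

def looks_unfair(counts, critical_value=CHI2_CRIT_5DF_05):
    """True if the statistic exceeds the critical value (reject fairness)."""
    return chi_square_statistic(counts) > critical_value

# Hypothetical tallies from 600 rolls each, faces 1..6 (made-up data).
balanced = [98, 102, 100, 97, 103, 100]
skewed = [70, 95, 100, 105, 100, 130]

print(chi_square_statistic(balanced), looks_unfair(balanced))
print(chi_square_statistic(skewed), looks_unfair(skewed))
```

The chi-square approximation is usually considered reasonable only when every expected count is at least about 5, so this sketch should not be trusted for very short runs.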

Note that 95% confidence does mean that, on average, one out of twenty fair dice will fail the test.

See also this amazing analysis by a physicist of perhaps the most extensive experiment of this kind ever done: http://bayes.wustl.edu/etj/articles/entropy.concentration.pdf

A further refinement of the chi-square test would be to note that each face of the die has an opposite face. If one outcome occurs unusually often, the opposite face should occur unusually rarely. Thus, it is the difference between the frequencies of opposite faces that detects an unbalanced die. You could create a simulation in a simple spreadsheet and find the confidence limits by Monte Carlo.
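
The Monte Carlo idea might look something like the sketch below, written in Python rather than a spreadsheet. It assumes a standard die whose opposite faces sum to 7, and it uses the largest absolute difference between opposite-face counts as the statistic; that particular statistic, the function names, and the trial count are illustrative choices, not something the answer specifies.

```python
import random

# Assumption: a standard die, where opposite faces sum to 7.
OPPOSITE_PAIRS = [(1, 6), (2, 5), (3, 4)]

def max_opposite_difference(rolls):
    """Largest |count(face) - count(opposite face)| over the three pairs."""
    counts = {face: 0 for face in range(1, 7)}
    for r in rolls:
        counts[r] += 1
    return max(abs(counts[a] - counts[b]) for a, b in OPPOSITE_PAIRS)

def simulated_cutoff(n_rolls, n_trials=2000, level=0.95, seed=0):
    """Estimate the `level` quantile of the statistic for a fair die."""
    rng = random.Random(seed)
    stats = sorted(
        max_opposite_difference([rng.randint(1, 6) for _ in range(n_rolls)])
        for _ in range(n_trials)
    )
    return stats[min(n_trials - 1, int(level * n_trials))]

# A real die whose statistic exceeds this cutoff over 600 rolls would be
# suspicious at roughly the 5% level.
cutoff = simulated_cutoff(600)
print(cutoff)
```

More trials give a steadier estimate of the cutoff at the cost of run time.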

Solution 2:

You say this is for a test of paranormal abilities. So you have to ask your psychic what they think they can achieve. They might say one of the following:

I can throw a six more often than chance.
I can throw 1, 2, or 3 more often than chance.
I can throw a larger than expected total.

Whatever they say, get it in writing. This is a psychic you're dealing with.

Now you have to decide on your confidence level (I think 99% is reasonable here), and let your psychic choose the length of the test. Otherwise they might claim that they got tired (if there were a lot of tests), or that they didn't get into their stride (if there weren't).

Let's assume they claim to be able to throw sixes. If the die is fair, then the number of sixes in $n$ throws follows a binomial distribution, with mean $\mu = n/6$ and variance $\sigma^2 = 5n/36$. For large enough $n$ (which should certainly be the case here), the binomial distribution is well approximated by the normal distribution, which for a one-tailed test at the 99% confidence level gives a cutoff of about $\mu + 2.326\sigma$, or $n/6 + 0.867 \sqrt n$.

So now you can offer (say) the following choices:

$n = 100$: $100/6 + 0.867\times10 \approx 25$ sixes
$n = 400$: $400/6 + 0.867\times20 \approx 84$ sixes
$n = 900$: $900/6 + 0.867\times30 \approx 176$ sixes
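
The cutoffs above can be reproduced with a few lines of Python. This is only a sketch of the normal approximation described in this answer; the function name `sixes_cutoff` and its parameters are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def sixes_cutoff(n, confidence=0.99, p=1 / 6):
    """Normal-approximation cutoff for the number of sixes in n throws."""
    z = NormalDist().inv_cdf(confidence)  # about 2.326 for a one-tailed 99% test
    mu = n * p                            # binomial mean
    sigma = sqrt(n * p * (1 - p))         # binomial standard deviation
    return mu + z * sigma

for n in (100, 400, 900):
    print(n, round(sixes_cutoff(n), 2))
```

The raw cutoffs are not whole numbers, so whether the psychic must reach or strictly exceed the rounded figure is one more detail to settle in advance.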

Whatever the psychic decides, get it in writing. This is a psychic you're dealing with.

The psychic will (with probability $99\%$) fail the test, and will (with probability $100\%$) come up with something like "Yeah, but look at all those fours!" or "I never could figure out why Wednesdays don't work for me -- how about we do it again tomorrow?"

Let us know how it goes.

Solution 3:

The chi-squared test mentioned above is the correct approach. However, in order to estimate the appropriate number of dice rolls for testing fairness, you need to decide:

1) the desired power of your test (= probability of correctly detecting a biased die), and

2) the effect size you wish to detect with confidence.

A standard value for power is 80%. Typical effect sizes for a chi-squared test range from 0.1 (small) to 0.5 (large).
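
For reference, the effect size meant here is presumably Cohen's $w$ (an assumption, since the answer does not name it; the small and large benchmarks quoted match Cohen's conventions). For a die with $k$ sides, fair probabilities $p_{0i} = 1/k$, and actual probabilities $p_{1i}$, it is defined as

$$ w = \sqrt{\sum_{i=1}^{k} \frac{(p_{1i} - p_{0i})^2}{p_{0i}}} $$

As a worked example, a 6-sided die on which three faces each come up with probability $1/4$ and the other three with probability $1/12$ has $w = \sqrt{6 \cdot \frac{(1/12)^2}{1/6}} = \sqrt{1/4} = 0.5$, which counts as a large effect.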

See this website for a quick tutorial on determining sample sizes for a chi-squared test using the free R language.

Choosing significance level = 0.05, power = 0.80, degrees of freedom = number of sides $-$ 1, and effect size = 0.5, I find the following sample sizes to test the fairness of 6-sided, 10-sided, and 20-sided dice:

6-sided: 52 rolls

10-sided: 63 rolls

20-sided: 83 rolls
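
As a rough sanity check on the 6-sided figure, the power can be estimated by simulation instead of from the noncentral chi-square distribution. This sketch is not the R calculation used above. It assumes one particular biased die with effect size 0.5 (three faces at probability 1/4, three at 1/12), and the function names are hypothetical.

```python
import random

# Chi-square critical value, 5 degrees of freedom, 5% level (table value).
CHI2_CRIT_5DF_05 = 11.07

def chi_square_statistic(counts):
    """Pearson chi-square statistic against a fair die."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def estimated_power(probs, n_rolls, n_trials=5000, seed=0):
    """Fraction of simulated experiments in which fairness is rejected."""
    rng = random.Random(seed)
    faces = range(len(probs))
    rejections = 0
    for _ in range(n_trials):
        counts = [0] * len(probs)
        for face in rng.choices(faces, weights=probs, k=n_rolls):
            counts[face] += 1
        if chi_square_statistic(counts) > CHI2_CRIT_5DF_05:
            rejections += 1
    return rejections / n_trials

# Assumed alternative: one specific biased die with effect size w = 0.5.
biased = [1 / 4, 1 / 4, 1 / 4, 1 / 12, 1 / 12, 1 / 12]
fair = [1 / 6] * 6

power = estimated_power(biased, 52)
false_alarm_rate = estimated_power(fair, 52)
print(power, false_alarm_rate)
```

If the sample size is right, the first number should land near the 80% target and the second near the 5% significance level, up to simulation noise and the error of the chi-square approximation at small counts.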