Find the probability of a face of a loaded die.
I have a die, for which all faces have the same probability, except for one face. I know in advance which face is different, let's say it's the 6. My goal is to find the probability of that face by tossing the die $n$ times and counting how many times it lands on the loaded face.
Let's suppose I rolled the die $n = 1000$ times and it landed $k = 250$ times on the 6. The obvious answer would be that the probability of landing on the 6 is $\frac{k}{n} = \frac{1}{4}$. But the real probability ($\frac{k}{n}$ after infinitely many tosses) could be different. It could also be only $\frac{1}{8}$ but I got extremely lucky in this case and the die landed 250 out of 1000 times on the 6. However, this would be extremely unlikely, but how unlikely? How to calculate the probability, that x% is the real probability of the die? I'd like to have a function f(x) = probability that x is the real probability of the die, with the parameters $n$ and $k$. I think the maximum of this function should be at $\frac{k}{n}$, but I'm not sure exactly how the function would look.
Also, I'd like to give upper and lower bounds on the probability after $n$ tosses. I know that it's technically impossible since the die could land $k$ times on the 6 no matter what the real probability is (except if it's 0% or 100%). However, I could set the bounds in a way that they ignore extremely improbable events. So, I want to say that the real probability of the die is between a% and b%, with a probability of c%. I believe this can be solved using integrals, but how?
Solution 1:
The number $X$ of 6's in your experiment with $n=1000$ rolls would be be a binomial random variable with $X \sim \mathsf{Binom}(100, p),$ where $p$ is the probability of the loaded face 6.
Using a frequentist approach, you could use this information to make a 95% Wald confidence interval for $p$ of the form $\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{1000}},$ where $\hat p = X/n = 250/1000 = 0.25,$ as you say. Using R, this computes to the interval $(0.223, 0.277).$ The Wald interval does not give good results for small $n,$ but for $n = 1000,$ it gives a good interval estimate of the probability $p$ of the loaded face.
n = 1000; x = 250
p.hat = x/n
CI = p.hat + qnorm(c(.025,.975))*sqrt(p.hat*(1-p.hat)/n)
CI
[1] 0.2231621 0.2768379
Taking the Bayesian approach suggested in the Comment of @user3257842, you would have the prior distribution $\mathsf{Unif}(0,1) \equiv \mathsf{Beta}(1,1).$ Your binomial likelihood function would be proportional to $p^{250}(1-p)^{750}.$
Then by Bayes' Theorem the posterior distribution of $p$ would have a density function proportional to the product of the above beta prior distribution and binomial likelihood function. Fortunately, these two functions are mathematically compatible (the technical term is 'conjugate') so it is easy to see that the posterior density function is proportional to $p^{1+250-1}(1-p)^{1+750 - 1},$ which matches the function form of the $\mathsf{Beta}(251,751)$ density function. Then a 95% posterior interval estimate (also called a 95% 'Bayesian credible interval') is found to be $(0.224, 0.278)$ as shown below in R.
qbeta(c(.025,.975), 251, 751)
[1] 0.2241639 0.2777775
Unlike the frequentist confidence interval, the Bayesian credible interval can be interpreted to apply directly to your biased die ---provided that you believe the prior distribution is reasonable. Fortunately, with $n=1000$ rolls of the die you have so much data that the influence of the uniform prior on the final result is small.
Notes: (1) If you are not familiar with beta distributions, see if you can find them in a textbook. Alternatively, see the Wikipedia page on beta distributions.
(2) There are many styles of frequentist confidence intervals for the binomial success probability. See the Wikipedia page on binomial confidence intervals.
(2) Among the frequentist intervals in (1) is the Jeffreys interval, which is derived from a Bayesian argument starting with the Jeffreys prior distribution $\mathsf{Beta}(.5,.5).$ The 95% Jeffreys interval for your data (whether interpreted as a frequentist confidence interval or as a Bayesian credible interval) is $(0.224, 0.278)$ (rounded to three places). For technical reasons, many Bayesians prefer to use $\mathsf{Beta}(.5,.5)$ instead of $\mathsf{Beta}(1,1)$ as a 'noninformative' prior distribution (in the absence of previous data or strong personal opinion).
qbeta(c(.025,.975), 250.5, 750.5)
[1] 0.2239111 0.2775336