Calculating the probability distribution of an urn with an unknown number of balls

I was sure this would fall under Bayes' theorem, which I dimly remember, but all the online sources I found assume too much information.

In my problem, I have an urn and all I know is that there are black and white balls inside. I do not know how many of each color, nor do I know how many in total. Balls are put back after drawing, so the distribution always stays the same.

Without any additional information, my a priori assumption should presumably be a 50:50 distribution.

Now I draw balls from the urn. With each ball I draw, I can update my assumption. If I draw 5 white and 1 black, obviously my 50:50 assumption would have to be corrected more towards white.

I cannot for the life of me find the actual formula to update my assumption after each draw.

I believe I could model this with a beta distribution, starting with $\operatorname{Beta}(1,1)$ and then updating to $\operatorname{Beta}(1,2)$ or $\operatorname{Beta}(2,1)$ depending on what I draw, and so on.

I would, however, prefer to simply calculate the probability, i.e. from 50:50 go to - well, what exactly? 75:25? That is the result/formula I'm looking for. What's my new assumption after 1 draw? Then after the 2nd, iteratively.
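To make the kind of iterative update I have in mind concrete, here is a rough sketch (plain Python; the update rule is my guess at how the beta counts would evolve):

```python
# Sketch: start from Beta(1, 1) (uniform), add 1 to a for each white ball
# and 1 to b for each black ball, and report the running mean a / (a + b).

def update(a, b, draw):
    """Update Beta(a, b) hyperparameters after one draw ('white' or 'black')."""
    if draw == "white":
        return a + 1, b
    return a, b + 1

a, b = 1, 1  # uniform prior Beta(1, 1)
for draw in ["white"] * 5 + ["black"]:
    a, b = update(a, b, draw)
print(a, b, a / (a + b))  # after 5 white, 1 black: Beta(6, 2), mean 0.75
```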


I believe you've misunderstood how to select an appropriate prior.

In a Bayesian analysis, the idea is that the sampling distribution models the likelihood of observing a sequence of outcomes, but the parameters that influence this likelihood are themselves random variables that follow some distribution. In your case, sampling balls from the urn (with replacement) is a Bernoulli process, and the number of black balls observed in $n$ independent and identically distributed draws is a binomial random variable whose parameter $p$ represents the probability that any single draw results in a black ball. In the Bayesian framework, this $p$ is itself a random variable with a probability distribution on the support $p \in [0,1]$. Selecting a prior does not amount to selecting a specific numeric value for $p$; rather, it means selecting a distribution for $p$. This distribution is what we call the prior for $p$. Then, as data about the balls in the urn is observed, the probability distribution for $p$ may change: we may become more certain about what $p$ is, and which range of values is most plausible for it. This updated distribution is called the posterior for $p$.

For a binomial likelihood, when the prior distribution of $p$ is a beta distribution, the posterior is also a beta distribution; this property is called conjugacy. Thus, we say that the beta distribution is a conjugate prior for a binomial likelihood. Specifically, if $$X \mid p \sim \operatorname{Binomial}(n, p) \\ \Pr[X = x \mid p] = \binom{n}{x} p^x (1-p)^{n-x}, \quad x \in \{0, 1, 2, \ldots, n\}$$ is a binomial random variable representing the number of black balls drawn from the urn in $n$ trials, and $$p \sim \operatorname{Beta}(a,b) \\ f(p) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} p^{a-1} (1-p)^{b-1}, \quad p \in [0,1]$$ is the prior distribution of $p$ with hyperparameters $a, b$, then if we observe $X = x$ black balls drawn, the posterior for $p$ is beta with posterior hyperparameters $$a^* = a + x, \quad b^* = b + n - x.$$
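The update rule above is just arithmetic on the hyperparameters; a minimal sketch in Python (function name is mine):

```python
# Conjugate beta-binomial update: given prior hyperparameters (a, b) and
# an observation of x black balls in n draws, the posterior is
# Beta(a + x, b + n - x).

def beta_binomial_update(a, b, n, x):
    """Posterior Beta hyperparameters after observing x successes in n trials."""
    return a + x, b + (n - x)

# Uniform prior Beta(1, 1), then 5 black balls observed in 6 draws:
a_post, b_post = beta_binomial_update(1, 1, 6, 5)
print(a_post, b_post)  # -> 6 2
```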

In your case, when we have no prior belief about the "true" value of $p$, one choice of prior is the uniform prior: all values $p \in [0,1]$ are equally likely, so $p$ has the uniform distribution with density $$f(p) = 1, \quad p \in [0,1],$$ which corresponds to the choice $a = b = 1$. Then if we draw $6$ balls (with replacement) and observe $5$ black and $1$ white, the posterior for $p$ will be beta with posterior hyperparameters $a^* = 1 + 5 = 6, \quad b^* = 1 + 6 - 5 = 2$, so the posterior density is $$f(p \mid X = 5) = \frac{\Gamma(6+2)}{\Gamma(6)\Gamma(2)} p^{6-1} (1-p)^{2-1} = 42 p^5 (1-p).$$ The posterior mean is $$\operatorname{E}[p \mid X = 5] = \frac{a^*}{a^* + b^*} = \frac{6}{6+2} = \frac{3}{4}.$$ The posterior mode is $$\operatorname{Mode}[p \mid X = 5] = \frac{a^* - 1}{a^* + b^* - 2} = \frac{5}{6}.$$ (Your example had the colors reversed, $5$ white and $1$ black; by symmetry, the posterior for the proportion of white balls is then $\operatorname{Beta}(6,2)$ with mean $3/4$, which matches your 75:25 guess.) Keep in mind that we are always dealing with probability distributions, not single numerical values.
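You can check these numbers with nothing but the standard library (the variable names are mine):

```python
from math import gamma

# Verify the worked example: posterior Beta(6, 2) after observing
# 5 black and 1 white from a uniform Beta(1, 1) prior.
a_post, b_post = 6, 2

# Normalizing constant Gamma(a+b) / (Gamma(a) * Gamma(b)) of the beta density
const = gamma(a_post + b_post) / (gamma(a_post) * gamma(b_post))

mean = a_post / (a_post + b_post)            # a* / (a* + b*)      = 3/4
mode = (a_post - 1) / (a_post + b_post - 2)  # (a*-1)/(a*+b*-2)    = 5/6

print(const, mean, mode)  # -> 42.0 0.75 0.8333333333333334
```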

However, there is no reason the prior for $p$ must be uniform. There are other justifiable choices, such as the Jeffreys prior corresponding to $a = b = 1/2$, or the improper prior $a = b = 0$. A discussion of these is beyond the scope of your question, but you are welcome to read the relevant literature.
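To see how much the choice of prior matters here, one can compare the posterior means for the same data ($5$ black in $6$ draws) under these three priors (a quick sketch; the helper function is mine):

```python
# Posterior mean under a Beta(a, b) prior after x successes in n trials:
# (a + x) / (a + b + n).

def posterior_mean(a, b, n, x):
    return (a + x) / (a + b + n)

print(posterior_mean(1, 1, 6, 5))      # uniform:   6 / 8   = 0.75
print(posterior_mean(0.5, 0.5, 6, 5))  # Jeffreys:  5.5 / 7 ~ 0.786
print(posterior_mean(0, 0, 6, 5))      # improper:  5 / 6   ~ 0.833
```

With only six draws the priors still pull the estimates visibly apart; as $n$ grows, all three posterior means converge toward the sample proportion $x/n$.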