Guessing the probability from the result of just one experiment

I have a probability question that seems easy, but I somehow can't wrap my head around it.

Suppose we have a coin. The probability that a toss of this coin comes up heads is some unknown value $X$. The first toss came up heads. What would be your best guess for the value of $X$? (That is, if your guess is $y$, your task is to minimize $|X - y|$.)

To me it seems that, given the result of the first experiment, the coin is a little more likely to be loaded in a way that makes heads come up more often, so the optimal guess for the probability of heads is $1$. But I can't formulate this properly or prove it mathematically. Besides, there is an opinion in another (non-math) online community that $0.5$ would be the better guess. I think there may be a flaw somewhere in my logic.

Can you help me understand this concept? Thanks.

Update: for anyone interested, the question originally emerged during a discussion of the hindsight bias phenomenon. More precisely, the result of the Fischhoff and Beyth experiment seems logically correct, since the differences in the predictions were caused by differences in the information given to the groups. Even though the students were explicitly asked not to treat the outcome of the conflicts as a probability factor, the only thing the experiment shows is that we cannot discard things from our subconscious perception of the world at will (which is obvious from the definition of the subconscious itself). So the phenomenon of hindsight bias cannot be tested by such an experiment, or by any like it. The experiment would need to show a difference between the mathematical probability and the empirical probability given the same initial data.


Solution 1:

Let's do this using Bayesian statistics. Let $p_0$ be the probability distribution over the interval $[0,1]$ describing our initial belief in the likelihood of various values of the unknown parameter $X$. We wish to update this distribution based on the outcome of an experiment in which the coin is tossed and comes up heads with probability $X$.

The conditional probability $\mathrm P(\mathrm{heads} \mid X=x)$ of the coin coming up heads, given a certain value $x$ of $X$, is simply equal to $x$. Thus, by Bayes' rule, the posterior probability distribution for $X$, given that we do observe the coin coming up heads, is given by $$p(x) = \mathrm P(X=x \mid \mathrm{heads}) = \mathrm P(\mathrm{heads} \mid X=x) \frac{\mathrm P(X=x)}{\mathrm P(\mathrm{heads})} = x \frac{p_0(x)}{C} = x p_0(x) / C,$$

where the normalizing factor $$C = \mathrm P(\mathrm{heads}) = \int_0^1 \mathrm P(\mathrm{heads} \mid X=x)\,\mathrm P(X=x)\,dx = \int_0^1 x p_0(x) \,dx$$ just scales the distribution so that the total probability mass remains one.

(Note that I'm abusing notation a bit here by treating distributions as if they were functions and blithely conditioning on probability-0 events like $X=x$. All this can be made rigorous, at the cost of introducing some extra complexity, but I won't go into all that here.)

Given a particular prior distribution $p_0$, the posterior distribution $p$ will be fully determined, and we can then obtain an expected value for $X$ by integrating over the distribution $p(x)$ weighted by $x$: $$\mathbb E[X \mid \mathrm{heads}] = \int_0^1 x p(x) \,dx.$$
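This recipe can be checked numerically for any prior. Here is a minimal sketch (not part of the original answer) that approximates $\mathbb E[X \mid \mathrm{heads}] = \int_0^1 x^2 p_0(x)\,dx \,/\, \int_0^1 x\,p_0(x)\,dx$ with a simple midpoint Riemann sum; the function name is my own invention for illustration:

```python
def posterior_mean_heads(p0, n=100_000):
    """E[X | heads] for prior density p0 on [0, 1], via a midpoint Riemann sum.

    Numerator:   integral of x * (x * p0(x))  -- x weighted by the posterior
    Denominator: C = integral of x * p0(x)    -- the normalizing constant
    """
    xs = [(i + 0.5) / n for i in range(n)]
    num = sum(x * x * p0(x) for x in xs) / n
    den = sum(x * p0(x) for x in xs) / n
    return num / den

# Flat prior p0(x) = 1 recovers the answer derived below: 2/3.
print(posterior_mean_heads(lambda x: 1.0))
```

Swapping in any other density for `p0` gives the corresponding posterior mean without redoing the integrals by hand.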

In particular, if we initially assume every value of $X$ to be equally likely, such that $p_0(x) = 1$, then the a priori probability $C$ of getting heads is simply $\int_0^1 x\,dx = \frac12$, and the posterior distribution is thus $p(x) = x\frac 1C = 2x$, giving us $$\mathbb E[X \mid \mathrm{heads}] = \int_0^1 2x^2 \,dx = \frac23.$$
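A quick Monte Carlo sanity check of this number (my own sketch, not part of the original answer): draw $X$ uniformly, toss the coin once with bias $X$, and average $X$ over the trials where heads came up.

```python
import random

random.seed(0)

# Keep X only in the runs where the single toss comes up heads,
# i.e. condition on the event "heads".
samples = []
for _ in range(200_000):
    x = random.random()       # X ~ Uniform[0, 1], the flat prior
    if random.random() < x:   # first toss is heads with probability x
        samples.append(x)

mc_estimate = sum(samples) / len(samples)
print(mc_estimate)            # close to 2/3
```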

Indeed, if we start with the flat prior $p_0(x) = 1$ and observe $a$ heads and $b$ tails, the posterior distribution will be the beta distribution $p(x) = x^a(1-x)^b / \int_0^1 x^a(1-x)^b \,dx$, and the expected value of $X$ will be simply $$\mathbb E[X \mid a\text{ heads, }b\text{ tails}] = \frac{\int_0^1 x^{a+1}(1-x)^b \,dx}{\int_0^1 x^a(1-x)^b \,dx} = \frac{a+1}{a+b+2}.$$

This simple formula is exactly the rule of succession formulated by Laplace in the 18th century to address the "sunrise problem", i.e. the task of estimating the probability that the sun will rise tomorrow, given evidence that it has done so every day for at least the past 5000 years. Your problem is exactly the same as Laplace's, except that, instead of 5000 years of daily observations, you have only one. Thus, the expected value $\mathbb E[X \mid \mathrm{heads}] = \frac23$ that you get is still relatively close to the prior estimate of $\frac12$.
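The rule is a one-liner; a small sketch (using exact fractions, and taking "5000 years of daily sunrises" as roughly $5000 \times 365$ observations for illustration):

```python
from fractions import Fraction

def rule_of_succession(a: int, b: int) -> Fraction:
    """Laplace's rule: posterior mean of X after a heads and b tails,
    starting from a flat prior on [0, 1]."""
    return Fraction(a + 1, a + b + 2)

print(rule_of_succession(1, 0))                   # 2/3: one heads, no tails
print(float(rule_of_succession(5000 * 365, 0)))   # Laplace's sunrise estimate, just under 1
```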

Solution 2:

Edit: Thanks to Aant, I was able to fix my reasoning. Should I have deleted it instead?

Let us assume that the probability $X$ of the coin coming up heads is a random variable uniform on $[0,1]$ (pdf $f(t)=1$). Let $H$ be the event that the first flip is heads.

$$P(H\cap (X\leq x)) = \int_0^x t\times f(t) dt = \frac{x^2}{2}$$

$$P(H) = \int_0^1 t\times f(t)dt = 0.5$$

Therefore:

$$P(X\leq x |H) = \frac{P(H\cap (X\leq x))}{P(H)}=x^2$$

Thus naming $Y$ the random variable $X|H$, and $g$ its probability distribution function, we have $$\int_0^x g(t)dt = x^2$$

And therefore $g(x)=2x$. It is now a matter of minimizing $|Y-y|$ over $y$, and that is achieved by $y=E(Y)=\frac{2}{3}$.

This could, of course, be adapted to any prior distribution other than the uniform.