How does expectation maximization work in the coin flipping problem?

joriki:

Thanks for your answer explaining how expectation maximization applies to the coin flipping problem ( how does expectation maximization work? ).

You have explained how the probabilities that coin A or coin B produced each set of observations are computed.

For example, you have shown how to derive the $(0.8, 0.2)$ in the 2nd row, given the current bias estimates $\theta_A=0.6$ and $\theta_B=0.5$.

Can you help me understand how we go from that probability distribution $(0.8, 0.2)$ to the expectation $(7.2\text{H}, 0.8\text{T})$ for coin A in the 2nd row as well? Like Martin, I would have thought the total number of tosses should be $10$, rather than $7.2 + 0.8 = 8$.

Many thanks!


Thanks for the personalized question -- I don't think I ever got one of those before :-)

The answer lies in this part of the tutorial linked to in Martin's question:

Rather than picking the single most likely completion of the missing coin assignments on each iteration, the expectation maximization algorithm computes probabilities for each possible completion of the missing data, using the current parameters $\hat\theta(t)$. These probabilities are used to create a weighted training set consisting of all possible completions of the data. Finally, a modified version of maximum likelihood estimation that deals with weighted training examples provides new parameter estimates, $\hat\theta(t+1)$. By using weighted training examples rather than choosing the single best completion, the expectation maximization algorithm accounts for the confidence of the model in each completion of the data (Fig. 1b).

So the $7.2$ heads and $0.8$ tails shouldn't add up to $10$ by themselves; they are weighted contributions to the actual result $(9,1)$ (9 heads, 1 tail), weighted in proportion to the probabilities that this result would occur given the current bias estimates. Since we calculated that proportion to be $0.8:0.2$, a contribution of $0.8\cdot(9,1)=(7.2,0.8)$ is added to the column for coin A and a contribution of $0.2\cdot(9,1)=(1.8,0.2)$ is added to the column for coin B. Together, they add up to $(9,1)$ (since we obtained the weights by normalizing their sum to $1$). Thus, the more likely it seems, according to the current bias estimates, that this row was produced by coin A, the more of it we add to the column for coin A.
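Here's a minimal Python sketch of that computation for the 2nd row (9 heads, 1 tail), assuming the binomial likelihood model from the tutorial; the variable names are my own:

```python
theta_A, theta_B = 0.6, 0.5   # current bias estimates
h, t = 9, 1                   # 2nd row: 9 heads, 1 tail

# Likelihood of this row under each coin; the binomial coefficient
# is the same for both coins, so it cancels when we normalize.
like_A = theta_A**h * (1 - theta_A)**t   # ≈ 0.00403
like_B = theta_B**h * (1 - theta_B)**t   # ≈ 0.00098

# Normalize to get the weights (≈ 0.80 and ≈ 0.20 in the table)
w_A = like_A / (like_A + like_B)
w_B = 1 - w_A

# Weighted contributions of the actual result (9, 1) to each column
print(w_A * h, w_A * t)   # ≈ 7.2 heads, 0.8 tails for coin A
print(w_B * h, w_B * t)   # ≈ 1.8 heads, 0.2 tails for coin B
```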

Note that we're not calculating an expectation value in the columns; we're merely adding up fractions of heads and tails in proportion to the likelihood that they came from this coin. In the end we take the overall ratio of heads to tails to get a new bias estimate; there's no need for the heads and tails to add up to anything, or to form an expectation value, in either column individually.
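To see how those weighted columns feed the next estimate, here is a sketch of one full EM iteration, using the five (heads, tails) counts as they appear in the tutorial's figure; `em_step` is my own hypothetical helper name:

```python
def em_step(rows, theta_A, theta_B):
    """One EM iteration: accumulate weighted counts, then re-estimate."""
    heads_A = tails_A = heads_B = tails_B = 0.0
    for h, t in rows:
        # E-step: posterior weight of each coin for this row
        like_A = theta_A**h * (1 - theta_A)**t
        like_B = theta_B**h * (1 - theta_B)**t
        w_A = like_A / (like_A + like_B)
        w_B = 1 - w_A
        # add the weighted fractions of this row to each coin's column
        heads_A += w_A * h; tails_A += w_A * t
        heads_B += w_B * h; tails_B += w_B * t
    # M-step: the new bias is just the overall ratio of heads to tosses,
    # so the column totals never need to add up to anything in particular
    return heads_A / (heads_A + tails_A), heads_B / (heads_B + tails_B)

# The five rows of the tutorial (heads, tails out of 10 tosses each)
rows = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]
print(em_step(rows, 0.6, 0.5))   # ≈ (0.71, 0.58), as in the tutorial
```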

I hope that makes it clearer; if not, you may have to explain in more detail why you think those values should add up to $10$.