What is a mathematically rigorous justification for multiplying edge probabilities of a tree diagram
Solution 1:
You are right that $(A\cap A\cap B)$ is a nonsensical way to write the event $(A,A,B)$ in the tree diagram you presented. After all, $A\cap A\cap B = B\cap A\cap A = A\cap B,$ so it's really unclear which of the diagram's outcomes is being expressed.
In your proposed notation $(A_1\cup A_2\cup B_3),$ you fix one of the problems in the previous notation, namely that we cannot tell whether $A$ indicates taking branch $A$ at the first node or at the second node. But before we jump to a notation combining $A_1,$ $A_2,$ and $B_3,$ it pays to examine what each of the symbols $A_1,$ $A_2,$ and $B_3$ means.
A good model of probability is that all events are sets. In order to not get confused by the multiple ways the letter $A$ was used in the diagram, I'm just going to number the leaves of the tree with the numbers $1,2,3,\ldots, 12$ in sequence from the top to the bottom of the diagram, so that for example leaf $4$ is the one labeled $(A,C,B)$ in the diagram. Then $\{4\}$ is an event (namely, the event that that unique outcome occurs), but $\{1,2\}$ is also an event and so is $\{2,5,6,11\}$ or any other arbitrary subset of the twelve unique outcomes.
I would then understand $A_1$ to be the event in which the car's location is $A,$ with no other restriction on the player's choice or the door that is opened. That is, $A_1 = \{1,2,3,4\},$ all the outcomes you can get to by following the first branch. But $B_3$ is the event in which the door revealed is $B,$ namely, $B_3 = \{1,4,9,12\},$ that is, the set containing any outcome you can get to by following a path whose third step is labeled $B.$ And $A_2$ is the event in which the player chooses $A,$ that is, $A_2 = \{1,2,5,9\}.$
A good interpretation of the notation $A_1\cup A_2\cup B_3$ then is the union of those three sets I just described, namely, $A_1\cup A_2\cup B_3 = \{1,2,3,4,5,9,12\}.$ This is a legitimate event, but I doubt it is what you were looking for.
But let's try taking the intersection of $A_1 = \{1,2,3,4\},$ $A_2 = \{1,2,5,9\},$ and $B_3 = \{1,4,9,12\}.$ There is only one element that is in all three of those sets; $A_1\cap A_2\cap B_3 = \{1\}.$
That is why it makes sense to use set intersection to denote unique outcomes of this tree. Intersections reduce the size of the event, eventually narrowing it down to a very precise outcome.
As to why we would take the product of the weights of edges of the tree, if each edge leading from the root node is correctly labeled with a numeric weight then that weight is the probability that this particular edge leads to the outcome that occurred. In the diagram in the question, for example, we could (and should) assign weight $\frac13$ to edge $A$ under "car location," since at the start of the game we have no reason to think the car is more likely to be placed behind one door than any other; that is, we set $P(A_1) = \frac13.$
Now consider the topmost edge under "player's initial guess." We can traverse this edge only if event $A_1$ occurs (allowing us to arrive at the node where this edge starts) and event $A_2$ occurs. If the weight of this edge also is $\frac13,$ that signifies that we will take this edge in $\frac13$ of all cases where we arrive at the node where this edge starts (as measured by our probability measure), which in turn happens in $\frac13$ of all cases of the game (by our probability measure). One-third of one-third of a total is one-ninth of the total, that is, the probability of traversing both edges $A_1$ and $A_2$ is $\frac13\times\frac13=\frac19.$
(The second $\frac13$ in that product derives from a questionable assumption, but let's finish at least one probability analysis before tackling that point.)
As you traverse more edges, each edge you traverse keeps only some proportion $p$ of the probability with which you arrived at the start of that edge, so we multiply the probability of reaching the start of that edge by the probability of taking that edge, and we get the probability of arriving at the end of that edge. Continuing this procedure all the way to a leaf, we end up with a probability of reaching that leaf which is the product of all the probabilities assigned to the edges we traversed.
This does not mean that $P(A_1 \cap A_2) = P(A_1) P(A_2),$ by the way. That equation will only be true if the probability of $A_2$ is independent of the probability of $A_1.$ It's reasonable in this example to assume independence, because the player's decision is not influenced by the actual location of the car, but it's not true for ever path in every probability tree. What is always true is that $$P(A_1 \cap A_2) = P(A_1) P(A_2 \mid A_1),$$ where $P(A_2 \mid A_1)$ is the probability that $A_2$ occurs given that $A_1$ occurs. So the weight of that topmost edge really needs to be $P(A_2 \mid A_1),$ but we tend to use probability trees for problems where it's fairly easy to figure out what that probability should be, so this is usually not difficult.
Consider the topmost edge under "door revealed," for example. We can only get to the start of that edge if $A_1 \cap A_2$ has already occurred. In the standard form of Monty Hall, the host can only reveal door $B$ or door $C$ at this point, and chooses either with equal probability, so we assign this edge the weight $P(B_3 \mid A_1 \cap A_2) = \frac12.$ But on the other hand, consider the fourth edge from the top, which also is labeled $B.$ That edge can be traversed only if we first traverse $A_1$ and $C_2,$ at which point the host is required to reveal door $B.$ We therefore assign to this edge the weight $P(B_3 \mid A_1 \cap C_2) = 1.$
Now this is all fine and well, except that in my opinion there is some question about how we should assign weights to the edges under "player's initial guess." Is it really plausible to assign a random distribution to the player's guess? After all, the Monty Hall problem is about using probability to make intelligent guesses. The player could choose to guess any door with $\frac13$ probability, but he or she could instead choose to guess $A$ with probability $1$. What we can reasonably say is that the player must have some method of deciding which door to guess, and that method selects doors $A$, $B$, and $C$ with probability $p_A$, $p_B$, and $p_C,$ where $p_A+p_B+p_C=1.$ We can still take products of probabilities along edges; these products will just have an unknown factor now. If we then find the outcomes belonging to the event "the player's initial guess was correct," and compute the probability that at least one of those outcomes occurred, we will find that that probability is $$\frac13(p_A+p_B+p_C)=\frac13,$$ the same as in the MIT 6.042 course notes.