Confusion in this probability question

Background:

Just started learning probability, so please bear with me if it's a very trivial question, but this has been troubling me:

Question

A bag contains $5$ green and $4$ white balls. Three balls are drawn simultaneously. Find the probability that all three balls are green.

Solution that I've been taught

Total balls = $9$. Probability = Favourable Cases / Total Cases = $\frac{{5 \choose 3}}{{9 \choose 3}} = \frac{10}{84} = \frac{5}{42}$
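For reference, the arithmetic in the taught solution can be reproduced with a short Python sketch. The variable names are only illustrative, and the block checks the numbers rather than justifying the method.

```python
from fractions import Fraction
from math import comb

# Ways to choose 3 of the 5 green balls (the favourable cases).
favourable = comb(5, 3)
# Ways to choose any 3 of the 9 balls (the total cases).
total = comb(9, 3)

# Fraction keeps the result exact instead of a rounded float.
probability = Fraction(favourable, total)
print(favourable, total, probability)  # 10 84 5/42
```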

My approach

Since we have $5$ green and $4$ white balls and we are drawing three balls, the only possible outcomes are (G stands for green and W for white):

  1. $3$ G + $0$ W
  2. $2$ G + $1$ W
  3. $1$ G + $2$ W
  4. $0$ G + $3$ W

We only want the case when we get all three green, which is the first case, so according to me the answer should be $\frac{1}{4}$. Why am I wrong? I know I'm doing something fundamentally wrong, so please guide me. Why are we even using combinations at all? I would like an answer that specifically tells me where I'm wrong!


Solution 1:

This is a very common mistake. It comes up here regularly, and it can also trip up people with far more experience in statistics and probability theory than you.

The main reason your argument is incorrect is that the formula "Favorable cases / Total cases" only works if every "case" is exactly as probable as every other case. Most introductory examples start from that scenario, but the side condition is rarely stressed in teaching and, if mentioned at all, is usually soon forgotten by students: it seems "obvious" when stated, yet it has subtle consequences, as you experienced here.

So let's first see why your $4$ cases ($3$G + $0$W, and so on) are not equally likely. Consider a different problem: the bag holds $1000$ balls, $997$ green and $3$ white. Would you think that drawing $3$ green and $0$ white balls has the same probability as drawing $0$ green and $3$ white balls?

If you think it doesn't (which is the correct intuition most people have in that very lopsided example), then there is also no reason to think the two probabilities are exactly equal in the more "even" case of $5$ green and $4$ white balls in the bag.
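To put numbers on that intuition, here is a minimal Python sketch for the lopsided bag. It assumes every set of $3$ balls is equally likely to be drawn, which is the usual modelling assumption for such questions; the constant and variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

GREEN, WHITE, DRAWN = 997, 3, 3

# Number of equally likely sets of 3 balls out of all 1000.
total = comb(GREEN + WHITE, DRAWN)

# All three drawn balls green: choose 3 of the 997 green balls.
p_all_green = Fraction(comb(GREEN, DRAWN), total)
# All three drawn balls white: there is exactly one such set.
p_all_white = Fraction(comb(WHITE, DRAWN), total)

print(float(p_all_green))
print(float(p_all_white))
```

Under that assumption, "all green" comes out at roughly 99%, while "all white" is on the order of a few in a billion, so the two cases are nowhere near equally likely.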

And that still says nothing about how those two cases relate to the remaining ones: "$2$ green and $1$ white ball drawn" and its color-reversed counterpart.

So again, the fundamental requirement for "Favorable cases / Total cases" to work is that the cases are actually all equally probable. If you forget that, you can reach absurd conclusions. For example: the event "The sun will rise over New York City sometime in the next 48 hours" has just two possible outcomes (it happens or it doesn't), so its probability must be $\frac12$, right?

Deciding whether the cases are actually equally probable sits somewhere between the math and the real-world events you are trying to model. The unspoken assumption in all those "draw out of a bag" questions is that each individual draw picks each remaining ball with equal probability. That's a good model for the real world if

a) the balls aren't distinguishable by anything for the "drawing process" (be that a person grabbing into it or some machine doing it, like for many lotteries),

b) the balls have been sufficiently mixed initially.

Condition a) especially is trickier than it sounds if you want to prevent cheating.
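If you want to see what that modelling assumption predicts for the original bag, a rough simulation is one option. The sketch below is only an illustration: the function name, the fixed seed, and the number of trials are arbitrary choices, and `random.sample` stands in for an idealized, perfectly mixed draw without replacement.

```python
import random

def estimate_all_green(trials, seed=0):
    """Estimate P(all three drawn balls are green) by repeated random draws."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    bag = ["G"] * 5 + ["W"] * 4     # 5 green and 4 white balls
    hits = 0
    for _ in range(trials):
        drawn = rng.sample(bag, 3)  # 3 balls, drawn without replacement
        if drawn.count("G") == 3:
            hits += 1
    return hits / trials

estimate = estimate_all_green(100_000)
print(estimate)
```

The estimate should land close to $\frac{5}{42} \approx 0.119$ and well away from $\frac14$.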

So, the assumption you should be using is that each particular set of $3$ balls is equally likely to be drawn. And, remembering the example with $997$ green and $3$ white balls, you must not forget that all those $997$ green balls are different balls. You cannot just "combine" them into one; doing so gives an incorrect result.

So one "case" has to look like this, with the balls considered (at least logically) numbered from $1$ to $9$, with $1$ to $5$ being green and $6$ to $9$ being white:

"I draw balls number $3$ (green), $4$ (green) and $8$ (white)",

and $83$ other cases, for ${9 \choose 3} = 84$ equally likely cases in total. That naturally leads to the solution you were taught.
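The numbered-ball cases are few enough to list by brute force. The following Python sketch enumerates them under the numbering just described (balls $1$ to $5$ green, $6$ to $9$ white); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

balls = range(1, 10)        # balls numbered 1 to 9
green = set(range(1, 6))    # balls 1 to 5 are green, 6 to 9 are white

# Every unordered set of 3 distinct balls is one equally likely case.
cases = list(combinations(balls, 3))
# A case is favourable when all three of its balls are green.
all_green = [case for case in cases if set(case) <= green]

print(len(cases), len(all_green), Fraction(len(all_green), len(cases)))  # 84 10 5/42
```

The example case "balls $3$, $4$ and $8$" appears in `cases` as the tuple `(3, 4, 8)`; it is not in `all_green` because ball $8$ is white.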

Solution 2:

Second Addendum added


Two problems:

  1. The 4th case corresponds to all three balls being white. You want the 1st case, instead, which corresponds to all three balls being green.

  2. The 4 cases are not equiprobable. Treat the balls as distinguishable and count ordered draws. The only color pattern for Case 1 is "GGG", and since there are $5$ green balls, it can occur in $(5 \times 4 \times 3)$ ways (i.e. after selecting a green ball, there are $4$ green balls left, and so forth). "GGW" can occur in $(5 \times 4 \times 4)$ ways (i.e. there are $4$ choices for the white ball), and so can each of "GWG" and "WGG". Since Case 2 represents the 3 possibilities {"GGW", "GWG", "WGG"}, Case 2 can occur in $(5 \times 4 \times 4 \times 3)$ ways. Therefore, the relative probability of Case 1 occurring versus Case 2 occurring is $(5 \times 4 \times 3)$ versus $(5 \times 4 \times 4 \times 3)$, that is, $60$ versus $240$.
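The ordered counts in point 2 can be checked by listing every ordered draw of three distinguishable balls. This is a sketch with made-up ball labels (`G1` to `G5`, `W1` to `W4`), used only to tell the balls apart.

```python
from collections import Counter
from itertools import permutations

# Label each ball so that balls of the same color stay distinguishable.
balls = [f"G{i}" for i in range(1, 6)] + [f"W{i}" for i in range(1, 5)]

# Tally the ordered draws of 3 distinct balls by how many are green.
counts = Counter()
for draw in permutations(balls, 3):
    greens = sum(1 for ball in draw if ball.startswith("G"))
    counts[greens] += 1

print(counts[3], counts[2], counts[1], counts[0])  # 60 240 180 24
```

The four tallies add up to $9 \times 8 \times 7 = 504$ ordered draws, and the first two reproduce the $60$ versus $240$ comparison between Case 1 and Case 2.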

This explains why your approach is inaccurate. The alternative approach, which is represented by the given solution, is accurate.


Addendum
There may be a point of confusion here.

In point #2 in the first part of my answer, I treated the order in which the balls were drawn as relevant. In the offered solution, however, that order was not deemed important. This takes some explanation.

First of all, when you compute a probability as $$\frac{N\text{(umerator)}}{D\text{(enominator)}},$$ $N$ and $D$ must be computed in a consistent manner.

Since the question asks only for the probability of "GGG", rather than for something like some element of {"GGW", "GWG", "WGG"}, the original problem can be attacked by consistently treating the order of the balls as irrelevant in both $N$ and $D$.

To repeat: you have to be consistent when enumerating $N$ and $D$.

In point #2 in the first part of my answer, I had to demonstrate the relative probabilities of Case 1 versus Case 2 occurring. In that situation, there was no way to give an accurate enumeration without treating the order of the balls as relevant.
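As a worked illustration of that consistency requirement, the probability of "GGG" comes out the same whether order is counted in both $N$ and $D$ or in neither:

$$\underbrace{\frac{5 \times 4 \times 3}{9 \times 8 \times 7}}_{\text{order counted}} = \frac{60}{504} = \frac{5}{42}, \qquad \underbrace{\frac{{5 \choose 3}}{{9 \choose 3}}}_{\text{order ignored}} = \frac{10}{84} = \frac{5}{42}.$$

Each unordered set of three balls corresponds to $3! = 6$ ordered draws, so switching conventions divides both $N$ and $D$ by $6$ and leaves the ratio unchanged. Mixing the two, for example $\frac{60}{84}$, is what produces a wrong answer.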


Addendum-2
The computation for the probability of Case 1 can be sanity-checked as follows:

If you re-examine point #2 in the first part of my answer, you see that the relative probabilities of Case 1 and Case 2 are in the proportion $(1 : 4)$. By the same method, Case 3 can occur in $(5 \times 4 \times 3 \times 3) = 180$ ordered ways and Case 4 in $(4 \times 3 \times 2) = 24$ ordered ways, against $(5 \times 4 \times 3) = 60$ for Case 1. So the relative probabilities of Case 1 and Case 3 are in the proportion $(1:3)$, while those of Case 1 and Case 4 are in the proportion $(1:[2/5])$.

Therefore, the probability of Case 1 occurring may be computed as

$$\frac{1}{1 + 4 + 3 + [2/5]} = \frac{5}{5 + 20 + 15 + 2} = \frac{5}{42}.$$
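The same sanity check can be run with exact fractions. The sketch below computes the probability of each of the four cases by counting unordered sets; the helper name `prob_green` and its default arguments are illustrative, not part of the original problem statement.

```python
from fractions import Fraction
from math import comb

def prob_green(k, green=5, white=4, drawn=3):
    """Probability that exactly k of the drawn balls are green."""
    # Choose k green balls and the remaining (drawn - k) white balls.
    favourable = comb(green, k) * comb(white, drawn - k)
    return Fraction(favourable, comb(green + white, drawn))

# Case 1 is k = 3, Case 2 is k = 2, Case 3 is k = 1, Case 4 is k = 0.
distribution = {k: prob_green(k) for k in range(4)}
for k in (3, 2, 1, 0):
    print(f"{k} G + {3 - k} W: {distribution[k]}")
```

The four probabilities are $\frac{5}{42}$, $\frac{10}{21}$, $\frac{5}{14}$ and $\frac{1}{21}$. They sum to $1$, stand in the ratio $1 : 4 : 3 : \frac25$, and none of them equals $\frac14$.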