Probability of collecting all 5 buying 7 chocolates
I'm struggling to find where I made a mistake on the way to solving the following problem.
Problem description: A grocery sells chocolates bars. There are $5$ kinds of stickers. Each chocolate is sold with $1$ sticker of one of these $5$ kinds. The probability to find any kind of stickers in any chocolate bar is the same.
What's the probability of collecting all $5$ types of stickers if you buy $7$ chocolates at once?
My solution: since the order of chocolate bars that I've bought is irrelevant, the total number of 7-chocolate bar sets that I can buy equals
$$\left(\binom{5}{7}\right)=\binom{7+5-1}{7}=\binom{11}{7}=330$$
Now I'm solving the reverse problem: let's calculate the probability of failing to collect all $5$ kinds of stickers. To do that I need to calculate the number of 7-chocolate bar sets that have less than $5$ kinds of stickers, which means that I need sets with only $1$ kind of stickers, $2$, $3$ and $4$. Since, again, the order of chocolate bars doesn't matter, the number of such sets is
$$\left(\binom{4}{7}\right)=\binom{7+4-1}{7}=\binom{10}{7}=120$$
Finally, my answer to the initial problem should be
$P = 1 - \frac{120}{330}= 0.6363636364$
The answer key says it's $\approx 0.215$
Where's the flaw in my solution?
I appreciate any help.
The flaw in your solution is that the $330$ cases you enumerated are not equally likely to appear. For instance, there is only one way for all seven chocolate bars to show the first label. However, if we list the stickers in the order in which we look at the chocolate bars we collect, there are $\binom{7}{2}\binom{5}{2}3!$ ways to obtain a collection with two stickers of the first type, two stickers of the second type, and one of each of the other types.
There are five possible stickers for each of the seven chocolate bars, so there are $5^7$ ways to distribute stickers to chocolate bars.
For the favorable cases, we must subtract those distributions in which not all five kinds of stickers appear in the collection of purchased chocolate bars.
There are $\binom{5}{k}$ ways to exclude $k$ of the $5$ stickers and $(5 - k)^7$ ways to distribute stickers of the remaining $5 - k$ kinds to the chocolate bars. Hence, by the Inclusion-Exclusion Principle, the number of ways the stickers may be distributed to the seven chocolate bars so that all five kinds of stickers appear is $$\sum_{k = 0}^{5} (-1)^k\binom{5}{k}(5 - k)^7 = 5^7 - \binom{5}{1}4^7 + \binom{5}{2}3^7 - \binom{5}{3}2^7 + \binom{5}{4}1^7 - \binom{5}{5}0^7$$ Thus, the probability that all five kinds of stickers appear on the collection of seven chocolate bars is $$\frac{1}{5^7}\left[5^7 - \binom{5}{1}4^7 + \binom{5}{2}3^7 - \binom{5}{3}2^7 + \binom{5}{4}1^7 - \binom{5}{5}0^7\right]$$
This is rather long, but correct solution. Your sample space are 7 chocolate bars with stickers attached to them. Probability of an elementary sequence is $(1/5)^7$. Let's count favorable sequences. If you have 5 different bars, you may have the following possibilities: 3 kinds of a specific sticker and 4 remaining unique stickers, or 2 kinds of 2 stickers plus 3 remaining unique stickers. Let's now count! There are 5 different stickers, ${7 \choose 3}$ ways to select positions for a sticker of the same kind, and 4! ways to arrange remaining unique stickers in alternative orders (remember that we are working with elementary sequences), giving the first summand: $$5{7 \choose 3}4! = 4200$$ There are also ${7 \choose 2}$ ways to select positions for a first 2-repeats sticker and ${5 \choose 2}$ ways to select positions for a second 2-repeats sticker. Now, once positions are selected, we have ${5 \choose 2}$ ways to select two stickers to fill these 2-repeat positions, and 3! ways to arrange remaining unique stickers. The second summand, therefore, is: $${7 \choose 2}{5 \choose 2}{5 \choose 2}3! = 12600$$ Finally, together we have: $$(1/5)^7(12600 + 4200) = 0.21504$$
I could not remember any of my combinatorial mathematics, so I did a brute force method. My apologies to the real mathematicians out there.
Let's name the stickers with lower case letters: a, b, c, d, e.
Each element in our sample set of equally likely outcomes is a 7-tuple of these letters.
- Example: [abcdeab]
Order matters! [aaaaaab] is different than, but equally likely as, [baaaaaa] even though they contain the same number of each type of sticker. So there are five options for the first chocolate bar, five options for the second chocolate bar, etc.
That means there are 5*5*5*5*5*5*5 = 78125 equally likely outcomes.
How many of these outcomes contain all five stickers? Let's take one of each sticker and find out how many ways we can place them on five different chocolate bars.
- I have seven options of where to place A.
- I have six remaining options of where to place B.
- I have five remaining options of where to place C.
- I have four remaining options of where to place D.
- I have three remaining options of where to place E.
That gives me 7*6*5*4*3 = 2520 combinations for placing 5 stickers. Let's call each option a "template".
- Template Example: [dXcbaYe] where X and Y have not yet been given stickers.
I can put any sticker I want on X or Y, and as there are 5 sticker choices, I have 5*5 or 25 variants for each template.
At this point, I might be tempted to say that we have 2520 possible templates each with 25 variants for a total of 2520*25 = 63,000 favorable outcomes. But that would be wrong, because I am counting outcomes multiple times.
- Example: These two templates, [abcdeXY] and [XXcdeab], can both produce the same variant [abcdeab].
So let's calculate duplication and eliminate it.
If X and Y get assigned the same sticker, then the outcome will appear as a variant of three separate templates. For example, the outcome [abcdeaa] is a variant of the following three templates:
- [abcdeXX]
- [XbcdeaX]
- [XbcdeXa]
If X and Y get assigned different stickers, then the outcome will appear as a variant of four separate templates. For example, the outcome [abcdeab] is a variant of the following four templates:
- [abcdeXY]
- [XbcdeaY]
- [aXcdeYb]
- [XYcdeab]
So, I have 2520 possible templates. For each template, I have 25 variants, of which 5 (the XX variants - aa, bb, cc, dd, ee) will be counted three times each and the other 20 (the XY variants) will be counted four times each. So the total number of options is:
- XX variant count - (2520 * 5 / 3) = 4200
- XY variant count - (2520 * 20 / 4) = 12600
- All variants = 4200 + 12600 = 16800
So the probability is 16800 / 78125 = 21.504%