Combination for subset with duplicates
I think my problem should be able to be solved with combination of multisets, but for some reason I do not get the right solution.
Example:
- My Number List:(1,1,1,1,1,2) = (5*1, 1*2)
- Example of a number of 3 combinations
- Solution = 2 = (1,1,1) (1,1,2) as it is combination the order does not matter
If I am right (which I am sure I am not) I should use: $n=2$ $k=3$
$\binom{n+k-1}{k} = \binom{4}{3} = \frac{4!}{3!1!} = 4$ which is wrong.
I saw a few different solutions for the same problem which does not seems right for me, and there are some definitely wrong solutions. Is there a generic formula?
Edit 1:
After JMoravitz suggestion I spent a bit of time looking into the inclusion exclusion theorem. So these are my assumptions now:
Say I build a table for numbers I can use for simplification:
\begin{array}{|c|c|c|c| C | } \hline 1& 2 & 3 & 4 & list\\ \hline 5 &1 & & &1,1,1,1,1,2\\ \hline 4 &2 & & &1,1,1,1,2,2\\ \hline 4 &1 &1 & &1,1,1,1,2,3\\ \hline 2 &2 &2 & &1,1,2,2,3,3\\ \hline \end{array}
In all the cases $k = 3$ I assume at the first and second problem I should use the same calculation: $|A ∪ B| = |A| + |B| - |A ∩ B| $
The third and forth I assume should be this: $|A ∪ B ∪ C| = |A| + |B| + |C| - |A ∩ B| - |A ∩ C| - |B ∩ C| + |A ∩ B ∩ C|$
So in my head the first addition is always all the possibilities which should be calculated from the model above. I am wondering what would be $k$ and $n$ in the rest of the calculation like $ |A ∩ B ∩ C|$.
As per my suggestion to using inclusion-exclusion and your request for more information on the specific method I proposed, the events to use it on would be the event that we use too many $1$'s, the event that we used too many $2$'s, etc... Call these $A_1,A_2,A_3,\dots$. Call the set of combinations where we have an unlimited amount of each available $S$. The final total will be:
$$|S|-|A_1|-|A_2|-\dots+|A_1\cap A_2|+|A_1\cap A_3|+\dots-|A_1\cap A_2\cap A_3|-\dots\pm|A_1\cap\dots\cap A_n|$$
where we alternately subtract or add the intersections of one, two, three, and so on many events respectively.
Looking at a specific problem for now: How many $4$-combinations there are of the multiset $\{1,1,1,2,2,3,4,5\}$.
If we had an unlimited amount of each, it would be where $n=5,k=4$ and so total amount of combinations are $\binom{5+4-1}{4}=70$.
If we had used too many $1$'s, that would mean that we used strictly more than three $1$'s which implies we used four ones. We have zero remaining spots in our combination, so $|A_1|=1$
If we had used too many $2$'s, that would mean we used three or more $2$'s. Let us go ahead and use that many and count how many ways there are to fill the rest of the combination, again remembering that we are ignoring upper bounds for now. Here we would have $n=5,k=1$ so we have $\binom{5+1-1}{1}=5=|A_2|$
Similarly we calculate $|A_3|$ and $|A_4|$ and $|A_5|$ to be $\binom{5+2-1}{2}=15$. We aren't done yet though.
We continue and try to calculate $|A_1\cap A_2|,|A_1\cap A_3|,|A_1\cap A_4|,\dots,|A_1\cap A_2\cap A_3\cap A_4\cap A_5|$. Thankfully, the majority of these will be zero as it is impossible for us to have simultaneously taken too many of multiple numbers, but it is possible to have taken too many $3$'s and $4$'s or simultaneously taken too many $3$'s and $5$'s etc..., which would correspond to $|A_3\cap A_4|$ and $|A_3\cap A_5|$ etc... The only way to have taken too many $3$'s and $4$'s would be if we took two $3$'s and two $4$'s.
We have the final total being then $70-1-5-15-15-15+1+1+1=22$
Writing a fully generic formula will be incredibly wordy, but if you have $c_1,c_2,c_3,\dots$ amounts of $1$,$2$,$3$,... available respectively, you are choosing $k$ total and there are $n$ numbers available it will be something like this:
$$\sum\limits_{i=0}^n\left((-1)^i\sum\limits_{\Delta\subseteq [n]~:~|\Delta|=i}\binom{n+k-1-\sum\limits_{j\in\Delta}(c_j+1)}{n-1}\right)$$
Here, I take $[n]$ to mean $\{1,2,3,\dots,n\}$ as opposed to $\{0,1,2,\dots,n-1\}$ for convenience. It is worth pointing out the link to Sigma summation notation on wiki to remind you that you can notate in ways other than just running through the values from a start to a stop like in $\sum\limits_{i=0}^n$ and instead range over a set of values or range over a set of sets, etc...