How many distinct n-letter "words" can be formed from a set of k letters where some of the letters are repeated?
Solution 1:
There is a fairly systematic way to work this type of question. You were specifically wondering about:
How many 4-letter words can be formed from the letters: ABBCCC?
First, you write the 'source partition' for your word:
[C,B,A]
[3,2,1] <-- this is the `source partition`
Note that the source partition provides for 1 triple, 1 double, and 1 single. Corresponding to that, you write every possible 'target partition' of 4-letters.
[3,1] requests one triple and 1 single
[2,2] requests 2 doubles
[2,1,1] requests one double and 2 singles
For each 'target partition', you write an expression that gives the number of ways that the given target type can occur. An example of the target type [3,1] is:
CCBC # type [3,1]
The expression for that type is:
nCr(1,1)*nCr(2,1) * fac(4)/fac(3)
'nCr( n, r)' is the function for 'combinations', which gives the number of ways you can make a unique selection from n distinct things, taking r at a time. 'fac( n)' is the function for the factorial of n.
Note that source [3,2,1] provides 1 triple, and target [3,1] requests 1 triple. Hence nCr(1,1)
. After the triple is used up, the source [3,2,1] can provide 2 singles, and the target [3,1] requests 1 single. Hence nCr(2,1)
The call to fac(4)
, in each expression, always corresponds to the 4-letter word. Any division, if it occurs in an expression, corresponds to each multiple request of the corresponding target partition. That's all there is to the method, but it isn't always easy to get every last detail correct. The entire computation, as I did it in the programming language Python, follows:
# The `source partition` for the word 'ABBCCC' is:
# [3,2,1]
# # target
w = 0 # ------
w += nCr(1,1)*nCr(2,1) * fac(4)/fac(3) # [3,1]
w += nCr(2,2) * fac(4)/(fac(2)*fac(2)) # [2,2]
w += nCr(2,1)*nCr(2,2) * fac(4)/fac(2) # [2,1,1]
#
# The answer is 38
Solution 2:
Let's say you want the word of letters to be $M$-long, with $N$ choices for letters in each space on the word. Then draw something like this:
to enumerate all possible words. In the picture $M=3$ and $N=2$. Let's say you have a deficit in $A$s of magnitude $D_A$ from being able to make a full, $M$-long word of $A$s.
First, assuming none of the deficits overlap (that is, there are no words that violate 2 or more restrictions in the number of letters), find the number of nodes that are less than or equal to $M-D_A$ away from $(A,A,...A)$. This turns out to be $\sum_{k=0}^{k=M-D_A}\binom{M}{k}{(N-1)}^{M-k}$.
For the proof of this, think of a 3-letter word with choices of letter $A$, $B$ or $C$. There are $\binom{3}{0}2^0$ ways of making words with 3 $A$s, $\binom{3}{1}2^1$ ways of making a word with 2 $A$s, etc.
So evidently you need to subtract each sum taking into account the restrictions of each letter from the total number of unrestricted words.
However, it's much more complicated when the sum of two deficits is more than $M$ so that because of the overlap the subtraction will have been too great. Basically the $(N-1)$ in the sum will be some lower number for higher values of $k$ when other letter's restrictions are overstepped, but I can't see a nice formula for it for a given $D_B, D_C...$.