Fast way to get a position of combination (without repetitions)
This question has an inverse: (Fast way to) Get a combination given its position in (reverse-)lexicographic order
What would be the most efficient way to translate
a combination of $k$-tuple into its positions within the $\left(\!\!\binom{n}{k}\!\!\right)$ combinations?
I need this to be fast for combinations of $\left(\!\!\binom{70}{7}\!\!\right)$ order of magnitude - very large, but not exceeding 2 billion (fits into int32
maximum value).
Below is an example of $\left(\!\!\binom{6}{3}\!\!\right)$, where the aim is to quickly translate (a, d, f)
tuple to value 9
, while in the real problem $k$ ranges between 5 and 8.
$$\begin{array} {cccccc|c} a&b&c&d&e&f&^{combination}/_{sort\_order}& \\\hline x&x&x& & & &1\\ x&x& &x& & &2\\ x&x& & &x& &3\\ x&x& & & &x&4\\ x& &x&x& & &5\\ x& &x& &x& &6\\ x& &x& & &x&7\\ x& & &x&x& &8\\ x& & &x& &x&9\\ x& & & &x&x&10\\ .&.&.&.&.&.&.\\ & & &x&x&x&20\\ \end{array}$$
I know that I could pre-calculate all the combinations and reverse the lookup dictionary. However, such dictionary would not be efficient in terms of memory usage. Therefore I am looking for either calculation-based approach, or a more efficient data structure to perform this mapping.
Let us denote your tuple [a b c] as [1 1 1 0 0 0] and so on.
Define $\binom{n}{r}=0$ for $n<r$
For your tuple: $[a d f] = [1 0 0 1 0 1]$ $$P = 1\cdot \binom{0}{1}+0\cdot \binom{1}{1}+0\cdot \binom{2}{1}+1\cdot \binom{3}{2}+0\cdot\binom{4}{2}+1\cdot\binom{5}{3} + 1$$ $$P=0 + 0 +0 +3+0+10+0+1 = 14$$
General Algorithm:
- Calculate the position value of each binary digit using $\binom{n}{r}$
- Take $n$ as position of the digit from left, for leftmost digit $n=0$.
- Write $r$ for each position as the number of 'ONES' counted from left, including the one at current position.
Example-1: [a b c] = [1 1 1 0 0 0]
Calculate the position of the tuple as sum: $$P = 1\cdot \binom{0}{1}+1\cdot \binom{1}{2}+1\cdot \binom{2}{3}+0\cdot \binom{3}{3}+0\cdot\binom{4}{3}+0\cdot\binom{5}{3} + 1$$ $$P=0 + 0 +0 +0+0+0+0+1 = 1$$
Example-2: [d e f] = [0 0 0 1 1 1] $$P = 0\cdot \binom{0}{0}+0\cdot \binom{1}{0}+0\cdot \binom{2}{0}+1\cdot \binom{3}{1}+1\cdot\binom{4}{2}+1\cdot\binom{5}{3} + 1$$ $$S=0+0+0+3+6+10+1=20$$
The lone ONE is added because you are not starting at zero.
I'll relabel your (a,d,f)
to (1,4,6)
and denote it by $(i_1,i_2,i_3)$ so we can calculate with it.
Start at index $\binom nk$. Moving the $k$-th entry to the left by $1$ reduces the index by $1=\binom{n-j}0$. Moving the $(k-1)$-th entry to the left from $j$ to $j-1$ reduces the index by $n-j=\binom{n-j}1$. Generally, moving the $(k-m)$-th entry to the left from $j$ to $j-1$ reduces the index by $\binom{n-j}m$. Thus the index is
$$ \binom nk-\sum_{m=0}^{k-1}\;\sum_{j=i_{k-m}+1}^{n-m}\binom{n-j}m\;. $$
You can easily precalculate the inner sums so that you can look them up using $m$ and $i_{k-m}$, so you just need to add up the $k$ terms in the sum over $m$ to get the index.
P.S.: That was unnecessarily complicated; the inner sum simplifies, and the result is
$$ \binom nk-\sum_{m=0}^{k-1}\binom{n-i_{k-m}}{m+1}\;. $$
You can probably derive that more directly, but since you're perhaps just interested in a practical result and not the most elegant way of deriving it, I'll leave it at that.
Your tuple ordering is lexicographic and your to-be-computed position is one-based, as are the symbol codes for $a,b,\ldots$ used in @joriki's answer; but for the sake of simplicity I will use reverse-lexicographic ordering and zero-based positions and letter codes. Conversion is done by replacing @joriki's $(i_1,\ldots,i_k)$ with $(n-i_k,\ldots,n-i_1)$ and replacing the position resulting from my formula with its distance to $\binom{n}{k}$. The result below is thus consistent with @joriki's formula.
I have used such computations for compression of multi-indices into (skew-)symmetric tensors; therefore I borrow some vocabulary from that domain.
Let us define a $k$-index to be a $k$-tuple of strictly increasing nonnegative integers. You may have to sort accordingly and to disallow duplicate entries.
$k$-indices can be totally ordered in a reverse-lexicographic manner: Sorting is done by the last element, in case of equality by the next-to-last element, and so on.
For a $k$-index $I$, let us define its position (or compressed index) $\operatorname{ordx}(I)$ as the number of $k$-indices that are reverse-lexicographically smaller than $I$.
Note that $\operatorname{ordx}(I) = 0$ for the smallest $k$-index $I=(0,\ldots,k-1)$.
We need a notation for truncated tuples. Denoting $I = (i_0,\ldots,i_{k-1})$, let $I_m = (i_0,\ldots,i_{k-m-1})$ for integer $m$ with $0\leq m<k$. That is $I$ with the last $m$ elements chopped off.
Now a $k$-index $J=(j_0,\ldots,j_{k-1})$ is reverse-lexicographically smaller than $I$ if and only if
- $j_{k-1} < i_{k-1}$; there are $\binom{i_{k-1}}{k}$ such $k$-indices; or
- $k>1$, and $j_{k-1} = i_{k-1}$, and $J_1$ is reverse-lexicographically smaller than $I_1$. This condition involves a comparison of $(k-1)$-indices.
Proceeding by induction, we arrive at the formula
$$\operatorname{ordx}(I) = \sum_{r=1}^k\binom{i_{r-1}}{r}$$
It is worth noting that this formula does not depend on the upper bound $n$ for the index elements.
The binomial coefficient values can be computed on the fly by initializing and updating a segment of Pascal's triangle. Define $$b_{j}^{(r)} = \binom{j+r}{r} = \begin{cases} 1 & \text{if $r=0$ or $j=0$} \\ b_{j}^{(r-1)} + b_{j-1}^{(r)} & \text{if $r>0$ and $j>0$} \end{cases}$$ So we just need to initialize and update an array $$B^{(r)} = \left(b_0^{(r)},\ldots,b_{i_{k-1}-k}^{(r)}\right)$$ In practice, we prepend a $0$ to that array in order to account for the case $i_{r-1} = r-1$ which requires a zero binomial coefficient.
In Python (which uses zero-based indices and half-open ranges):
def ordx(idx):
"""
Turns a multi-index of strictly increasing nonnegative integers
into a 1-dimensional zero-based index.
"""
s = 0
b = [0] + [1] * (idx[-1] + 1 - len(idx)) # [0, 1, 1, ...]
for r,i in enumerate(idx): # (0,idx[0]), (1,idx[1]), ...
for j in xrange(2, len(b)): # 2, ..., len(b)-1
b[j] += b[j-1] # binomial(j+r, r+1)
s += b[i - r]
return s
Besides:
If you want to allow duplicate tuple elements, you can transform that
problem by adding to each element $i_r$ the sub-index $r$ and
computing ordx
for the modified tuple which now has strict increments.
For that use case, the code above gets simplified a bit:
def ords(idx):
"""
Turns a multi-index of nondecreasing nonnegative integers
into a 1-dimensional zero-based index.
"""
s = 0
b = [0] + [1] * idx[-1] # [0, 1, 1, ..., 1]
for i in idx:
for j in xrange(2, len(b)): # 2, ..., len(b)-1 = idx[-1]
b[j] += b[j-1]
s += b[i]
return s
This computes
$$\operatorname{ords}(I) = \sum_{r=0}^{k-1}\binom{i_r + r}{r+1}$$
Such a function ords
could be used for compressing a sorted multi-index
for totally symmetric tensors to a flat index that removes redundancy.
Update:
The above algorithms are simple, but need to update an array of binomial
coefficients for each index element.
Consequently, running the above ords
needs $k\,i_{k-1}$ additions
and an array b
of length $i_{k-1}+1$.
For ordx
replace $i_{k-1}$ with $i_{k-1}-k+1$.
We can reduce the operation count and memory usage by computing
each needed binomial coefficient directly from the previous one.
This requires a sequence of multiplications and divisions instead
of just additions, but it reduces binomial bookkeeping to one scalar variable
and keeps total ords
operation count at $\operatorname{O}(k+i_{k-1})$.
Correspondingly, ordx
operation count is $\operatorname{O}(i_{k-1})$.
Here is a sample Python implementation of the optimized $\operatorname{ordx}$
(with //
denoting integer division):
def ordx_opt(idx):
"""
Turns a multi-index of strictly increasing nonnegative integers
into a 1-dimensional zero-based index.
"""
s = 0
j = 1
b = 1
for r,i in enumerate(idx): # (0,idx[0]), (1,idx[1]), ...
if i == r: continue # skipping terms with zero binomial coeff
# b == binomial(j+r-1, r), update to j == i - r
while j < i - r:
b *= j + r
b //= j
j += 1
# b == binomial(i-1, r), update to binomial(i, r+1)
b *= i
b //= r + 1
s += b
return s
And the corresponding optimized $\operatorname{ords}$:
def ords_opt(idx):
"""
Turns a multi-index of nondecreasing nonnegative integers
into a 1-dimensional zero-based index.
"""
s = 0
j = 1
b = 1
for r,i in enumerate(idx):
if i == 0: continue # skipping terms with zero binomial coeff
# b == binomial(j+r-1, r), update to j == i
while j < i:
b *= j + r
b //= j
j += 1
# b == binomial(i+r-1, r), update to binomial(i+r, r+1)
b *= i + r
b //= r + 1
s += b
return s
I posted the same question and was pointed here. I was able to use joriki's solution, slightly tweaked, and came up with the following C# code:
public int GetCombID(List<int> comb, int max = 51)
{
UInt64 id = nCr(max + 1, comb.Count);
for (int i = 0; i < comb.Count; i++)
id -= nCr(max - comb[i], comb.Count - i);
return (int)id;
}
To reverse the process I use the following:
public List<int> GetCombFromID(int id, int combLength = 2, int max = 51)
{
List<int> comb = new List<int>(combLength);
var tId = nCr(max + 1, combLength) - (UInt64)id;
for(int i = combLength; i > 0; i--)
{
UInt64 tVal = 0;
bool done = false;
int pos = 0;
while(!done)
{
var t = nCr(max - pos, i);
if (t <= tId)
{
tVal = t;
done = true;
}
pos++;
}
tId -= tVal;
comb.Add(pos - 1);
}
return comb;
}
I post this now as a reference for others in the future.