Fast way to get a position of combination (without repetitions)

This question has an inverse: (Fast way to) Get a combination given its position in (reverse-)lexicographic order

What would be the most efficient way to translate a combination of $k$-tuple into its positions within the $\left(\!\!\binom{n}{k}\!\!\right)$ combinations?
I need this to be fast for combinations of $\left(\!\!\binom{70}{7}\!\!\right)$ order of magnitude - very large, but not exceeding 2 billion (fits into int32 maximum value).

Below is an example of $\left(\!\!\binom{6}{3}\!\!\right)$, where the aim is to quickly translate (a, d, f) tuple to value 9, while in the real problem $k$ ranges between 5 and 8.

$$\begin{array} {cccccc|c} a&b&c&d&e&f&^{combination}/_{sort\_order}& \\\hline x&x&x& & & &1\\ x&x& &x& & &2\\ x&x& & &x& &3\\ x&x& & & &x&4\\ x& &x&x& & &5\\ x& &x& &x& &6\\ x& &x& & &x&7\\ x& & &x&x& &8\\ x& & &x& &x&9\\ x& & & &x&x&10\\ .&.&.&.&.&.&.\\ & & &x&x&x&20\\ \end{array}$$

I know that I could pre-calculate all the combinations and reverse the lookup dictionary. However, such dictionary would not be efficient in terms of memory usage. Therefore I am looking for either calculation-based approach, or a more efficient data structure to perform this mapping.

Let us denote your tuple [a b c] as [1 1 1 0 0 0] and so on.

Define $\binom{n}{r}=0$ for $n<r$

For your tuple: $[a d f] = [1 0 0 1 0 1]$ $$P = 1\cdot \binom{0}{1}+0\cdot \binom{1}{1}+0\cdot \binom{2}{1}+1\cdot \binom{3}{2}+0\cdot\binom{4}{2}+1\cdot\binom{5}{3} + 1$$ $$P=0 + 0 +0 +3+0+10+0+1 = 14$$

General Algorithm:

Calculate the position value of each binary digit using $\binom{n}{r}$
Take $n$ as position of the digit from left, for leftmost digit $n=0$.
Write $r$ for each position as the number of 'ONES' counted from left, including the one at current position.

Example-1: [a b c] = [1 1 1 0 0 0]

Calculate the position of the tuple as sum: $$P = 1\cdot \binom{0}{1}+1\cdot \binom{1}{2}+1\cdot \binom{2}{3}+0\cdot \binom{3}{3}+0\cdot\binom{4}{3}+0\cdot\binom{5}{3} + 1$$ $$P=0 + 0 +0 +0+0+0+0+1 = 1$$

Example-2: [d e f] = [0 0 0 1 1 1] $$P = 0\cdot \binom{0}{0}+0\cdot \binom{1}{0}+0\cdot \binom{2}{0}+1\cdot \binom{3}{1}+1\cdot\binom{4}{2}+1\cdot\binom{5}{3} + 1$$ $$S=0+0+0+3+6+10+1=20$$

The lone ONE is added because you are not starting at zero.

I'll relabel your (a,d,f) to (1,4,6) and denote it by $(i_1,i_2,i_3)$ so we can calculate with it.

Start at index $\binom nk$. Moving the $k$-th entry to the left by $1$ reduces the index by $1=\binom{n-j}0$. Moving the $(k-1)$-th entry to the left from $j$ to $j-1$ reduces the index by $n-j=\binom{n-j}1$. Generally, moving the $(k-m)$-th entry to the left from $j$ to $j-1$ reduces the index by $\binom{n-j}m$. Thus the index is

$$ \binom nk-\sum_{m=0}^{k-1}\;\sum_{j=i_{k-m}+1}^{n-m}\binom{n-j}m\;. $$

You can easily precalculate the inner sums so that you can look them up using $m$ and $i_{k-m}$, so you just need to add up the $k$ terms in the sum over $m$ to get the index.

P.S.: That was unnecessarily complicated; the inner sum simplifies, and the result is

$$ \binom nk-\sum_{m=0}^{k-1}\binom{n-i_{k-m}}{m+1}\;. $$

You can probably derive that more directly, but since you're perhaps just interested in a practical result and not the most elegant way of deriving it, I'll leave it at that.

Your tuple ordering is lexicographic and your to-be-computed position is one-based, as are the symbol codes for $a,b,\ldots$ used in @joriki's answer; but for the sake of simplicity I will use reverse-lexicographic ordering and zero-based positions and letter codes. Conversion is done by replacing @joriki's $(i_1,\ldots,i_k)$ with $(n-i_k,\ldots,n-i_1)$ and replacing the position resulting from my formula with its distance to $\binom{n}{k}$. The result below is thus consistent with @joriki's formula.

I have used such computations for compression of multi-indices into (skew-)symmetric tensors; therefore I borrow some vocabulary from that domain.

Let us define a $k$-index to be a $k$-tuple of strictly increasing nonnegative integers. You may have to sort accordingly and to disallow duplicate entries.

$k$-indices can be totally ordered in a reverse-lexicographic manner: Sorting is done by the last element, in case of equality by the next-to-last element, and so on.

For a $k$-index $I$, let us define its position (or compressed index) $\operatorname{ordx}(I)$ as the number of $k$-indices that are reverse-lexicographically smaller than $I$.

Note that $\operatorname{ordx}(I) = 0$ for the smallest $k$-index $I=(0,\ldots,k-1)$.

We need a notation for truncated tuples. Denoting $I = (i_0,\ldots,i_{k-1})$, let $I_m = (i_0,\ldots,i_{k-m-1})$ for integer $m$ with $0\leq m<k$. That is $I$ with the last $m$ elements chopped off.

Now a $k$-index $J=(j_0,\ldots,j_{k-1})$ is reverse-lexicographically smaller than $I$ if and only if

$j_{k-1} < i_{k-1}$; there are $\binom{i_{k-1}}{k}$ such $k$-indices; or
$k>1$, and $j_{k-1} = i_{k-1}$, and $J_1$ is reverse-lexicographically smaller than $I_1$. This condition involves a comparison of $(k-1)$-indices.

Proceeding by induction, we arrive at the formula

$$\operatorname{ordx}(I) = \sum_{r=1}^k\binom{i_{r-1}}{r}$$

It is worth noting that this formula does not depend on the upper bound $n$ for the index elements.

The binomial coefficient values can be computed on the fly by initializing and updating a segment of Pascal's triangle. Define $$b_{j}^{(r)} = \binom{j+r}{r} = \begin{cases} 1 & \text{if $r=0$ or $j=0$} \\ b_{j}^{(r-1)} + b_{j-1}^{(r)} & \text{if $r>0$ and $j>0$} \end{cases}$$ So we just need to initialize and update an array $$B^{(r)} = \left(b_0^{(r)},\ldots,b_{i_{k-1}-k}^{(r)}\right)$$ In practice, we prepend a $0$ to that array in order to account for the case $i_{r-1} = r-1$ which requires a zero binomial coefficient.

In Python (which uses zero-based indices and half-open ranges):

def ordx(idx):
    """
    Turns a multi-index of strictly increasing nonnegative integers
    into a 1-dimensional zero-based index.
    """
    s = 0
    b = [0] + [1] * (idx[-1] + 1 - len(idx))    # [0, 1, 1, ...]
    for r,i in enumerate(idx):      # (0,idx[0]), (1,idx[1]), ...
        for j in xrange(2, len(b)): # 2, ..., len(b)-1
            b[j] += b[j-1]          # binomial(j+r, r+1)
        s += b[i - r]
    return s

Besides: If you want to allow duplicate tuple elements, you can transform that problem by adding to each element $i_r$ the sub-index $r$ and computing ordx for the modified tuple which now has strict increments. For that use case, the code above gets simplified a bit:

def ords(idx):
    """
    Turns a multi-index of nondecreasing nonnegative integers
    into a 1-dimensional zero-based index.
    """
    s = 0
    b = [0] + [1] * idx[-1]         # [0, 1, 1, ..., 1]
    for i in idx:
        for j in xrange(2, len(b)): # 2, ..., len(b)-1 = idx[-1]
            b[j] += b[j-1]
        s += b[i]
    return s

This computes $$\operatorname{ords}(I) = \sum_{r=0}^{k-1}\binom{i_r + r}{r+1}$$ Such a function ords could be used for compressing a sorted multi-index for totally symmetric tensors to a flat index that removes redundancy.

Update: The above algorithms are simple, but need to update an array of binomial coefficients for each index element. Consequently, running the above ords needs $k\,i_{k-1}$ additions and an array b of length $i_{k-1}+1$. For ordx replace $i_{k-1}$ with $i_{k-1}-k+1$. We can reduce the operation count and memory usage by computing each needed binomial coefficient directly from the previous one. This requires a sequence of multiplications and divisions instead of just additions, but it reduces binomial bookkeeping to one scalar variable and keeps total ords operation count at $\operatorname{O}(k+i_{k-1})$. Correspondingly, ordx operation count is $\operatorname{O}(i_{k-1})$. Here is a sample Python implementation of the optimized $\operatorname{ordx}$ (with // denoting integer division):

def ordx_opt(idx):
    """
    Turns a multi-index of strictly increasing nonnegative integers
    into a 1-dimensional zero-based index.
    """
    s = 0
    j = 1
    b = 1
    for r,i in enumerate(idx):  # (0,idx[0]), (1,idx[1]), ...
        if i == r: continue     # skipping terms with zero binomial coeff
        # b == binomial(j+r-1, r), update to j == i - r
        while j < i - r:
            b *= j + r
            b //= j
            j += 1
        # b == binomial(i-1, r), update to binomial(i, r+1)
        b *= i
        b //= r + 1
        s += b
    return s

And the corresponding optimized $\operatorname{ords}$:

def ords_opt(idx):
    """
    Turns a multi-index of nondecreasing nonnegative integers
    into a 1-dimensional zero-based index.
    """
    s = 0
    j = 1
    b = 1
    for r,i in enumerate(idx):
        if i == 0: continue     # skipping terms with zero binomial coeff
        # b == binomial(j+r-1, r), update to j == i
        while j < i:
            b *= j + r
            b //= j
            j += 1
        # b == binomial(i+r-1, r), update to binomial(i+r, r+1)
        b *= i + r
        b //= r + 1
        s += b
    return s

I posted the same question and was pointed here. I was able to use joriki's solution, slightly tweaked, and came up with the following C# code:

public int GetCombID(List<int> comb, int max = 51)
{
    UInt64 id = nCr(max + 1, comb.Count);            
    for (int i = 0; i < comb.Count; i++)
        id -= nCr(max - comb[i], comb.Count - i);
    return (int)id;
}

To reverse the process I use the following:

public List<int> GetCombFromID(int id, int combLength = 2,  int max = 51)
{
    List<int> comb = new List<int>(combLength);
    var tId = nCr(max + 1, combLength) - (UInt64)id;
    for(int i = combLength; i > 0; i--)
    {
        UInt64 tVal = 0;
        bool done = false;
        int pos = 0;
        while(!done)
        {
            var t = nCr(max - pos, i);
            if (t <= tId)
            {
                tVal = t;
                done = true;
            }
            pos++;
        }
        tId -= tVal;
        comb.Add(pos - 1);
    }

    return comb;
}

I post this now as a reference for others in the future.

Fast way to get a position of combination (without repetitions)

Related

Recent Posts