Fast computing of log2 for 64-bit integers

Solution 1:

Intrinsic functions are really fast, but still are insufficient for a truly cross-platform, compiler-independent implementation of log2. So in case anyone is interested, here is the fastest, branch-free, CPU-abstract DeBruijn-like algorithm I've come to while researching the topic on my own.

const int tab64[64] = {
    63,  0, 58,  1, 59, 47, 53,  2,
    60, 39, 48, 27, 54, 33, 42,  3,
    61, 51, 37, 40, 49, 18, 28, 20,
    55, 30, 34, 11, 43, 14, 22,  4,
    62, 57, 46, 52, 38, 26, 32, 41,
    50, 36, 17, 19, 29, 10, 13, 21,
    56, 45, 25, 31, 35, 16,  9, 12,
    44, 24, 15,  8, 23,  7,  6,  5};

int log2_64 (uint64_t value)
{
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
    value |= value >> 32;
    return tab64[((uint64_t)((value - (value >> 1))*0x07EDD5E59A4E28C2)) >> 58];
}

The part of rounding down to the next lower power of 2 was taken from Power-of-2 Boundaries and the part of getting the number of trailing zeros was taken from BitScan (the (bb & -bb) code there is to single out the rightmost bit that is set to 1, which is not needed after we've rounded the value down to the next power of 2).

And the 32-bit implementation, by the way, is

const int tab32[32] = {
     0,  9,  1, 10, 13, 21,  2, 29,
    11, 14, 16, 18, 22, 25,  3, 30,
     8, 12, 20, 28, 15, 17, 24,  7,
    19, 27, 23,  6, 26,  5,  4, 31};

int log2_32 (uint32_t value)
{
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
    return tab32[(uint32_t)(value*0x07C4ACDD) >> 27];
}

As with any other computational method, log2 requires the input value to be greater than zero.

Solution 2:

If you are using GCC, a lookup table is unnecessary in this case.

GCC provides a builtin function to determine the amount of leading zeros:

Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

So you can define:

#define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1))

and it will work for any unsigned long long int. The result is rounded down.

For x86 and AMD64 GCC will compile it to a bsr instruction, so the solution is very fast (much faster than lookup tables).

Working example:

#include <stdio.h>

#define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1))

int main(void) {
    unsigned long long input;
    while (scanf("%llu", &input) == 1) {
        printf("log(%llu) = %u\n", input, LOG2(input));
    }
    return 0;
}

Compiled output: https://godbolt.org/z/16GnjszMs

Solution 3:

I was trying to convert Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup to 64-bit by brute forcing the magic number. Needless to say it was taking a while.

I then found Desmond's answer and decided to try his magic number as a start point. Since I have a 6 core processor I ran it in parallel starting at 0x07EDD5E59A4E28C2 / 6 multiples. I was surprised it found something immediately. Turns out 0x07EDD5E59A4E28C2 / 2 worked.

So here is the code for 0x07EDD5E59A4E28C2 which saves you a shift and subtract:

int LogBase2(uint64_t n)
{
    static const int table[64] = {
        0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61,
        51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62,
        57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56,
        45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5, 63 };

    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;

    return table[(n * 0x03f6eaf2cd271461) >> 58];
}

Solution 4:

Base-2 Integer Logarithm

Here is what I do for 64-bit unsigned integers. This calculates the floor of the base-2 logarithm, which is equivalent to the index of the most significant bit. This method is smokingly fast for large numbers because it uses an unrolled loop that executes always in log₂64 = 6 steps.

Essentially, what it does is subtracts away progressively smaller squares in the sequence { 0 ≤ k ≤ 5: 2^(2^k) } = { 2³², 2¹⁶, 2⁸, 2⁴, 2², 2¹ } = { 4294967296, 65536, 256, 16, 4, 2, 1 } and sums the exponents k of the subtracted values.

int uint64_log2(uint64_t n)
{
  #define S(k) if (n >= (UINT64_C(1) << k)) { i += k; n >>= k; }

  int i = -(n == 0); S(32); S(16); S(8); S(4); S(2); S(1); return i;

  #undef S
}

Note that this returns –1 if given the invalid input of 0 (which is what the initial -(n == 0) is checking for). If you never expect to invoke it with n == 0, you could substitute int i = 0; for the initializer and add assert(n != 0); at entry to the function.

Base-10 Integer Logarithm

Base-10 integer logarithms can be calculated using similarly — with the largest square to test being 10¹⁶ because log₁₀2⁶⁴ ≅ 19.2659... (Note: This is not the fastest way to accomplish a base-10 integer logarithm, because it uses integer division, which is inherently slow. A faster implementation would be to use an accumulator with values that grow exponentially, and compare against the accumulator, in effect doing a sort of binary search.)

int uint64_log10(uint64_t n)
{
  #define S(k, m) if (n >= UINT64_C(m)) { i += k; n /= UINT64_C(m); }

  int i = -(n == 0);
  S(16,10000000000000000); S(8,100000000); S(4,10000); S(2,100); S(1,10);
  return i;

  #undef S
}