How to do an integer log2() in C++?
In the C++ standard libraries I found only a floating point log method. Now I use log to find the level of an index in a binary tree ( floor(2log(index))
).
Code (C++):
int targetlevel = int(log(index)/log(2));
I am afraid that for some of the edge elements (the elements with value 2^n) log will return n-1.999999999999 instead of n.0. Is this fear correct? How can I modify my statement so that it always will return a correct answer?
If you are on a recent-ish x86 or x86-64 platform (and you probably are), use the bsr
instruction which will return the position of the highest set bit in an unsigned integer. It turns out that this is exactly the same as log2(). Here is a short C or C++ function that invokes bsr
using inline ASM:
#include <stdint.h>
static inline uint32_t log2(const uint32_t x) {
uint32_t y;
asm ( "\tbsr %1, %0\n"
: "=r"(y)
: "r" (x)
);
return y;
}
You can use this method instead:
int targetlevel = 0;
while (index >>= 1) ++targetlevel;
Note: this will modify index. If you need it unchanged, create another temporary int.
The corner case is when index is 0. You probably should check it separately and throw an exception or return an error if index == 0.
If you just want a fast integer log2 operation, the following function mylog2()
will do it without having to worry about floating-point accuracy:
#include <limits.h>
static unsigned int mylog2 (unsigned int val) {
if (val == 0) return UINT_MAX;
if (val == 1) return 0;
unsigned int ret = 0;
while (val > 1) {
val >>= 1;
ret++;
}
return ret;
}
#include <stdio.h>
int main (void) {
for (unsigned int i = 0; i < 20; i++)
printf ("%u -> %u\n", i, mylog2(i));
putchar ('\n');
for (unsigned int i = 0; i < 10; i++)
printf ("%u -> %u\n", i+UINT_MAX-9, mylog2(i+UINT_MAX-9));
return 0;
}
The code above also has a small test harness so you can check the behaviour:
0 -> 4294967295
1 -> 0
2 -> 1
3 -> 1
4 -> 2
5 -> 2
6 -> 2
7 -> 2
8 -> 3
9 -> 3
10 -> 3
11 -> 3
12 -> 3
13 -> 3
14 -> 3
15 -> 3
16 -> 4
17 -> 4
18 -> 4
19 -> 4
4294967286 -> 31
4294967287 -> 31
4294967288 -> 31
4294967289 -> 31
4294967290 -> 31
4294967291 -> 31
4294967292 -> 31
4294967293 -> 31
4294967294 -> 31
4294967295 -> 31
It will return UINT_MAX
for an input value of 0 as an indication of an undefined result, so that's something you should check for (no valid unsigned integer will have a logarithm that high).
By the way, there are some insanely fast hacks to do exactly this (find the highest bit set in a 2's complement number) available from here. I wouldn't suggest using them unless speed is of the essence (I prefer readability myself) but you should be made aware that they exist.
Base-2 Integer Logarithm
Here is what I do for 64-bit unsigned integers. This calculates the floor of the base-2 logarithm, which is equivalent to the index of the most significant bit. This method is smokingly fast for large numbers because it uses an unrolled loop that executes always in log₂64 = 6 steps.
Essentially, what it does is subtracts away progressively smaller squares in the sequence { 0 ≤ k ≤ 5: 2^(2^k) } = { 2³², 2¹⁶, 2⁸, 2⁴, 2², 2¹ } = { 4294967296, 65536, 256, 16, 4, 2, 1 } and sums the exponents k of the subtracted values.
int uint64_log2(uint64_t n)
{
#define S(k) if (n >= (UINT64_C(1) << k)) { i += k; n >>= k; }
int i = -(n == 0); S(32); S(16); S(8); S(4); S(2); S(1); return i;
#undef S
}
Note that this returns –1 if given the invalid input of 0 (which is what the initial -(n == 0)
is checking for). If you never expect to invoke it with n == 0
, you could substitute int i = 0;
for the initializer and add assert(n != 0);
at entry to the function.
Base-10 Integer Logarithm
Base-10 integer logarithms can be calculated using similarly — with the largest square to test being 10¹⁶ because log₁₀2⁶⁴ ≅ 19.2659...
int uint64_log10(uint64_t n)
{
#define S(k, m) if (n >= UINT64_C(m)) { i += k; n /= UINT64_C(m); }
int i = -(n == 0);
S(16,10000000000000000); S(8,100000000); S(4,10000); S(2,100); S(1,10);
return i;
#undef S
}
Note that a good compiler will optimize the integer division operations here into multiplication instructions, since the divisions are always by a constant. (This matters because integer division instructions are still very slow even on the fastest modern CPUs, as compared with multiplication instructions.)