Fastest way to get a positive modulo in C/C++
Often in my inner loops I need to index an array in a "wrap-around" way, so that (for example) if the array size is 100 and my code asks for element -2, it should be given element 98. In many high level languages such as Python, one can do this simply with my_array[index % array_size]
, but for some reason C's integer arithmetic (usually) rounds toward zero instead of consistently rounding down, and consequently its modulo operator returns a negative result when given a negative first argument.
Often I know that index
will not be less than -array_size
, and in these cases I just do my_array[(index + array_size) % array_size]
. However, sometimes this can't be guaranteed, and for those cases I would like to know the fastest way to implement an always-positive modulo function. There are several "clever" ways to do it without branching, such as
inline int positive_modulo(int i, int n) {
return (n + (i % n)) % n;
}
or
inline int positive_modulo(int i, int n) {
return (i % n) + (n * (i < 0));
}
Of course I can profile these to find out which is the fastest on my system, but I can't help worrying that I might have missed a better one, or that what's fast on my machine might be slow on a different one.
So is there a standard way to do this, or some clever trick that I've missed that's likely to be the fastest possible way?
Also, I know it's probably wishful thinking, but if there's a way of doing this that can be auto-vectorised, that would be amazing.
The standard way I learned is
inline int positive_modulo(int i, int n) {
return (i % n + n) % n;
}
This function is essentially your first variant without the abs
(which, in fact, makes it return the wrong result). I wouldn't be surprised if an optimizing compiler could recognize this pattern and compile it to machine code that computes an "unsigned modulo".
Edit:
Moving on to your second variant: First of all, it contains a bug, too -- the n < 0
should be i < 0
.
This variant may not look as if it branches, but on a lot of architectures, the i < 0
will compile into a conditional jump. In any case, it will be at least as fast to replace (n * (i < 0))
with i < 0? n: 0
, which avoids the multiplication; in addition, it's "cleaner" because it avoids reinterpreting the bool as an int.
As to which of these two variants is faster, that probably depends on the compiler and processor architecture -- time the two variants and see. I don't think there's a faster way than either of these two variants, though.
Modulo a power of two, the following works (assuming twos complement representation):
return i & (n-1);