Performance wise, how fast are Bitwise Operators vs. Normal Modulus?

Solution 1:

Unless you're using an ancient compiler, it can already handle this level of conversion on its own. That is to say, a modern compiler can and will implement i % 2 using a bitwise AND instruction, provided it makes sense to do so on the target CPU (which, in fairness, it usually will).

In other words, don't expect to see any difference in performance between these, at least with a reasonably modern compiler with a reasonably competent optimizer. In this case, "reasonably" has a pretty broad definition too--even quite a few compilers that are decades old can handle this sort of micro-optimization with no difficulty at all.

Solution 2:

TL;DR Write for semantics first, optimize measured hot-spots second.

At the CPU level, integer modulus and divisions are among the slowest operations. But you are not writing at the CPU level, instead you write in C++, which your compiler translates to an Intermediate Representation, which finally is translated into assembly according to the model of CPU for which you are compiling.

In this process, the compiler will apply Peephole Optimizations, among which figure Strength Reduction Optimizations such as (courtesy of Wikipedia):

Original Calculation  Replacement Calculation
y = x / 8             y = x >> 3
y = x * 64            y = x << 6
y = x * 2             y = x << 1
y = x * 15            y = (x << 4) - x

The last example is perhaps the most interesting one. Whilst multiplying or dividing by powers of 2 is easily converted (manually) into bit-shifts operations, the compiler is generally taught to perform even smarter transformations that you would probably think about on your own and who are not as easily recognized (at the very least, I do not personally immediately recognize that (x << 4) - x means x * 15).

Solution 3:

This is obviously CPU dependent, but you can expect that bitwise operations will never take more, and typically take less, CPU cycles to complete. In general, integer / and % are famously slow, as CPU instructions go. That said, with modern CPU pipelines having a specific instruction complete earlier doesn't mean your program necessarily runs faster.

Best practice is to write code that's understandable, maintainable, and expressive of the logic it implements. It's extremely rare that this kind of micro-optimisation makes a tangible difference, so it should only be used if profiling has indicated a critical bottleneck and this is proven to make a significant difference. Moreover, if on some specific platform it did make a significant difference, your compiler optimiser may already be substituting a bitwise operation when it can see that's equivalent (this usually requires that you're /-ing or %-ing by a constant).

For whatever it's worth, on x86 instructions specifically - and when the divisor is a runtime-variable value so can't be trivially optimised into e.g. bit-shifts or bitwise-ANDs, the time taken by / and % operations in CPU cycles can be looked up here. There are too many x86-compatible chips to list here, but as an arbitrary example of recent CPUs - if we take Agner's "Sunny Cove (Ice Lake)" (i.e. 10th gen Intel Core) data, DIV and IDIV instructions have a latency between 12 and 19 cycles, whereas bitwise-AND has 1 cycle. On many older CPUs DIV can be 40-60x worse.