Is multiplication faster than float division? [duplicate]
Solution 1:
Multiplication is faster than division. At university I was taught that division takes about six times as long as multiplication. The actual timings are architecture dependent, but in general multiplication will never be slower than, or even as slow as, division. Always optimize your code towards using multiplication if the rounding errors allow.
So in an example this would typically be slower ...
for (int i = 0; i < arraySize; i++) {
    a[i] = b[i] / x;
}
... than this ...
y = 1.0 / x;  // floating-point reciprocal, computed once outside the loop
for (int i = 0; i < arraySize; i++) {
    a[i] = b[i] * y;
}
Of course, because of rounding errors you'll lose (a little) precision with the second method, but unless you are repeatedly calculating x=1/x; that's unlikely to cause much issue.
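To get a feel for how small that precision loss is, here is a minimal sketch in C. The function names (scale_by_division, scale_by_reciprocal, max_relative_difference) are made up for this illustration, and it assumes single-precision float arrays.

```c
#include <stddef.h>

/* Reference version: one division per element. */
void scale_by_division(float *a, const float *b, size_t n, float x) {
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] / x;
    }
}

/* Reciprocal version: one division up front, then one multiply per element. */
void scale_by_reciprocal(float *a, const float *b, size_t n, float x) {
    float y = 1.0f / x;
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] * y;
    }
}

/* Largest relative difference between two result arrays (p is the reference). */
float max_relative_difference(const float *p, const float *q, size_t n) {
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float ref = p[i] < 0.0f ? -p[i] : p[i];
        float diff = p[i] - q[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (ref != 0.0f && diff / ref > worst) {
            worst = diff / ref;
        }
    }
    return worst;
}
```

When x is a power of two such as 2.0f, the reciprocal is exact and both versions should agree bit for bit. For a divisor like 3.0f the results can differ in the last bit or so, which is the small loss mentioned above.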
Edit:
Just for reference, I've dug up a third-party comparison of operation timings by searching on Google:
http://gmplib.org/~tege/x86-timing.pdf
Look at the numbers on MUL and DIV. They indicate a difference of between 5 and 10 times, depending on the processor.
Solution 2:
It is quite likely that the compiler will convert a divide to a multiply in this case, if it "thinks" it's faster. Dividing by 2 in floating point may also be faster than other float divides. If the compiler doesn't convert it, it MAY be faster to use multiply, but not certain - depends on the processor itself.
The gain from manually using multiply instead of divide can be quite large in cases where the compiler can't determine that it's "safe" to do so (e.g. 0.1 can't be stored exactly in a floating point number; as a single-precision float it becomes approximately 0.10000000149011612, so multiplying by it does not always give the same result as dividing by 10). See below for figures on AMD processors, which can be taken as representative for the class.
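The stored value of a constant is easy to check for yourself. This is a small sketch; format_float is a hypothetical helper name, and it assumes float is IEEE 754 single precision, as on typical x86 compilers.

```c
#include <stdio.h>

/* Write `value` to buf with 17 digits after the decimal point.
   Returns what snprintf returns: the number of characters needed. */
int format_float(char *buf, size_t size, float value) {
    /* A float is promoted to double when passed to printf-style
       functions; the promotion is exact, so the digits shown are
       those of the stored float. */
    return snprintf(buf, size, "%.17f", (double)value);
}
```

Under that assumption, formatting 0.1f gives 0.10000000149011612, while 0.5f gives 0.50000000000000000 because 0.5 is a power of two and is stored exactly. That is why a compiler can turn a division by 2 into a multiplication by 0.5 without changing results, but not a division by 10 into a multiplication by 0.1.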
To tell whether your compiler does this well or not, why not write a bit of code to experiment? Make sure you write it so that the compiler doesn't just calculate a constant value and discard all the calculation in the loop, though.
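A rough sketch of such an experiment in C follows. The names (time_division, time_multiplication, sink) are invented for this example, and clock() is a coarse timer, so treat the numbers as indicative only. The volatile read of the divisor and the volatile write of a result are there to make it hard for the compiler to precompute or throw away the loops.

```c
#include <stddef.h>
#include <time.h>

/* Storing a result here keeps the compiler from discarding the work. */
static volatile float sink;

/* CPU seconds spent computing a[i] = b[i] / x, repeated `repeats` times. */
double time_division(float *a, const float *b, size_t n, float x, int repeats) {
    volatile float vx = x;  /* re-read every pass so nothing is hoisted */
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        float d = vx;
        for (size_t i = 0; i < n; i++) {
            a[i] = b[i] / d;
        }
        if (n > 0) {
            sink = a[n - 1];
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same work, but with one reciprocal per pass and a multiply per element. */
double time_multiplication(float *a, const float *b, size_t n, float x, int repeats) {
    volatile float vx = x;
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        float y = 1.0f / vx;
        for (size_t i = 0; i < n; i++) {
            a[i] = b[i] * y;
        }
        if (n > 0) {
            sink = a[n - 1];
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call both with the same large array, divisor and repeat count, then compare the two returned times. Expect the ratio to vary with the processor, the optimisation level, and whether the loops end up vectorised or limited by memory bandwidth.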
Edit:
AMD's optimisation guide for Family 15h processors gives figures for fdiv and fmul of 42 and 6 cycles respectively. SSE versions are a little closer: 24 (single) or 27 (double) cycles for DIVPS, DIVPD, DIVSS and DIVSD (divide), and 6 cycles for all forms of multiply.
From memory, Intel's figures aren't that far off.
Solution 3:
Floating point multiplication usually takes fewer cycles than floating point division. But with literal operands the optimizer is well aware of this kind of micro-optimization.