What is the fastest way to compute sin and cos together?
I would like to compute both the sine and co-sine of a value together (for example to create a rotation matrix). Of course I could compute them separately one after another like a = cos(x); b = sin(x);
, but I wonder if there is a faster way when needing both values.
Edit: To summarize the answers so far:
Vlad said, that there is the asm command
FSINCOS
computing both of them (in almost the same time as a call toFSIN
alone)Like Chi noticed, this optimization is sometimes already done by the compiler (when using optimization flags).
caf pointed out, that functions
sincos
andsincosf
are probably available and can be called directly by just includingmath.h
tanascius approach of using a look-up table is discussed controversial. (However on my computer and in a benchmark scenario it runs 3x faster than
sincos
with almost the same accuracy for 32-bit floating points.)Joel Goodwin linked to an interesting approach of an extremly fast approximation technique with quite good accuray (for me, this is even faster then the table look-up)
Modern Intel/AMD processors have instruction FSINCOS
for calculating sine and cosine functions simultaneously. If you need strong optimization, perhaps you should use it.
Here is a small example: http://home.broadpark.no/~alein/fsincos.html
Here is another example (for MSVC): http://www.codeguru.com/forum/showthread.php?t=328669
Here is yet another example (with gcc): http://www.allegro.cc/forums/thread/588470
Hope one of them helps. (I didn't use this instruction myself, sorry.)
As they are supported on processor level, I expect them to be way much faster than table lookups.
Edit:
Wikipedia suggests that FSINCOS
was added at 387 processors, so you can hardly find a processor which doesn't support it.
Edit:
Intel's documentation states that FSINCOS
is just about 5 times slower than FDIV
(i.e., floating point division).
Edit:
Please note that not all modern compilers optimize calculation of sine and cosine into a call to FSINCOS
. In particular, my VS 2008 didn't do it that way.
Edit:
The first example link is dead, but there is still a version at the Wayback Machine.
Modern x86 processors have a fsincos instruction which will do exactly what you're asking - calculate sin and cos at the same time. A good optimizing compiler should detect code which calculates sin and cos for the same value and use the fsincos command to execute this.
It took some twiddling of compiler flags for this to work, but:
$ gcc --version
i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5488)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cat main.c
#include <math.h>
struct Sin_cos {double sin; double cos;};
struct Sin_cos fsincos(double val) {
struct Sin_cos r;
r.sin = sin(val);
r.cos = cos(val);
return r;
}
$ gcc -c -S -O3 -ffast-math -mfpmath=387 main.c -o main.s
$ cat main.s
.text
.align 4,0x90
.globl _fsincos
_fsincos:
pushl %ebp
movl %esp, %ebp
fldl 12(%ebp)
fsincos
movl 8(%ebp), %eax
fstpl 8(%eax)
fstpl (%eax)
leave
ret $4
.subsections_via_symbols
Tada, it uses the fsincos instruction!
When you need performance, you could use a precalculated sin/cos table (one table will do, stored as a Dictionary). Well, it depends on the accuracy you need (maybe the table would be to big), but it should be really fast.
Technically, you’d achieve this by using complex numbers and Euler’s Formula. Thus, something like (C++)
complex<double> res = exp(complex<double>(0, x));
// or equivalent
complex<double> res = polar<double>(1, x);
double sin_x = res.imag();
double cos_x = res.real();
should give you sine and cosine in one step. How this is done internally is a question of the compiler and library being used. It could (and might) well take longer to do it this way (just because Euler’s Formula is mostly used to compute the complex exp
using sin
and cos
– and not the other way round) but there might be some theoretical optimisation possible.
Edit
The headers in <complex>
for GNU C++ 4.2 are using explicit calculations of sin
and cos
inside polar
, so it doesn’t look too good for optimisations there unless the compiler does some magic (see the -ffast-math
and -mfpmath
switches as written in Chi’s answer).
You could compute either and then use the identity:
cos(x)2 = 1 - sin(x)2
but as @tanascius says, a precomputed table is the way to go.