Fast inverse square root trick
I found what appears to be an intriguing method for calculating $$\frac{1}{\sqrt x}$$ extremely fast on this website, with more explanation here.
However, the computer-science lingo and "floating"-stuffs left me in the dark as to how it works. I don't actually know what the algorithm is because I can't read C code, so could someone
- Explain the algorithm in mathematical terms
- Explain why it works
(I got the impression that it's using Newton's method with a clever choice of first guess.)
For the convenience of the readers I (the user String) allowed myself to include the C++ code:
float InvSqrt(float x)
{
float xhalf = 0.5f*x; // store 0.5x since x will be changed
int i = *(int*)&x; // convert to binary integer
i = 0x5f3759df - (i>>1); // the smart trick
x = *(float*)&i; // convert binary back to decimal
x = x*(1.5f-xhalf*x*x); // Newton-Raphson step
return x;
}
Solution 1:
First of all, your description of the general aim of the algorithm is correct! It IS indeed Newton-Raphson's method with a clever initial guess. I will only comment on arriving at the clever guess.
Conversion to binary
Trying to phrase it in mostly mathematical terms, the computer takes a decimal number $x\in[1\cdot 2^{-127},2\cdot 2^{128})$ and calculates a piecewise linear approximation to the base $2$ logarithm by finding $q\in[0,1)$ and an integer $e\in[-127,128]$ such that $$ x=(1+q)\cdot 2^e $$ This expression depends linearly on $q$ and exponentially on $e$ and we have the piecewise linear approximation $$ \log_2(x)\approx e+q=\underbrace{\lfloor \log_2(x)\rfloor}_e+\underbrace{x/2^{\lfloor \log_2(x)\rfloor}-1}_q $$ Here is a diagram of the situation with $\log_2(x)$ as the blue curve and $e+q$ as the red polygon:
To store this information, the computer transforms it into a positive integer $$ i=\lfloor(e+127+q)\cdot 2^{23}\rfloor $$ where the integer part of $(e+127+q)$ is the non-negative $(e+127)\in[0,255]$. Note that the floor function used here, which is essentially truncating the number $(1+q)$ after $23$ binary digits of $q$, leads to a loss of precision of at most $2^{-23}\approx 0.000000119$. This is not much and we will disregard it in the following for a cleaner rendering.
Binary digit shifting
The code $\texttt{(i>>1)}$ takes the integer $\texttt{i}$ and shifts its binary digits to the right, dropping the last digit. So with $\texttt{i=1010}$ or $\texttt{i=1011}$ we have $\texttt{(i>>1)=101}$. In effect we have $$ \texttt{(i>>1)}=\lfloor i/2\rfloor $$ Again the floor function has relatively little effect so we will disregard it.
The "magic" number
You can check with Wolfram Alpha that recognizing $\texttt{0x5f3759df}$ as hexidecimal we have $$ \begin{align} \texttt{0x5f3759df}&=1597463007 \\ &= 190.43243014812469482421875\cdot 2^{23} \end{align} $$ For brevity let us just write $190.43243$. Thus the code line $\texttt{i=0x5f3759df-(i>>1)}$ changes $i$ to the new value $$ \begin{align} i_*&= 190.43243\cdot 2^{23}-i/2 \\ &=[190.43243-(e+127+q)/2]\cdot 2^{23} \\ &=[126.93243-(e+q)/2]\cdot 2^{23} \\ &\approx(127+\log_2(1/\sqrt x))\cdot 2^{23} \end{align} $$ So when the integer $i_*$ above is transformed backwards via the piecewise linear approximation into a decimal number we have $$ \begin{align} x_*&=\left(1+\text{frac}(126.93243-(e+q)/2)\right)\cdot 2^{\lfloor 126.93243-(e+q)/2\rfloor-127}\\ &\approx [1+\text{frac}(126.93243+log_2(1/\sqrt x))]\cdot 2^{\lfloor\log_2(1/\sqrt x)\rfloor} \\ &\approx 2^{\log_2(1/\sqrt x)} \\ &=1/\sqrt x \end{align} $$
Why not 190.5?
This is simply to balance the error function of the expression above to minimize the errors. With 190.5 we would have had $127$ instead of $126.93243$ but due to the linear approximations involved this would have lead to skewed errors so that some numbers $x$ would lead to an almost perfect $x_*=1/\sqrt x$ whereas other numbers $x$ would work relatively bad compared to that. Shifting by $190.5-190.43243=0.06757$ apparantly does the trick of balancing errors better.
In both the article you linked to and in a more recent article by a different author some work has been done deriving better constants than $190.43243$. I have not carefully read the sections about that in either of the articles, though.
I hope this clarifies the general principle of the algorithm without going into too much nor too little detail regarding the aim of your question.