Shortcut to finding the square-root of a perfect-square?
I've been trying to speed up an algorithm where the most expensive operation is the square-root. However, I can often guarantee that the input value is a perfect-square. I'm curious to know if there are any algorithms that will find the square-root faster (constant time?) if it is known that the input is a perfect-square?
Thanks, Ryan
The sum of the first $k$ odd numbers is $k^2$. Knowing this, you can you calculate the square root by summing successive odd numbers (starting from one)—once you reach the input value, return the number of summations you made.
For example, $16 = 1 + 3 + 5 + 7$; that's $4$ addends, so $\sqrt{16}=4$. This process will always work, since our input is guaranteed to be of the form $k^2$ with $k \in \mathbb N$.
I think this method would run in $O(\sqrt n)$.
With integers within sensible bounds compared to what your CPU can natively compute, it can be quite easy to restrict the range of numbers you have to binary search to find the square root of x.
(0. remove two-blocks of trailing 0 from your binary number. Each block you remove is one factor of 2 to be multiplied to the result of the following step. This can be done in constant time, if I'm not mistaken: Observe the structure of "Subtract 1 and XOR with the input" for numbers with $t$ trailing 0s. Then use the POPCNT (Hamming weight) instruction of most serious CPUs. After removing these 0s, i.e. dividing by $4^n$, you'll end up with an odd number; if you end up with an even number after removing an even number of 0s, your number is not a perfect square.)
- Find $k=\lfloor\log_2 x\rfloor $, see https://graphics.stanford.edu/~seander/bithacks.html
- $a=\frac k2$
- Thus, $2^a$ becomes a lower limit for $\sqrt x$ and $2^{a+1}$ an upper. Both values can be found via bit-shifting 1.
- From here, do a binary search¹.
I doubt you'd be much faster than converting to floating point and letting the FPU do it in hardware, giving you an approximate value, comvertable back to integer, from which you only need to search small ranges (namely, the lost precision) for the actual integer square root.
Note that in such problems as yours, algorithmic elegance often plays a minor role - it needs to be fast on actual hardware, so execution avoiding a lot of memory interaction is a good thing, and: with SIMD instructions, doing four to 16 operations of the same type take about as long as doing one; so if you just need to test a few integers for their square, modifying your algorithm to be able to try four in parallel is way more efficient than saving half of the operations necessary.
You have a technological problem, not so much a numerical.
¹ binary search assumes that you can do one squaring and one comparison at once; as hinted at before, you might very well be able to divide your interval into five search chunks by calculating four products at once and comparing four numbers at once using SIMD. This further hints that even if there should be no constant time algorithm (and I'm pretty sure there's none), you can be better than $\mathcal O(n^2·\log_2 x)$; compare Fürer's algorithm.
I think the only advantage gained by having a perfect square in analytic methods is that you know an iterative algorithm will actually terminate. So instead here is a number theoretic solution that'll work for numbers less than $2^{66}$.
Fact 1: If $p$ is a prime with $p \equiv 3 \mod 4$ and $x$ is a perfect square $\mod p$, then $$x \equiv \left(x^{(p+1)/4}\right)^2 \mod p,$$ i.e. you can compute the modular square root by exponentiating by $(p+1)/4$. (See https://crypto.stackexchange.com/a/20994/18112)
Fact 2: The numbers $m_{17}=2^{17}-1$, $m_{19}=2^{19}-1$, and $m_{31}=2^{31}-1$ are (Mersenne) primes whose product is greater than $2^{66}$.
Method: Let $S$ be the square whose root $t$ you'd like to find. Compute the following $$t_{17} \equiv S^{2^{15}} \mod m_{17}$$ $$t_{19} \equiv S^{2^{17}} \mod m_{19}$$ $$t_{31} \equiv S^{2^{29}} \mod m_{31}$$ Then the Chinese Remainder Theorem gives $$t \equiv \pm 31207 t_{17} m_{19} m_{31} \pm 298611 m_{17} t_{19} m_{31} \pm 413071270 m_{17} m_{19} t_{31} \mod m_{17}m_{19}m_{31}$$ Then check these 8 possibilities.
Remarks: I don't know how computationally efficient this is; it's more of a mathematical solution taking advantage of knowing that $S$ is a square. I would venture to guess it's about as "constant time" as you could get as the number of steps is essentially fixed, but that constant may be larger than the $\sqrt{n}$ of other methods for this range of $n$.
I think a binary search type algorithm would be quite efficient for large input values if we know the input is a perfect square.
Let $n$ be the input value.
Begin with two integers $a$ and $b$ such that $a^2 < n$ and $b^2 > n$. We could use $a=0$ and $b=n$.
Find the midpoint of $a$ and $b$ and round down to the nearest integer if necessary. Call this $m$.
Find $m^2$. If $m^2=n$ then $m$ is the square root. If $m^2>n$ then $m$ is too high, so we return to step 2 with $a$ and $m$ as our two integers. If $m^2<n$ then $m$ is too low, so we return to step 2 with $m$ and $b$ as our two integers. Repeat until the square root is found.
The squaring of $m$ may be what slows the algorithm down, however I believe that multiplication algorithms are implemented in processor hardware and therefore very efficient. In terms of the number of operations, I believe the binary search would run in logarithmic time and therefore be preferable to $O(\sqrt n)$ for large input values. However, I am no expert on algorithm efficiency...