Modular exponentiation by repeated squaring

I came upon an interesting way to relatively quickly compute modular exponentiation with large numbers. However, I do not fully understand it and was hoping for a better explanation.

The method basically states that if you want $x^y \bmod N$, it can be computed by repeatedly squaring $x$ modulo $N$:

$$x \bmod N \rightarrow x^2 \bmod N \rightarrow x^4 \bmod N \rightarrow x^8 \bmod N \rightarrow \ldots \rightarrow x^{2^{\lfloor \log y \rfloor}} \bmod N$$

This is supposed to calculate in $\mathcal{O}(\log_2 N)$ time.

An example is given:

$$x^{25} = x^{11001_2} = x^{10000_2} * x^{1000_2} * x^{1_2} = x^{16} * x^8 * x^1$$

I don't really understand how this is faster. Aren't you just doing the same thing? You're just breaking down the exponent into a base-$2$ representation. Also, where does $x^{2^{\lfloor \log y \rfloor}}$ come from? If you're breaking it down into exponents of two, where does a logarithm come from? (Aren't logs representations of numbers as exponents of $10$?)

I'm sorry if this appears to be a really stupid question - I have minimal knowledge in math. Also, I wasn't sure if this belongs here or on stack exchange, so give me the word and I'll move it.


Solution 1:

Let's count how many modular multiplications we need to do to get to $x^{25}$ with the two methods. The trivial method would need 24 multiplications (I leave out the calculation of the residue modulo $N$; that will always be there): $x^2=x*x$, $x^3=x^2*x$, $x^4=x^3*x$, $\ldots$, $x^{25}=x^{24}*x$.

With repeated squaring there are several approaches. We might first compute $x^2=x*x$, $x^4=(x^2)*(x^2)$, $x^8=(x^4)*(x^4)$, $x^{16}=(x^8)*(x^8)$. That is 4 multiplications so far. The point of doing this is to store these values somewhere. Then we can compute $x^{24}=(x^{16})*(x^8)$ and $x^{25}=(x^{24})*x$ with just two extra multiplications: a total of 6 multiplications in comparison to 24. In the worst case we need to do about $2\log_2 m$ multiplications to compute $x^m$.
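For concreteness, here is a small Python sketch of this table-based approach (the function and variable names are mine, not from the answer): it tabulates the repeated squares of $x$ and then multiplies together those selected by the 1-bits of the exponent.

```python
def pow_mod_table(x, y, N):
    """Compute x**y % N by tabulating the repeated squares of x
    and multiplying together those selected by the 1-bits of y."""
    # One squaring per bit position of y: x^1, x^2, x^4, x^8, ... (mod N).
    squares = [x % N]
    for _ in range(y.bit_length() - 1):
        squares.append(squares[-1] * squares[-1] % N)
    # Combine the table entries that correspond to 1-bits of y.
    result = 1
    for k in range(y.bit_length()):
        if (y >> k) & 1:
            result = result * squares[k] % N
    return result

print(pow_mod_table(7, 25, 1000))   # same value as pow(7, 25, 1000)
```

For $y=25$ this performs 4 squarings plus one multiplication per 1-bit of the exponent, in line with the count above.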

If you don't have enough memory to store that lookup table of powers $x^{2^k}$, $k=0,1,\ldots,\lfloor\log_2 m\rfloor$, then you can get away with storing less by using the approach outlined in Bill Dubuque's answer. In addition to $x$ itself you only need to store one other intermediate value. Here $25=11001_2$, and for a hopefully obvious reason I will also write the exponents in binary in parentheses in what follows.

First compute the square
$$ x^2=(x^{10_2})=x*x. $$
We notice that our exponent $25=11001_2$ begins with two 1s, so next we multiply this result by $x$ and calculate
$$ x^3=(x^{11_2})=(x^2)*x. $$
Notice that this calculation needed only the result of the previous multiplication and the input $x$. Let's square this:
$$ x^6=(x^{110_2})=(x^3)*(x^3). $$
Again we only needed the previous output $x^3$. Now we have the three leading bits of the exponent correctly in place; because the third bit (counting from the most significant) of the exponent was 0, there is no need to multiply this by $x$. So we square again and get
$$ x^{12}=(x^{1100_2})=(x^6)*(x^6). $$
Again we only needed the previous result ($x^6$). The fourth bit of the exponent was also a zero, so we can proceed with the final squaring
$$ x^{24}=(x^{11000_2})=(x^{12})*(x^{12}). $$
The "previous value only" comment applies again. The final bit of the exponent is a 1, so we need one more multiplication to fix it:
$$ x^{25}=(x^{11001_2})=(x^{24})*x. $$

To summarize: we square repeatedly, and whenever the next bit of the exponent is a 1 we insert an extra multiplication by the original input.
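In code, this memory-light variant is the usual left-to-right square-and-multiply loop. A minimal Python sketch (my own naming, not from the answer), scanning the bits of the exponent from the most significant downward:

```python
def pow_mod_lr(x, y, N):
    """Left-to-right square-and-multiply: square for every bit of y,
    and multiply in an extra x whenever that bit is a 1."""
    result = 1
    for k in range(y.bit_length() - 1, -1, -1):   # bits of y, most significant first
        result = result * result % N              # always square
        if (y >> k) & 1:                           # bit is 1: multiply by x
            result = result * x % N
    return result

print(pow_mod_lr(7, 25, 1000))   # matches pow(7, 25, 1000)
```

The loop treats the leading bit like the others (it squares the initial 1 first), but for $25=11001_2$ the intermediate powers are the same as in the walkthrough above: $x, x^2, x^3, x^6, x^{12}, x^{24}, x^{25}$.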

Solution 2:

This is known as Exponentiation by repeated squaring (see also Modular exponentiation).

It deserves to be better known that this arises simply from writing the exponent in binary radix in Horner polynomial form, i.e. $\rm\ d_0 + 2\, (d_1 + 2\, (d_2\ +\:\cdots)).\, $ Below is an example of computing $\ x^{25}\ $ by repeated squaring. Note that the repeated square form arises simply from performing various substitutions into the binary polynomial Horner form, namely the substitutions $\rm\ \color{#c00}{1\to x},\ \ \color{#0a0}{0\to 1},\ \ (n)\:\!2\to (n)^2\ $ into $\,25_{10} = 11001_2\ $ expanded into Horner form, viz.

$\begin{align} &\color{#c00}1\color{#c00}1\color{#0a0}0\color{#0a0}0\color{#c00}1_2 = 25,\,\ \text{i.e. the binary radix notation for } 25\\[.2em] \Rightarrow \ &((((\color{#c00}1)\,2\!+\!\color{#c00}1)\,2\!+\!\color{#0a0}0)\,2\!+\!\color{#0a0}0)\,2\!+\!\color{#c00}1 \:=\: 25\ \ \text{in Horner polynomial form}\\[.2em] \Rightarrow \ &((((\color{#c00}x)^{\Large 2}\ \color{#c00}x)^{\Large 2}\ \color{#0a0}1 )^{\Large 2}\ \:\! \color{#0a0}1 )^{\Large 2}\ \:\! \color{#c00}x\:\!=\:\! x^{\large 25}\ \ \rm via\ \bf repeated\ squaring \end{align}$
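To see the correspondence concretely, one can evaluate the exponent's Horner form and perform the matching substitutions in lockstep. The following Python sketch (my own illustration, not part of the answer) builds the exponent with the step $e \to 2e + \text{bit}$ while building the power with the step "square, then multiply by $x$ if the bit is 1":

```python
def horner_power(x, bits, N):
    """Evaluate the exponent in Horner form and the power via the
    matching substitutions (2 -> squaring, digit 1 -> multiply by x)."""
    e, p = 0, 1
    for bit in bits:
        e = 2 * e + bit      # Horner step on the exponent
        p = p * p % N        # the factor 2 becomes a squaring
        if bit == 1:         # the digit 1 becomes a multiplication by x
            p = p * x % N
    return e, p

e, p = horner_power(7, [1, 1, 0, 0, 1], 1000)   # 11001_2 = 25
print(e, p, pow(7, 25, 1000))                    # 25, and both powers agree
```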

Solution 3:

Where the log comes from, and why this is faster, is a good question, and the answer is the same whether you multiply the usual way or mod $N$. But faster than what? The issue is how many multiplications and how many additions are required, under the assumption that multiplication is inherently slower than addition, a lot slower.

The usual way to compute $x^y$ for an integer $y$ is simply to multiply $x$ by $x$, then $x$ by the result, and so on. Obviously this requires $y-1$ multiplications and no additions, 24 in your example.

The method of repeated squaring, on the other hand, uses the fact that any exponent $y$ can be written as a sum of distinct powers of two: 1, 2, 4, 8, 16 and so on, which is the basis of the binary number system. Your breakdown of 25 into $16 + 8 + 1$ is a good example, and the way it comes from the binary representation of 25 should be enough to convey the process to anyone who understands binary notation.

Then, ask yourself what is the largest number of such powers of two that could be required. Clearly, it's the number of binary digits in the representation of $y$, which is roughly $\log_2(y)$, or more precisely $\lfloor \log_2(y) \rfloor + 1$ ($\lfloor x \rfloor$ being the floor function, i.e. round down to the nearest integer). So the $\log$ here is base 2, not base 10 (but those differ only by a constant factor, namely $\log_{10}(2) \approx 0.301$), and it tells you how many doublings are needed to go from 1 to $y$. In your example, $\log_2(25)$ is about 4.64 ($\log_{10}(25)/0.301$), so at most 5 powers of two are needed.
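As a quick sanity check (my own illustration, not from the original answer), Python's integer bit length gives exactly this digit count:

```python
import math

y = 25
print(math.log2(y))      # about 4.64
print(y.bit_length())    # 5 binary digits, i.e. floor(log2(y)) + 1
print(bin(y))            # '0b11001', so 25 = 16 + 8 + 1
```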

The last question is: how many arithmetic operations are needed to carry out this method? To get the 5 powers of two in your example by repeated squaring you need only 4 multiplications: $x^2 = x \times x$, $x^4 = x^2 \times x^2$, $x^8 = x^4 \times x^4$, $x^{16}=x^8 \times x^8$. Of course, you also need some additions, but they are computationally "cheap", and I'll leave it to you to figure out how many of them might be required.

Finally you need to multiply together the selected powers ($x^{16} \times x^8 \times x^1$ in this case), which might take yet another set of up to 4, or $\lceil \log_2(y)\rceil-1$, multiplications.

So, the general answer is that you will need at most about $2 \lfloor \log_2(y)\rfloor$ multiplications rather than the $y-1$ of the "slow" way, or 8 versus 24 for your example: a big time saving, even bigger for larger exponents, especially if this sits in the innermost loop of a program and gets done zillions of times.
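To make the counts concrete, here is a rough tally in Python (my own sketch; the exact figure depends on how many 1-bits the exponent has, so $2\lfloor\log_2 y\rfloor$ is an upper bound):

```python
def count_mults(y):
    """Rough multiplication counts: naive repeated multiplication
    vs. square-and-multiply (squarings plus combining the 1-bit powers)."""
    naive = y - 1                                                    # x*x, x^2*x, ..., x^(y-1)*x
    square_and_multiply = (y.bit_length() - 1) + bin(y).count('1') - 1
    return naive, square_and_multiply

for y in (25, 1000, 10**6):
    print(y, count_mults(y))   # e.g. 25 -> (24, 6)
```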

Bill Dubuque's answer is really this method written out in a special polynomial form which accomplishes the required multiplications and additions efficiently and elegantly.