What is the fastest algorithm for finding the natural logarithm of a big number?

Say you need an absolute tolerance of $2^{-m}$ for the answer.

Given a number of the form $x=a \cdot 2^n$, $a \in (1/2,1]$, write $\ln(x)=\ln(a)+n\ln(2)$.

Now compute $\ln(a)$ by taking $m$ terms of the Maclaurin series of $\ln(1+x)$ with $x=a-1$, and compute $\ln(2)$ as $-\ln(1/2)$ by taking $m+\lceil \log_2(n) \rceil$ terms of the same series with $x=-1/2$.

This way is a little bit fussy in terms of working with decimal vs. binary, but it has the advantage that the $\ln(a)$ series converges at worst like $2^{-m}$, rather than like the $(9/10)^m$ of the analogous decimal approach. The downside is that you have to precompute $\ln(2)$ to higher accuracy, since $n$ will be larger, but that doesn't matter much because it's not a "live" problem (provided you enforce some cap on the size of the input and the size of its reciprocal).

This is generally not how people implement library functions in programming languages like C. See, for example, e_log.c at http://www.netlib.org/fdlibm/. This begins with an argument reduction similar to the one I suggested above (where the lower bound and the upper bound for $a$ differ by a factor of $2$), then converts the problem to $\ln(1+x)=\ln(1+y)-\ln(1-y)$ where $y=\frac{x}{2+x}$. This conversion leads to some series acceleration, since the series for the difference has only odd powers, and since $|y|<|x|$. (You could proceed with a Taylor series approach from here. If you did, it would use around $m/4$ terms, due to the aforementioned cancellations and the fact that $y$ is in the ballpark of $x/2$, taking into account that argument reduction has already been done.)

They then use a minimax polynomial to approximate $\frac{\ln(1+y)-\ln(1-y)-2y}{y}$. This kind of approach is what I usually see when I check source code for fast implementations of standard library functions. The coefficients of this minimax polynomial were probably relatively expensive to calculate, but again that's not a "live" problem so its speed doesn't really matter all that much.


This is essentially a more in-depth discussion of the efficiency and accuracy of the various methods.


Reducing the argument so that it is near $1$:

Fundamentally, most of the answers aim for the same goal: reduce the argument to a small value and use a Taylor expansion for $\ln(x)$. Thus far we've seen three approaches:

1) Factor out a power of 2 and use $\ln(a\cdot2^n)=\ln(a)+n\ln(2)$.

2) Factor out a power of 10 and use $\ln(a\cdot10^n)=\ln(a)+n\ln(10)$.

3) Reduce by square rooting using $\ln(x)=2\ln(\sqrt x)$.

We can note that repeated square rooting reduces the argument much faster than the other methods, which repeatedly divide the argument by a constant, since $\sqrt x\ll x/10<x/2$ for large $x$. Realistically, if your input doesn't have more than, say, 1000 digits, then you only have to square root about a dozen times at worst. However, this comes at the cost of computing the square roots themselves, which is not easy. Performing the divisions, on the other hand, is extremely easy: due to the nature of how we write/store numbers, all the divisions can be computed at once by simply moving the decimal point, and we can then truncate to whatever accuracy is desired. With square rooting, the error is harder to manage, and since the log gets doubled for every square root taken, the absolute error gets doubled as well.

Of course, the choice between writing the argument as a multiple of a power of $2$ or of a power of $10$ depends on whether the work is being done by a computer or by a human; a human will likely prefer to work in base 10.

There is also the additional question of what our range for $a$ should be. Since we want $a$ as close to $1$ as possible, a little algebra settles it. For powers of 2, we want $a\in(a_0,2a_0]$ such that $2a_0-1=1-a_0$; solving this gives $a_0=\frac23$ and hence $a\in(\frac23,\frac43]$. For powers of 10, the same reasoning gives $a\in(\frac2{11},\frac{20}{11}]$.


Using series expansions:

From here we could use the standard Taylor expansion for the natural logarithm:

$$\ln(a)=\sum_{k=1}^\infty\frac{(-1)^{k+1}}k(a-1)^k=(a-1)-\frac{(a-1)^2}2+\frac{(a-1)^3}3-\frac{(a-1)^4}4+\mathcal O(a-1)^5$$

however, this does not converge as fast as one can manage by Taylor expanding closer to $a$. The series above comes from the integral definition of the natural logarithm, with the integrand Taylor expanded at $1$:

$$\ln(a)=\int_1^a\frac{\mathrm dt}t=\sum_{k=0}^\infty(-1)^k\int_1^a(t-1)^k~\mathrm dt$$

But we can improve on this by expanding at the midpoint of $1$ and $a$:

\begin{align}\ln(a)=\int_1^a\frac{\mathrm dt}t&=\sum_{k=0}^\infty(-1)^k\left(\frac2{a+1}\right)^{k+1}\int_1^a\left(t-\frac{a+1}2\right)^k~\mathrm dt\\&=\sum_{k=0}^\infty\frac{(-1)^k}{k+1}\left(\frac{a-1}{a+1}\right)^{k+1}\left(1-(-1)^{k+1}\right)\\&=\sum_{k=0}^\infty\frac2{2k+1}\left(\frac{a-1}{a+1}\right)^{2k+1}\end{align}

For $a$ near $1$, the above has a rough error of $\mathcal O((a-1)/2)^{2n+1}$ when using $n$ terms. An algebraic derivation of the above is provided by Wikipedia, but it doesn't really show off just how fast this converges: since the expansion point is twice as close to the far endpoint of the integral, we gain an additional binary digit of accuracy per power, and since half of the terms vanish, we can basically compute twice as many digits per term! This is the method mentioned in Ian's answer.

Here is a rough Ruby program computing the logarithm using series.


Using root-finding methods:

While the series methods are really nice and converge decently fast, Wikipedia provides two more methods for even higher-precision evaluation. The first, also given in Eric Towers's answer, computes the logarithm by inverting the exponential function. Since the reduction is cheap and the accuracy is not much affected, I would again recommend reducing the argument so that it is close to $1$. This means $y$ as defined below will be close to $0$, allowing faster computation of the exponential, and it lets us simply use $y_0=0$ as our initial guess.

$$y=\ln(x)\Rightarrow x=\exp(y)\Rightarrow x-\exp(y)=0$$

to which we may apply standard root-finding methods, such as Newton's method (which doubles the number of accurate digits per step) or Halley's method (which triples it).

The calculation of the exponential functions can be performed using the Maclaurin expansion:

$$\operatorname{expm1}(y)=\exp(y)-1=\sum_{n=1}^\infty\frac{y^n}{n!}=y+\frac{y^2}2+\mathcal O(y^3)$$

Since $y$ is near $0$, computing $\exp(y)$ directly wastes most of the floating-point precision on its dominant term of $1$, so we use $\operatorname{expm1}(y)$ to circumvent this.

One may also note that since $\Delta y_n\to0$, it is easier to compute $\exp(\Delta y_n)$ than $\exp(y_{n+1})$ directly, and use $\exp(y_{n+1})=\exp(\Delta y_n)\exp(y_n)=\exp(y_n)+\exp(y_n)\operatorname{expm1}(\Delta y_n)$. This reduces the main exponentiation down to the first step, which is trivial since $\exp(0)=1$.

Let $y_0=0$ and $\operatorname{expy}_0=1$.

For Newton's method, let $\displaystyle\Delta y_0=x\operatorname{expy}_0-1$ and:

\begin{cases}\Delta y_n=x\operatorname{expy}_n-1,\\\operatorname{expy}_{n+1}=\operatorname{expy}_n+\operatorname{expy}_n\operatorname{expm1}(-\Delta y_n),\\y_{n+1}=y_n+\Delta y_n\end{cases}

For Halley's method, let $\displaystyle\Delta y_0=2\cdot\frac{x-\operatorname{expy}_0}{x+\operatorname{expy}_0}$ and:

\begin{cases}\displaystyle\Delta y_n=2\cdot\frac{x-\operatorname{expy}_n}{x+\operatorname{expy}_n},\\\operatorname{expy}_{n+1}=\operatorname{expy}_n+\operatorname{expy}_n\operatorname{expm1}(\Delta y_n),\\y_{n+1}=y_n+\Delta y_n\end{cases}

Here is a rough Ruby program computing the logarithm with Newton's method and here is a rough Ruby program computing the logarithm with Halley's method.


Using the AGM:

The arithmetic-geometric mean is a powerful tool which can be used here to quickly compute the logarithm as well as $\pi$ and certain integrals. It is defined as:

$$a_0=a,b_0=b\\a_{n+1}=\frac{a_n+b_n}2,b_{n+1}=\sqrt{a_nb_n}\\M(a,b)=\lim_{n\to\infty}a_n=\lim_{n\to\infty}b_n$$

According to Wikipedia, this is so fast and cheap to compute that it can even be used to evaluate the exponential function, via logarithms, faster than series approximating the exponential directly! To achieve $p$ bits of accuracy, choose $m$ so that $s=x2^m$ is greater than $2^{p/2}$. We may then compute the natural logarithm as:

$$\ln(x)=\lim_{m\to\infty}\frac{\pi x2^{m-2}}{2M(x2^{m-2},1)}-m\ln(2)$$

which is essentially a restatement of the asymptotic behavior of $M(1,t)$ as $t\to\infty$. For this method, reducing the argument isn't even necessary; we may in fact apply it directly to large arguments!

There are some drawbacks to this method, however. The computation requires square roots of large floats, but this can be handled with a specially defined float class and the corresponding functions.

Alternatively, of course, one could simply reduce the argument down to avoid large floats like before.

Here is a rough Ruby program computing the logarithm using the arithmetic-geometric mean.


I don't know what the fastest way is, but here's one reasonably efficient approach:

  • You can divide numbers with Newton-Raphson;
  • Once you know how to do that, you can also take square roots with Newton-Raphson;
  • You can use $\ln x=2\ln\sqrt{x}$ as often as you need to get the logarithm's argument close to $1$ before using the Maclaurin series of $\ln(1+x)$ (see the sketch after this list);
  • If you need logarithms in another base, use $\log_ax=\frac{\ln x}{\ln a}$.

Halley's method is iterative and its convergence is cubic. Applied here, we would invert to use the exponential (which happily is its own derivative): $$ y_{n+1} = y_n+2 \cdot \frac{x- \mathrm{e}^{y_n}}{x + \mathrm{e}^{y_n}} \text{.} $$ For instance, with $x = 25551879$ and $y_0 = 2$ (i.e., not close), the iterates (all computed with 15-ish digit intermediates) are $2$, $4.$, $5.99999$, $7.99993$, $9.99946$, $11.996$, $13.9708$, $15.7959$, $16.9122$, $17.056$, $17.0562$. My point is not that 15 digits is enough, but that the method reached the shown precision in only ten steps.

You might ask, how do I get those exponentials? Use the power series. That converges quickly for any argument you are likely to see. As a rule of thumb, start with twice as many terms as your argument, so for $\mathrm{e}^{17.0562}$, start with $34$ terms in this Taylor series. This gives $2.5549{\dots}\times 10^{7}$ with error $2355.61{\dots}$. Then increase the number of terms in the exponentials by (in this case) $34$ as long as your estimate for $y_{n+1}$ still changes within your target precision. When that stops happening, take that as your final $y_{n+1}$ and repeat the process of extending the exponential series until your $y_{n+2}$ stabilizes. Continue until you get two $y$s in a row that agree to your target precision (plus enough extra unchanging bits that at least one of them is a zero so that you know which way to round the last bit in your reported answer).


Well, $$\ln(25551879) = \ln(0.25551879 \times 10^{8}) = \ln(0.25551879) + \ln(10^8) = 8 \ln(10) + \ln(0.25551879).$$

Since $\ln(10)$ is a constant that can be precomputed to a huge number of decimal places, we only need a method that converges quickly for values less than $1.0$. I don't know if Taylor is good enough on that restricted range or if there is some better method.

This addresses the issue you raised about the argument being a large number. As to generating lots of digits, there are a lot of good answers on this previous question.