Newton's method intuition

In optimisation the Newton step is $-\nabla^2 f(x)^{-1}\nabla f(x)$. Could someone offer an intuitive explanation of why the Newton direction is a good search direction? For example, I can think of steepest descent as a ball rolling down a hill (in three dimensions), but I'm struggling to think of a similar analogy for the Newton direction.


Newton's method in optimization makes a local quadratic approximation of the function based on the gradient and Hessian at the current point, and then jumps to the minimizer of that approximation.

Just imagine fitting a little quadratic surface (a paraboloid, the higher-dimensional analogue of a parabola) to your surface at the current point, and then going to the minimum of that approximation to find the next point. Finding the direction towards the minimum of the quadratic approximation is exactly what you are doing when you compute $-(f'')^{-1}f'$.
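As a concrete sketch of one such jump (the test function $f(x,y)=x^4+y^2$ and the starting point are my own illustration, not part of the answer), you build the quadratic model from the gradient and Hessian at the current point and solve a linear system for the step, rather than forming the inverse Hessian explicitly:

```python
import numpy as np

# Illustrative smooth function: f(x, y) = x^4 + y^2.
def grad(p):
    x, y = p
    return np.array([4 * x**3, 2 * y])

def hessian(p):
    x, y = p
    return np.array([[12 * x**2, 0.0],
                     [0.0,       2.0]])

p = np.array([1.0, 1.0])                    # current point
# Newton step: minimizer of the local quadratic model,
# found by solving  H @ dp = -g.
dp = np.linalg.solve(hessian(p), -grad(p))
print(p + dp)                               # [0.6667, 0.0]: one jump toward the minimum at (0, 0)
```

Note that the $y$-coordinate, where $f$ really is quadratic, is solved exactly in a single step; the quartic $x$-coordinate only moves partway and needs further iterations.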

[Figure: a quadratic bowl fitted to the function at the current point; the next iterate is the minimum of the bowl.]

By thinking about this picture, you can also see why Newton's method can converge to a saddle point or a maximum in some cases: if the eigenvalues of the Hessian are all negative, the local quadratic approximation is an upside-down paraboloid whose only critical point is a maximum, and if the eigenvalues are mixed in sign, the approximation is a saddle. Jumping to the critical point of such a model moves you towards a maximum or saddle rather than a minimum.
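A minimal numerical illustration of this failure mode (a toy example of my own, not from the answer): for $f(x,y)=x^2-y^2$ the Hessian has eigenvalues $2$ and $-2$, the quadratic model is the function itself, and a single Newton step jumps straight to the saddle at the origin from any starting point:

```python
import numpy as np

# Toy indefinite function: f(x, y) = x^2 - y^2 has a saddle at the origin.
def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])

H = np.array([[2.0,  0.0],
              [0.0, -2.0]])        # eigenvalues 2 and -2: indefinite Hessian

p = np.array([3.0, -1.5])          # arbitrary starting point
dp = np.linalg.solve(H, -grad(p))  # Newton step
print(p + dp)                      # [0. 0.] -- the saddle, not a minimum
```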


In Newton's method we pick $\Delta x$ so that $\nabla f(x_k + \Delta x) \approx 0$. Note that $\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + \nabla^2 f(x_k) \Delta x$. Setting this equal to $0$ and solving for $\Delta x$ gives the Newton step $\Delta x = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$.

Another viewpoint is to start with \begin{equation*} f(x_k + \Delta x) \approx f(x_k) + \langle \nabla f(x_k), \Delta x \rangle + \frac12 \langle \Delta x, \nabla^2 f(x_k) \Delta x \rangle. \end{equation*} Minimizing the right-hand side with respect to $\Delta x$ (assuming $\nabla^2 f(x_k)$ is positive definite, so a unique minimizer exists) gives the same Newton step.
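Both derivations produce the same step. A quick NumPy check (with an arbitrary gradient vector and positive definite Hessian of my own choosing, purely for illustration) confirms that the $\Delta x$ solving the linearized equation also zeroes the gradient of the quadratic model:

```python
import numpy as np

g = np.array([1.0, -2.0])           # stand-in for grad f(x_k)
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # stand-in positive definite Hessian

# Viewpoint 1: solve the linearized equation  g + H @ dx = 0.
dx = np.linalg.solve(H, -g)

# Viewpoint 2: the quadratic model  q(d) = f + <g, d> + (1/2)<d, H d>
# has gradient  g + H @ d,  which should vanish at the minimizer.
model_grad = g + H @ dx
print(dx)                           # the Newton step
print(np.allclose(model_grad, 0))   # True: dx also minimizes the quadratic model
```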


Newton's method for finding zeros of a function $f$ is as follows $$ x_{n+1}=x_{n}-\frac{f\left(x_{n}\right)}{f^{\prime}\left(x_{n}\right)}. $$ For nice enough $f$ and initial guess $x_{0}$, we get that $$ f\left(\lim_{n\rightarrow\infty}x_{n}\right)=0. $$ The animation on https://en.wikipedia.org/wiki/Newton's_method is particularly nice and will help you understand what is happening.
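For instance (a standard textbook example, not specific to this answer), applying the iteration to $f(x)=x^2-2$ converges rapidly to $\sqrt{2}$:

```python
# Newton's method for a root of f(x) = x^2 - 2, whose positive root is sqrt(2).
f  = lambda x: x**2 - 2
df = lambda x: 2 * x

x = 1.0                      # initial guess
for _ in range(6):
    x = x - f(x) / df(x)     # the Newton update
print(x)                     # 1.41421356..., i.e. sqrt(2)
```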

Newton's method in optimization is the same procedure applied to $f^{\prime}$, so that $$ x_{n+1}=x_{n}-\frac{f^{\prime}\left(x_{n}\right)}{f^{\prime\prime}\left(x_{n}\right)}. $$ This allows us to find $$ f^{\prime}\left(\lim_{n\rightarrow\infty}x_{n}\right)=0. $$ In particular, the limit $x=\lim_{n\rightarrow\infty}x_{n}$ is a critical point of $f$.
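As a sketch (with a made-up quartic of my own; any smooth function with $f''\neq 0$ near the critical point would do), running the same loop on $f'$ drives the derivative to zero and lands on a local minimizer:

```python
# Minimize f(x) = x^4 - 3x^2 + x by finding a zero of f'.
df  = lambda x: 4 * x**3 - 6 * x + 1   # f'
d2f = lambda x: 12 * x**2 - 6          # f''

x = -2.0                      # initial guess in the left basin
for _ in range(8):
    x = x - df(x) / d2f(x)    # Newton's root-finding step applied to f'
print(x, df(x))               # critical point near x = -1.30 with f'(x) ~ 0
```

Since $f''(x) > 0$ at the limit, this critical point is a local minimum; the iteration itself only guarantees $f'(x)=0$.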