Why does the sign in Newton's method matter?
Deriving Newton's method visually with the help of a right triangle, and assuming $x_1$ lies to the left of $x_0$, we get $$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$ using rise over run.
But if we assume $x_1$ lies to the right of $x_0$, we get: $$x_1 = x_0 + \frac{f(x_0)}{f'(x_0)}$$
So I thought the sign doesn't matter, but when I solve some problems with
$$x_1 = x_0 + \frac{f(x_0)}{f'(x_0)}$$
the iteration diverges.
Could someone enlighten me where the fault in my thinking lies?
Thank you!
The derivation doesn't need to assume which side of $x_0$ the point $x_1$ lies on. The approximation $x_{k + 1}$ generated by Newton's method is the root of the tangent line at the previous approximation $x_k$, i.e. the point where the line through $(x_k, f(x_k))$ with slope $f'(x_k)$ reaches height $0$: $$ 0 - f(x_k) = f'(x_k)(x_{k + 1} - x_k) $$ or $$x_{k + 1} = x_k - \frac{f(x_k)}{f'(x_k)}. $$
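
To see the effect of the sign numerically, here is a minimal Python sketch. The test function $f(x) = x^2 - 2$, the starting point $1.5$, and the helper names `newton_step` and `iterate` are illustrative assumptions, not something taken from the question.

```python
import math


def newton_step(f, df, x, sign=-1.0):
    """Return x + sign * f(x) / f'(x); sign=-1.0 is the usual Newton update."""
    return x + sign * f(x) / df(x)


def iterate(f, df, x0, sign=-1.0, steps=6):
    """Apply the update `steps` times and return every iterate, x0 included."""
    xs = [x0]
    for _ in range(steps):
        xs.append(newton_step(f, df, xs[-1], sign))
    return xs


# Hypothetical test problem: f(x) = x^2 - 2, whose positive root is sqrt(2).
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x

root = math.sqrt(2.0)
minus_path = iterate(f, df, 1.5, sign=-1.0)  # standard Newton
plus_path = iterate(f, df, 1.5, sign=+1.0)   # sign flipped

for name, path in (("minus", minus_path), ("plus", plus_path)):
    errors = ", ".join(f"{abs(x - root):.3e}" for x in path)
    print(f"{name:5s} sign, |x_k - sqrt(2)|: {errors}")
```

With the minus sign the error shrinks rapidly; with the plus sign every step moves further from $\sqrt 2$. This is not special to this example: for $g(x) = x + f(x)/f'(x)$ one gets $g'(x) = 2 - f(x)f''(x)/f'(x)^2$, so $g'(r) = 2$ at any simple root $r$, and the flipped iteration is pushed away from the root instead of being drawn to it.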
The fault is that, mathematically, there is no reason to suppose changing the sign will still work. There are three ways (that I am aware of) to think about Newton-Raphson; the first is that it is a special case of the Householder methods, but this is maybe a bit complicated.
The second, and the most relevant to appreciating the sign, is the way I intuitively explained it to myself some time ago: picture the graph of a real function of one variable. Suppose we are at a point where the function is non-zero and has non-zero derivative, and that the function is continuously differentiable. There are four cases: the function is greater than zero and increasing, greater than zero and decreasing, less than zero and increasing, or less than zero and decreasing. In all cases assume a nice function where our extrapolations are reasonably accurate - see the note at the bottom as to why Newton's method is far from perfect.
Examine the quotient in the first case: $f(x)\gt0$ and $f'(x)\gt0$, so the quotient is positive. We want $f$ to decrease, and the graph is increasing where we are, so we must take a step backwards (to the left). Therefore the negative sign is necessary.
I'll do one more case - you have a look at the rest. Let's look at case 3: less than zero and the graph is increasing. $f(x)\lt 0,f'(x)\gt0$, so the quotient is negative. We are below zero, but since the graph is increasing it is reasonable to assume a step to the right will bring us closer to a root. Here the negative sign is still essential; we take a step in the direction $-(\text{negative}) = \text{positive}$, and move along the $x$ axis toward a root.
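
For reference, here are all four cases side by side, under the same assumption that the tangent-line extrapolation can be trusted near $x$:

$$\begin{array}{c|c|c|c|c}
\text{case} & f(x) & f'(x) & \dfrac{f(x)}{f'(x)} & \text{step } -\dfrac{f(x)}{f'(x)} \\ \hline
1 & \gt 0 & \gt 0 & \gt 0 & \text{left (we need } f \text{ to decrease)} \\
2 & \gt 0 & \lt 0 & \lt 0 & \text{right (we need } f \text{ to decrease)} \\
3 & \lt 0 & \gt 0 & \lt 0 & \text{right (we need } f \text{ to increase)} \\
4 & \lt 0 & \lt 0 & \gt 0 & \text{left (we need } f \text{ to increase)}
\end{array}$$

In every row the minus sign sends us in the direction in which $|f|$ decreases; a plus sign would reverse all four directions at once.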
Unrelated to sign, but note also that the method is good in a different sense: when the derivative is small, we expect that we need larger steps to reach a root - division by the derivative ensures larger steps for small derivatives. When the value of $f$ is small, i.e. we are (hopefully) close to a root, the quotient is also small as we only need a small step to accomplish our goal (in an ideal world!). That's how to remember which way up the quotient is.
The third way is as CheeHan answers; you examine the tangential approximation and get the expression.
Anyway, it fails more often when you use a positive sign because using a positive sign just doesn't make sense! The negative sign is necessary to always step towards any likely zeros. I remember being confused by the negative signs when learning about gradient descent algorithms, in multiple variables - the principle is the same. Just think about your different cases, and where you need to go. The negative sign helps you get there.
N.B. At several points in my justification we ran into the limitations of the method, which only uses first-order (tangent-line) information. Just because the graph is increasing here does not mean it will keep increasing, and just because $f$ is close to zero does not mean we are close to a root. See Halley's method for a more complicated iteration that also uses the second derivative. Of course Newton's method also fails when the derivative is zero, which is a problem if we need to iterate around a turning point.
I appreciate your question, because you are rediscovering things the way Newton himself did. That is to say, with geometric intuition, considering (as here) different cases before establishing a general rule.
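Let us consider the two cases, both with $f(x_0) > 0$: on the left, the slope $f'(x_0)$ is positive and $x_1$ lies to the left of $x_0$; on the right, the slope is negative and $x_1$ lies to the right of $x_0$.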
In the left case, writing the identity of slopes:
$$\dfrac{f(x_0)}{x_0-x_1}=f'(x_0) \ \iff \ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} \ \text{(usual case)}$$
In the right case, your guideline must be to keep reasoning with positive quantities; since the slope $f'(x_0)$ is negative here, you must write the identity between the slopes in this way:
$$\dfrac{f(x_0)}{x_1-x_0}=\color{red}{-f'(x_0)}\ \iff \ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} \ \text{i.e., the right "Newton" again}$$
(the negative sign in front of $f'(x_0)$ reverses the negative sign of $f'(x_0)$ in order to obtain a positive quantity).
Remark:
I just realized that my explanation is connected to the second explanation in the interesting answer of @FShrike; in particular, we would in fact need to consider the two other cases with $f(x_0) < 0$, which are also covered in FShrike's answer.
In conclusion, as indicated in the other answers, it's better not to have to cope with particular cases, and instead to trust the "generality of algebra".
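
For completeness, here is a sketch of those two remaining cases written in the same "positive quantities only" style (so with $-f(x_0) > 0$ as the height of the triangle):

$$f(x_0) < 0,\ f'(x_0) > 0\ (x_1 \text{ to the right of } x_0): \quad \dfrac{-f(x_0)}{x_1-x_0}=f'(x_0) \ \iff \ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

$$f(x_0) < 0,\ f'(x_0) < 0\ (x_1 \text{ to the left of } x_0): \quad \dfrac{-f(x_0)}{x_0-x_1}=-f'(x_0) \ \iff \ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

Both again give the usual formula with the minus sign.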