In what sense is the derivative the "best" linear approximation?

I am familiar with the definition of the Fréchet derivative and its uniqueness (when it exists). However, I would like to know in what sense the derivative is the "best" linear approximation. What does this mean formally? It surely cannot mean "best" on the entire domain, so it must mean "best" on a small neighborhood of the point at which we differentiate, where this neighborhood becomes arbitrarily small? Why does the definition of the derivative formalize precisely this? Thank you in advance.


Say the graph of $L$ is a straight line, $L(a)=f(a)$ at some point $a$, and $L$ is the tangent line to the graph of $f$ at $a$. Let $L_1$ be another function whose graph is a straight line through $(a,f(a))$. Then there is some open interval $(a-\varepsilon,a+\varepsilon)$ such that for every $x \neq a$ in that interval, $L(x)$ is closer to $f(x)$ than $L_1(x)$ is. One might, however, have another line $L_2$ through that point whose slope is closer to that of the tangent line than the slope of $L_1$ is, such that $L_2(x)$ actually comes closer to $f(x)$ than $L(x)$ does for some $x$ in that interval. But then there is a still smaller interval $(a-\varepsilon_2,a+\varepsilon_2)$ within which $L$ beats $L_2$. For every line except the tangent line, one can make the interval small enough that the tangent line beats the other line within that interval. In general, no single interval works no matter how close the rival line gets; rather, one must make the interval small enough for each rival line separately.
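A minimal numerical check of this argument, assuming the concrete example $f(x)=x^2$ at $a=0$ (so the tangent line is $L(x)=0$) and rival lines $L_m(x)=mx$ through the origin; the function names below are hypothetical. For this $f$ one can check by hand that, for $x\neq 0$, the tangent wins exactly when $|x|<|x-m|$, so the largest symmetric interval on which it wins is $(-|m|/2,|m|/2)$ with $0$ removed. The admissible $\varepsilon$ therefore shrinks as the rival slope $m$ approaches the tangent slope $0$.

```python
# Hypothetical example: f(x) = x**2 at a = 0, whose tangent line is L(x) = 0.
# Rival lines through (0, f(0)) = (0, 0) have the form L_m(x) = m * x with m != 0.

def f(x: float) -> float:
    return x * x

def tangent_error(x: float) -> float:
    # |f(x) - L(x)| with L(x) = 0
    return abs(f(x))

def rival_error(x: float, m: float) -> float:
    # |f(x) - L_m(x)| with L_m(x) = m * x
    return abs(f(x) - m * x)

def tangent_wins_on(eps: float, m: float, samples: int = 1000) -> bool:
    """Return True if the tangent is strictly closer to f than L_m at every
    sampled point of (-eps, eps) other than 0 (a grid check, not a proof)."""
    for i in range(1, samples):
        x = eps * i / samples
        for y in (x, -x):
            if not tangent_error(y) < rival_error(y, m):
                return False
    return True

if __name__ == "__main__":
    for m in (1.0, 0.1, 0.01, -0.01):
        eps = abs(m) / 2
        print(f"m = {m}: wins on (-{eps}, {eps})? {tangent_wins_on(eps, m)}; "
              f"wins on (-{abs(m)}, {abs(m)})? {tangent_wins_on(abs(m), m)}")
```

Each printed line reports `True` for the half-width $|m|/2$ and `False` for the half-width $|m|$: the interval must shrink together with the rival slope, and no single interval serves every rival.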


Michael's answer is wonderful. Here is another interpretation of the idea of "best" linear approximation, one that appeals solely to intuition. First, we talk simply about the notion of 'approximation.'

Imagine the points $\pi$ and $3.14$ on a number line. One might say that $3.14$ 'approximates' $\pi$, and at a certain scale of the number line, our eye would agree. That is, depending on our level of magnification, the points $\pi$ and $3.14$ will appear very close, and perhaps almost indistinguishable.

Next, suppose we begin to zoom in on $\pi$. What will we see? While $\pi$ remains fixed, $3.14$ will gradually separate into a distinguishable point and travel further and further from $\pi$, until at some magnification it has traveled off our 'screen.' Now $3.14$ no longer seems like a good approximation of $\pi$.

No matter how many decimal places of $\pi$ we include in our approximation, we can always zoom in far enough that the approximation leaves our screen. There is only one value that remains on our screen no matter how far we zoom in, and that is $\pi$ itself.
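The zooming picture can be put into numbers with a small sketch. Assumptions (not from the original): at magnification $1$ the screen shows $[\pi-1,\pi+1]$, magnification $k$ multiplies every distance from $\pi$ by $k$, only powers of $10$ are tried, and the floating-point constant `math.pi` stands in for $\pi$ itself. The hypothetical function `escape_magnification` returns the first power of $10$ at which an approximation has left the screen.

```python
import math
from typing import Optional

def escape_magnification(approx: float, target: float = math.pi,
                         half_width: float = 1.0) -> Optional[int]:
    """First power of 10 at which `approx` leaves a screen of the given
    half-width centred on `target`; None if it never leaves."""
    gap = abs(approx - target)
    if gap == 0:
        return None  # only the target itself stays on screen forever
    k = 1
    while k * gap <= half_width:  # still visible at magnification k
        k *= 10
    return k

if __name__ == "__main__":
    for approx in (3.14, 3.1416, 3.14159, 3.14159265):
        print(approx, escape_magnification(approx))
    print("pi", escape_magnification(math.pi))
```

Better approximations need more magnification before they leave the screen (for example, $3.14$ is gone at magnification $1000$), but every value other than $\pi$ eventually leaves.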

Now, for the case of a tangent line, imagine a smooth curve in the plane and a tangent line at a point on this curve. As we zoom in on our point, the line and the curve appear to become indistinguishable; in fact, at each successive level of magnification, we would need a more precise measuring instrument to distinguish the line from the curve. For any other line passing through the point, there is a single measuring instrument accurate enough to distinguish the line from the curve no matter how far we zoom in.
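This zooming picture can be made precise, and the computation shows why the definition of the derivative captures exactly this idea. Assume $f$ is differentiable at $a$, zoom by a factor $k$ about $(a,f(a))$ so that the screen point $(s,t)$ corresponds to the plane point $(a+s/k,\ f(a)+t/k)$, and consider the line $\ell_m(x)=f(a)+m(x-a)$ through $(a,f(a))$; the symbols $k$, $s$, $m$, $\ell_m$ and $G_k$ are introduced here for illustration. With $h=s/k$, the vertical gap on screen between the curve and the line at screen abscissa $s$ is

$$G_k(s) = k\,\bigl|f(a+h)-f(a)-mh\bigr| = |s|\,\left|\frac{f(a+h)-f(a)}{h}-m\right|, \qquad h=\frac{s}{k},$$

and letting $k\to\infty$ (so $h\to 0$) gives

$$\lim_{k\to\infty} G_k(s) = |s|\,\bigl|f'(a)-m\bigr|.$$

For $s\neq 0$ this limit is $0$ exactly when $m=f'(a)$: only the tangent line merges with the curve under unlimited magnification, while any other line keeps a gap proportional to $|f'(a)-m|$, which a fixed instrument can detect. In several variables the same computation reads as follows. If $T=Df(a)$ is the Fréchet derivative, so that $\|f(a+h)-f(a)-Th\|/\|h\|\to 0$ as $h\to 0$, and $S\neq T$ is any other linear map, pick $v$ with $Sv\neq Tv$; then along $h=tv$,

$$\frac{\|f(a+tv)-f(a)-S(tv)\|}{\|tv\|} \longrightarrow \frac{\|(T-S)v\|}{\|v\|} > 0 \qquad (t\to 0).$$

So "best" in the formal sense means that $Df(a)$ is the only linear map whose approximation error is $o(\|h\|)$ as $h\to 0$.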

Although the two notions of approximation described here are different, they still serve to illustrate the usefulness of zooming in when asking for the 'best' approximation.

Here is a video that might be helpful.