Why does the generalised derivative have to be a linear transformation?

I am starting to learn Real Analysis and I have come across the generalised definition of the derivative for higher dimensions. I realise that the derivative being a linear transformation nicely accommodates the one-dimensional case, where the derivative is just a constant at any point. I also understand it can't be as simple as multiplication by a constant in higher dimensions, since you can approach a point along many different curves. But how did we arrive at the requirement that the derivative be linear? Why couldn't it be some other type of function? I would like an intuitive explanation.


It does not have to be; we want it to be so.

It is a matter of definition that the derivative is a linear map. So the question is really "Why is the notion of linear approximation so interesting that it deserves such a central place?". The answer is that linear maps are fairly simple to understand while still being fairly general.

If you choose simpler approximations, e.g. if you only allow maps of the form $x\mapsto \lambda x$ for some scalar $\lambda$ as "derivatives", many functions would no longer be "differentiable".
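For instance, even the linear map $f(x,y) = (y,x)$ on $\Bbb R^2$ already fails this scalar-only test at the origin: for any scalar $\lambda$, taking $h = (t,0)$ gives

$$f(h) - \lambda h = (-\lambda t,\ t), \qquad \|f(h) - \lambda h\| \ge |t| = \|h\|,$$

so the error is never $o(h)$, even though $f$ is as well-behaved as a map can possibly be.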

If you choose more complicated maps, e.g. if you allow maps like $x\mapsto Ax + B|x|$ with a componentwise absolute value (so the "derivative" would be a pair $(A,B)$), some more functions would be "differentiable", but it is far from clear how this notion would be of any help.

So, linear maps seem to strike a perfect balance between simplicity and generality. You can see this balance at work, e.g. when you run Newton's method in higher dimensions or analyze non-linear systems of differential equations by means of their local linearizations.
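To make the Newton's method point concrete, here is a minimal numerical sketch (the system `f` and the starting point are invented for illustration): the method uses the derivative of $f$ only as a linear map, namely the Jacobian matrix, and each step solves the linearized equation $Df(x)\,s = -f(x)$.

```python
import numpy as np

def f(x):
    # Hypothetical example system f: R^2 -> R^2 whose root we seek:
    # the intersection of the unit circle with the line y = x.
    return np.array([x[0]**2 + x[1]**2 - 1.0,
                     x[0] - x[1]])

def Df(x):
    # The derivative of f at x, used purely as a linear map:
    # the 2x2 Jacobian matrix.
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0,        -1.0      ]])

x = np.array([1.0, 0.5])                  # initial guess
for _ in range(10):
    step = np.linalg.solve(Df(x), -f(x))  # solve the linearized equation
    x = x + step

print(x)  # approximately (0.7071, 0.7071), i.e. (1/sqrt(2), 1/sqrt(2))
```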

(Another aspect: for functions of a complex variable there are two notions of differentiability. You can consider real linearity, which gives differentiability in the sense of mappings from two-dimensional real space to itself. The other possibility is to consider complex linearity, and this leads to holomorphic functions. This gives a lot of extra rigidity and leads to a more restrictive but also more powerful notion of derivative.)
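To spell out the difference (a standard computation): a real-linear map of $\Bbb R^2$ with matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ commutes with multiplication by $i$, i.e. is complex-linear, exactly when $d = a$ and $c = -b$. Applied to the Jacobian of $f = u + iv$, this says

$$u_x = v_y, \qquad u_y = -v_x,$$

which are precisely the Cauchy–Riemann equations, so the extra rigidity of the holomorphic derivative is visible already at the level of linear algebra.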


The following is an exploration of how one would generalize the notion of differentiability starting from the $\epsilon$-$\delta$ definition.

If $f$ is a real-valued function of a single real variable, recall that $f$ is differentiable at $a$ if there exists $L \in \Bbb R$ such that for all $\epsilon >0$, there exists $\delta > 0$ such that for $0 < |h| < \delta$, one has:

$$|f(a+h) -f(a) - Lh| < \epsilon |h|$$
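As a sanity check, take $f(x) = x^2$ and $L = 2a$: then

$$|f(a+h) - f(a) - Lh| = |(a+h)^2 - a^2 - 2ah| = |h|^2 < \epsilon |h| \quad \text{whenever } 0 < |h| < \delta := \epsilon,$$

so the definition recovers the familiar derivative $f'(a) = 2a$.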

Now let's suppose that $f$ takes its values in a normed vector space $F$, but is still a function of a single real variable. Essentially, the same definition should work, except that we have to replace $|\cdot|$ by $\|\cdot\|$ and that $L$ has to be an element of $F$ this time.

But if $f: E \to F$, where $E$ and $F$ are normed vector spaces, we run into the problem of properly generalizing the product $Lh$. Here $h$ is an element of $E$ and $Lh$ should be an element of $F$, so the only way $Lh$ can make sense is if $L$ is a function $E \to F$, with $Lh$ meaning $L(h)$.

Now why should it be linear? Let alone continuous?

We impose linearity because we would like to solve the following problem: what guarantees the uniqueness of $L$? If $L$ isn't unique, the whole definition would be pointless; it wouldn't tell us anything exciting. We could always tailor a specific function $E \to F$ satisfying the definition, say $L: h \mapsto f(a+h) - f(a)$, for which the error $f(a+h) - f(a) - Lh$ is identically zero, so the condition holds for any $f$ whatsoever. So our mission is to find a class of functions to restrict $L$ to, so that the uniqueness of such an $L$ is guaranteed.

Suppose that $L_1$ and $L_2$ are functions $E \to F$ both satisfying:

$$(\forall \epsilon > 0)(\exists \delta >0)(\forall h \in E, 0 < \|h\| < \delta \implies \|f(a+h)-f(a) -L h\| < \epsilon \|h\|) \tag 1$$

In other words, $f(a+h) - f(a) = L_i h + o(h)$ for $i = 1, 2$. Subtracting the two estimates, the triangle inequality gives, for $0 < \|h\|$ small enough,

$$\|L_1 h - L_2 h\| \le \|f(a+h) - f(a) - L_2 h\| + \|f(a+h) - f(a) - L_1 h\| < 2\epsilon \|h\|,$$

i.e. $Lh = o(h)$, where $L = L_1 - L_2$. The question becomes:

What conditions should $L$ satisfy so that $Lh = o(h) \implies L=0$?

One obvious choice is: $L$ is linear. For suppose it is, and let $\epsilon > 0$ be arbitrary. Then there exists $\delta > 0$ such that $0 < \|h\| < \delta \implies \|Lh\| < \epsilon \|h\|$. For $z\neq 0$, choose $h = \frac{\delta}{2 \|z\|} z$, so that $0 < \|h\| < \delta$, and after simplifying $\|Lh\| < \epsilon \|h\|$ we get $\|Lz\| < \epsilon \|z\|$. In particular, $L$ is bounded and $\|L\| \le \epsilon$ for every $\epsilon >0$, so $\|L\| = 0$ and hence $L = 0$.
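The simplification uses only the homogeneity of the linear map $L$: with $h = \frac{\delta}{2\|z\|} z$,

$$\|Lh\| = \frac{\delta}{2\|z\|} \|Lz\| \quad \text{and} \quad \epsilon \|h\| = \epsilon \frac{\delta}{2\|z\|} \|z\|,$$

and cancelling the common factor $\frac{\delta}{2\|z\|}$ from $\|Lh\| < \epsilon \|h\|$ gives $\|Lz\| < \epsilon \|z\|$.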

For $L = L_1 - L_2$ to be linear, it is only reasonable to require that $L_1$ and $L_2$ both be linear, since they were arbitrarily chosen. Thus we require that the $L$ in $(1)$ be linear. We still haven't addressed why $L$ should be continuous. We impose continuity because we'd like to solve the following problem:

Does this definition guarantee that differentiability implies continuity?

The first thing we do is use the reverse triangle inequality $\big| \|u\| - \|v\| \big| \le \|u-v\|$ in $(1)$ to get:

$$\|f(a+h) - f(a)\| < \epsilon \|h\| + \|Lh\|$$

If $L$ is not continuous, we can't control $\|Lh\|$ (can't make it as small as we want), so $f$ might not be continuous at $a$. However, if $L$ is continuous, then it is bounded, i.e. $\|Lh\| \le \|L\| \|h\|$ where $\|L\|$ denotes the operator norm, and:

$$\|f(a+h) - f(a)\| < (\epsilon + \|L\|) \|h\|$$

and this gives $\lim_{h\to 0} f(a+h) = f(a)$, i.e. $f$ is continuous at $a$.

Therefore, we get the following definition.

Let $E$ and $F$ be two normed vector spaces, $f: E \to F$ and $a \in E$. We say that $f$ is differentiable at $a$ if there exists a continuous linear map $L: E \to F$, such that:

$$(\forall \epsilon > 0)(\exists \delta >0)(\forall h \in E, 0 < \|h\| < \delta \implies \|f(a+h)-f(a) -L h\| < \epsilon \|h\|)$$

Continuity of $L$ isn't included in the definition of differentiability of functions $\Bbb R^n \to \Bbb R^m$ because on finite-dimensional normed vector spaces we get the continuity of $L$ for free, as sketched below.
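Here is a quick sketch of that "free" continuity: writing $h = \sum_{i=1}^n h_i e_i$ in the standard basis of $\Bbb R^n$,

$$\|Lh\| \le \sum_{i=1}^n |h_i| \|Le_i\| \le \Big( \max_i \|Le_i\| \Big) \|h\|_1 \le C \|h\|,$$

where the last step uses the equivalence of all norms on $\Bbb R^n$. So $L$ is bounded, hence continuous. In infinite dimensions this argument breaks down, and discontinuous linear maps do exist.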