Derivative of a linear transformation.

We define the derivative of a function $R^n \to R^m$ at a point as a linear transformation $R^n \to R^m$. Now consider the derivative of a map $A$ that is itself such a linear transformation. As we know, if $x \in R^n$, then

$A(x+h)-A(x)=A(h)$ by linearity of $A$, which implies that $A'(x)=A$, where $A'$ is the derivative of $A$. What does this mean? I don't think I am getting the point.


This is a fair question, since the result runs counter to the way introductory calculus is taught.

One looks at a typical linear function in calc 1: $f(x)=ax$, $a\neq0$, takes the derivative, $f'(x)=a$, and thinks to themselves, "well clearly the linear function is not equal to the constant function, one has a slope and the other is flat!"

Once we generalize to higher dimensions, it pays to look more closely at what we call the derivative. Merely looking at the Jacobian masks a deeper insight: the derivative gives the best affine approximation to a function near a particular point. That is, $F(x) = F(a) + F'(x-a)+o(|x-a|)$, where $F'$ denotes the derivative of $F$ at $a$, so that $F(x)\approx F(a) + F'(x-a)$ is a good approximation when $x$ is close to $a$. Notice that $F'$ acts as a "factor" on the tangent vector $(x-a)$.
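To make "best" precise, here is one standard way to state it (the symbol $L$ is introduced only for this remark): a linear map $L$ is the derivative of $F$ at $a$ exactly when

$$\lim_{h\to 0}\frac{|F(a+h)-F(a)-Lh|}{|h|}=0.$$

With $h=x-a$ this says the error of the affine approximation is $o(|x-a|)$, and at most one linear map $L$ can satisfy it.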

What if $F$ is already affine? Then $F(x)=Ax+b$. Plug this into the formula above, which now holds with exact equality because the error term vanishes, and you get: $Ax+b=Aa+b+F'(x-a)$, which gives $A(x-a)=F'(x-a)$, or if we call $h=x-a$, $Ah=F'h$. Notice what that is telling you: $A$ and $F'$ do the same thing to every vector $h$, hence they're equal. So $A$ is the derivative of $F$ at every point $a$.

When $F$ is linear, $b=0$, and thus $Fh=Ah=F'h$. Makes sense.
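If you want to see this numerically, here is a minimal sketch in plain Python. The helper names (`affine`, `jacobian_fd`, `close`) and the particular $A$ and $b$ are made up for illustration, and the Jacobian is only estimated by forward differences, so the comparison uses a tolerance.

```python
def affine(A, b):
    """Return the map F(x) = A x + b, with A a list of rows and b a list."""
    def F(x):
        return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(A, b)]
    return F


def jacobian_fd(F, x, eps=1e-6):
    """Forward-difference estimate of the Jacobian of F at the point x."""
    Fx = F(x)
    cols = []
    for j in range(len(x)):
        xh = list(x)
        xh[j] += eps                      # nudge the j-th coordinate
        Fxh = F(xh)
        cols.append([(u - v) / eps for u, v in zip(Fxh, Fx)])
    # cols[j][i] approximates dF_i/dx_j; transpose so rows index outputs
    return [[cols[j][i] for j in range(len(x))] for i in range(len(Fx))]


def close(M, N, tol=1e-4):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(m - n) < tol
               for row_m, row_n in zip(M, N)
               for m, n in zip(row_m, row_n))


# Hypothetical example data: a map R^3 -> R^2
A = [[2.0, -1.0, 0.5],
     [0.0, 3.0, 1.0]]
b = [4.0, -2.0]
F = affine(A, b)

# Estimate the Jacobian at two unrelated points
J0 = jacobian_fd(F, [0.0, 0.0, 0.0])
J1 = jacobian_fd(F, [10.0, -7.0, 3.0])

print(close(J0, A), close(J1, A))
```

The estimated Jacobian matches $A$ at both points, and $b$ plays no role: the derivative of an affine map does not depend on where you evaluate it.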

What about our calc 1 example? The confusion stems from naming. The linear transformation is not $ax$, but $a$. View it as $fx=ax$. It's a $1\times 1$ matrix with the entry $a$. The derivative (Jacobian), at any point, is also just $a$. Hence $f'x=ax$ also. Thus the generalized notion of derivative is no longer "the slope function", but the unique linear transformation, taking tangent vectors to tangent vectors, which best approximates the behavior of a function near a particular point. In that light it makes sense that $fx=ax=f'x$, since we're viewing $f$ and $f'$ as "factors" at particular points rather than as changing functions. This is why $Df(x)$ (which is just $a$ in our example) is used as notation for the derivative at a particular point $x$ rather than $f'$.

If you're interested, there are notions of higher derivatives. They take the derivative of the map that assigns to each point $x$ the matrix $Df(x)$. That differs from taking the derivative of the matrix $Df(x)$ itself for a fixed $x$, viewed as the linear map $h\mapsto Df(x)h$, which is linear and hence its own derivative. See: http://www.math.pitt.edu/~sph/1540/1540-notes4.pdf
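A quick worked comparison may help here (the example $g(x)=x^2$ is mine, chosen only for illustration). For linear $f(x)=Ax$ the map $x\mapsto Df(x)$ is constant, so the second derivative vanishes:

$$Df(x)=A \ \text{ for all } x \quad\Longrightarrow\quad D^2f(x)=0,$$

which in calc 1 terms is just $f(x)=ax$, $f'(x)=a$, $f''(x)=0$. For $g(x)=x^2$ the two operations give visibly different answers:

$$Dg(x)=2x, \qquad \frac{d}{dx}\bigl(x\mapsto 2x\bigr)=2, \qquad \text{while for fixed } x,\ \ D\bigl(h\mapsto 2xh\bigr)=\bigl(h\mapsto 2xh\bigr).$$

The first is the usual second derivative $g''=2$; the second only restates that a linear map is its own derivative.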


$A$, where $A$ is seen as a linear /map/, has derivative $A'$ equal to $A$ at every point, where $A$ is now seen as a (constant) matrix.