Naturality of tensors in Differential Geometry
Solution 1:
While this is seldom emphasized, certain tensors already appear quite naturally in the context of multi-variable calculus on $\mathbb{R}^n$. If one wants to treat all the higher-order derivatives of a function $f$ in a unified, basis-independent and consistent way, one is led naturally to the notion of (symmetric) tensors of various orders. From this perspective, it is natural to discuss tensors of arbitrary order together and bundle them up, because higher-order tensors appear naturally as the derivatives of lower-order tensors. For example, if you care about the second derivative (also known as the Hessian) of a scalar function (a $(0,0)$-tensor), you should care about $(0,2)$-tensors.
Let me demonstrate how this works:
Let $f = (f_1, \dots, f_m) \colon \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a smooth map (in the sense that all possible partial derivatives of the $f_i$ of all orders exist). Then:
- The first derivative (or differential) of $f$ at a point $p \in \mathbb{R}^n$ is defined as the unique linear map $(Df)(p) = Df|_p \colon \mathbb{R}^n \rightarrow \mathbb{R}^m$ which satisfies $$ \lim_{h \to 0} \frac{f(p + h) - f(p) - (Df|_p)(h)}{\| h \|_{\mathbb{R}^n}} = 0. $$ When $m = 1$, the scalar $Df|_p(h)$ gives us the directional derivative $\frac{d}{dt} f(p + th)|_{t = 0}$ of $f$ at the point $p$ in the direction $h$. In general, the linear map $Df|_p$ can be represented with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ as an $m \times n$ matrix $$ \begin{pmatrix} \frac{\partial f_1}{\partial x^1} & \dots & \frac{\partial f_1}{\partial x^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x^1} & \dots & \frac{\partial f_m}{\partial x^n} \end{pmatrix} $$ but when we move to the context of manifolds, we won't have a notion of "standard bases", so it is best to think of the first derivative $Df|_p$ as a linear map and not as a matrix. Finally, a linear map $\mathbb{R}^n \rightarrow \mathbb{R}^m$ can be identified naturally with an element of the tensor product $\left( \mathbb{R}^n \right)^{*} \otimes \mathbb{R}^m$, and so the total first derivative $p \mapsto Df(p)$ is a smooth map from $\mathbb{R}^n$ to $\left( \mathbb{R}^n \right)^{*} \otimes \mathbb{R}^m$.
- The second derivative $(D^2f)(p) = D^2f|_p$ of $f$ at a point $p$ is the first derivative of the map $p \mapsto Df(p)$ at $p$. The map $p \mapsto Df(p)$ is a smooth map from $\mathbb{R}^n$ to $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ (which we can identify, by choosing bases, with $M_{m \times n}(\mathbb{R}) \cong \mathbb{R}^{m \times n}$), so $(D^2f)(p)$ is a linear map with signature $$(D^2f)(p) \colon \mathbb{R}^n \rightarrow \operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m). $$ Such maps are naturally identified with bilinear maps $\mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}^m$ which, again, can be identified with elements of $\left( \mathbb{R}^n \right)^{*} \otimes \left( \mathbb{R}^n \right)^{*} \otimes \mathbb{R}^m$.
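To make the two bullet points above concrete, here is a small numerical sketch (my own illustration, not part of the original answer). The helpers `jacobian` and `hessian` are hypothetical names; they approximate $Df|_p$ as an $m \times n$ matrix and the second derivative of a scalar function as an $n \times n$ bilinear form, using finite differences:

```python
import numpy as np

def jacobian(f, p, h=1e-6):
    """Approximate Df|_p as an m x n matrix via forward differences."""
    p = np.asarray(p, dtype=float)
    fp = np.asarray(f(p), dtype=float)
    cols = [(np.asarray(f(p + h * e), dtype=float) - fp) / h
            for e in np.eye(p.size)]
    return np.stack(cols, axis=-1)          # column i holds df/dx^i

def hessian(g, p, h=1e-4):
    """Approximate the Hessian of a scalar g: an n x n (0,2)-tensor."""
    p = np.asarray(p, dtype=float)
    n = p.size
    E = np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (g(p + h * E[i] + h * E[j]) - g(p + h * E[i])
                       - g(p + h * E[j]) + g(p)) / h**2
    return H

# Example: f(x, y) = (x*y, x + y), g(x, y) = x**2 * y, at p = (1, 2).
f = lambda v: np.array([v[0] * v[1], v[0] + v[1]])
g = lambda v: v[0]**2 * v[1]
p = np.array([1.0, 2.0])

J = jacobian(f, p)   # close to [[2, 1], [1, 1]]
H = hessian(g, p)    # close to [[4, 2], [2, 0]]
```

The (approximate) symmetry of `H` reflects the fact that the second derivative of a smooth scalar function is a symmetric $(0,2)$-tensor.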
More generally, the $k$-th derivative of $f$ turns out to be a smooth map from $\mathbb{R}^n$ to the space $$\underbrace{\left( \mathbb{R}^n \right)^{*} \otimes \dots \otimes \left( \mathbb{R}^n \right)^{*}}_{k\text{ times}} \otimes \mathbb{R}^m $$ which, in your notation, would be a $(0,k)$ ($\mathbb{R}^m$-valued) tensor field on $\mathbb{R}^n$. If $m = 1$, this is just a $(0,k)$ tensor field. If $n = m$ (and then we can think of $f$ as a vector field on $\mathbb{R}^n$), the $k$-th derivative of $f$ is a $(1,k)$-tensor field on $\mathbb{R}^n$.
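As a quick sanity check of this picture, the higher derivative tensors can be computed symbolically. The sketch below (my own illustration using SymPy's `derive_by_array`, not part of the original answer) builds the $(0,1)$-, $(0,2)$- and $(0,3)$-tensors of a scalar function on $\mathbb{R}^2$ by applying $D$ repeatedly; each application adds one covariant slot:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y                          # a scalar function on R^2

# Each application of D adds one covariant index:
T1 = sp.derive_by_array(f,  (x, y))   # (0,1)-tensor: the differential df
T2 = sp.derive_by_array(T1, (x, y))   # (0,2)-tensor: the Hessian
T3 = sp.derive_by_array(T2, (x, y))   # (0,3)-tensor

# Smoothness makes these tensors fully symmetric (Schwarz's theorem):
# every entry of T3 with indices {0, 0, 1} equals d^3 f / dx^2 dy = 6*x.
assert T3[0, 0, 1] == T3[0, 1, 0] == T3[1, 0, 0] == 6*x
```

Replacing `f` by a SymPy `Array` of components would give the $\mathbb{R}^m$-valued version in the same way.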
Solution 2:
I think that what you're missing is the very thing that historically led to classifying tensors into two different types: their covariant and contravariant nature.
It might be that you're thinking of the covariant and contravariant nature of tensors as an exclusively algebraic property, but in fact I think it is deeply geometric.
Let's see if I can explain what I mean. When you apply a transformation to a space, such as a magnification, the structures you have on the geometric object can react in two different ways: they can transform accordingly, or covariantly, meaning that in a certain sense they are deeply linked to the geometric object you are transforming; or they can appear to be independent of it, and are then said to transform contravariantly, because their being immune to the transformation makes them appear to transform the opposite way.
So let's do an example to illustrate what I mean. Say you have a point $P$ in 3D space. To treat this space algebraically, you choose an origin and identify the 3D space with the vector space $\mathbb{R}^3$ with canonical basis $\{e_i\}$. Now the point $P$ corresponds to a vector $v$ and has coordinates, say, $(4,4,4)$. Then you decide to apply a transformation to your vector space, enlarging it by a factor of $4$. All vectors of the space are enlarged, so they transform covariantly, while $P$ - which now appears to have coordinates $(1,1,1)$ - has been transformed contravariantly. The deep reason for this different behaviour is that the vectors were inside the vector space that was transformed, while the point was something independent of the geometric structure being transformed.
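The coordinate bookkeeping in this example can be checked numerically. Below is a minimal sketch of mine (with the factor-4 magnification hard-coded as in the example above): the basis vectors are scaled by $4$, i.e. they transform covariantly, while solving for the coordinates of the same fixed point shows that they scale by $1/4$, i.e. contravariantly:

```python
import numpy as np

# Magnify the space by a factor of 4: each basis vector e_i |-> 4 e_i.
A = 4.0 * np.eye(3)
old_basis = np.eye(3)          # columns are the canonical basis {e_i}
new_basis = A @ old_basis      # basis vectors transform covariantly (with A)

# The point P itself is unchanged; only its description changes.
v_old = np.array([4.0, 4.0, 4.0])          # coordinates of P in the old basis
v_new = np.linalg.solve(new_basis, v_old)  # coordinates of P in the new basis

# v_new is [1., 1., 1.]: the coordinates transform with the inverse of A,
# i.e. contravariantly.
```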