Properties and notation of third-order (and higher) partial derivatives

This question has been bothering me for quite a while, and I still haven't found a satisfying answer anywhere on the internet or in any of my books (which may not be that advanced, mind you...). Since I couldn't find a similar question here on MSE, it is probably rather obvious at a sufficiently advanced level, which I don't have (I'm still an undergraduate and my knowledge of tensors is very limited!). For simplicity, I'll use a specific example for my question. Here it is:

Let $f : \mathbb{R}^2 \rightarrow \mathbb{R}$, $(x,y) \mapsto f(x,y)$, be a continuous function of two variables $x$ and $y$.

The "equivalent" of the first derivative known from single-variable functions would be the gradient:

$\nabla f = \begin{pmatrix} f_x \\ f_y \end{pmatrix}$

The second derivative is the Hessian matrix: $H_f(x,y) = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$

So what does the third-order derivative of this function (or of functions in general) look like? I've been able to work out that it is a tensor of order $3$. How does one notate such a geometric object?

Concerning the properties: the gradient is a vector field depicting the "slope" of the function at any point as a vector (i.e. a direction and a magnitude). The Hessian matrix gives us the rate of change of the gradient (is that the correct way to put it?). By this analogy, the third derivative measures the rate of change of the second derivative. But how?

Feel free to change the tags and edit my question, if you think it necessary. I'm looking forward to any answers!

Thanks in advance, SDV


I have written a somewhat eccentric course on this, which can be found here:

http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week1/

A short answer to your question, though, is that if $f:\mathbb{R}^n \to \mathbb{R}$, then $D^{k+1} f$ is locally a $(k+1)$-linear function with the property that

$$ D^{k}f\big|_{p+v_{k+1}}(v_1,v_2,\ldots,v_k) \approx D^{k}f\big|_{p}(v_1,v_2,\ldots,v_k) + D^{k+1}f\big|_{p}(v_1,v_2,\ldots,v_k,v_{k+1}) $$

In other words, the $(k+1)$st derivative measures changes in the $k$th derivative.
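As a quick numerical sanity check of this property for $k=1$ (so: the Hessian measures the change in the gradient), here is a sketch with an arbitrarily chosen $f(x,y)=x^2y+y^3$, using its exact gradient and Hessian:

```python
import numpy as np

# Sketch (not from the original answer): check that
# Df|_{p+v2}(v1) ≈ Df|_p(v1) + D^2 f|_p(v1, v2)
# for the arbitrarily chosen f(x, y) = x**2 * y + y**3.

def grad(p):
    x, y = p
    return np.array([2*x*y, x**2 + 3*y**2])   # Df = (f_x, f_y)

def hessian(p):
    x, y = p
    return np.array([[2*y, 2*x],
                     [2*x, 6*y]])             # D^2 f, symmetric

p  = np.array([1.0, 2.0])
v1 = np.array([0.3, -0.5])
v2 = np.array([1e-4, 2e-4])                   # small displacement

lhs = grad(p + v2) @ v1                       # Df|_{p+v2}(v1)
rhs = grad(p) @ v1 + v1 @ hessian(p) @ v2     # Df|_p(v1) + D^2 f|_p(v1, v2)
print(abs(lhs - rhs))                         # small: remainder is O(|v2|^2)
```

The discrepancy is of order $|v_{k+1}|^2$, which is exactly what the $\approx$ sign above is hiding.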

To write things out in a basis, in Einstein notation, we have

$$D^k f = f_{i_1 i_2 \ldots i_k}\, dx^{i_1} \otimes dx^{i_2} \otimes \cdots \otimes dx^{i_k}$$

where $f_{i_1 i_2 \ldots i_k}$ is the higher partial derivative of $f$ with respect to $x_{i_1}$, then $x_{i_2}$, and so on.
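For a concrete look at those components, here is a small sketch (the function $f(x,y)=x^3y^2$ is chosen arbitrarily) that tabulates the $2\times 2\times 2$ array of third partials $f_{i_1 i_2 i_3}$ with SymPy and confirms the symmetry in the indices, i.e. the equality of mixed partials:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2          # arbitrary smooth example
xs = (x, y)

# D^3 f as a 2 x 2 x 2 array of third partials f_{ijk}
D3 = [[[sp.diff(f, xs[i], xs[j], xs[k])
        for k in range(2)] for j in range(2)] for i in range(2)]

# Mixed partials agree regardless of the order of differentiation,
# so the tensor is symmetric in its indices
assert D3[0][0][1] == D3[0][1][0] == D3[1][0][0]
print(D3[0][0][1])       # f_xxy
```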

I should note that the multivariable Taylor's theorem becomes especially easy to write down using this formalism:

$$ f(x+h) = f(x)+Df\big|_x (h)+\frac{1}{2!} D^2 f\big|_x (h,h)+\frac{1}{3!} D^3 f\big|_x (h,h,h)+\cdots $$

This may also illuminate the presence of $\frac{1}{k!}$ in Taylor's theorem: it arises from the $k!$ permutations of the arguments.
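To see the tensor form of Taylor's theorem in action, here is a sketch with an arbitrarily chosen cubic $f(x,y)=x^3+xy^2$: for a cubic, the degree-3 expansion is exact, so both sides agree to machine precision.

```python
import numpy as np

# Sketch (example chosen here, not in the original answer) of
# f(x+h) = f(x) + Df(h) + (1/2!) D^2 f(h,h) + (1/3!) D^3 f(h,h,h)
# for the cubic f(x, y) = x**3 + x*y**2, where the expansion terminates.

def f(p):
    x, y = p
    return x**3 + x*y**2

def grad(p):
    x, y = p
    return np.array([3*x**2 + y**2, 2*x*y])

def hess(p):
    x, y = p
    return np.array([[6*x, 2*y],
                     [2*y, 2*x]])

def third(p):
    # D^3 f: a constant symmetric 2x2x2 tensor for this cubic
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = 6                               # f_xxx
    T[0, 1, 1] = T[1, 0, 1] = T[1, 1, 0] = 2     # f_xyy and permutations
    return T

p = np.array([1.0, -2.0])
h = np.array([0.7, 0.4])

taylor = (f(p)
          + grad(p) @ h
          + 0.5 * h @ hess(p) @ h
          + (1/6) * np.einsum('ijk,i,j,k->', third(p), h, h, h))
print(np.isclose(taylor, f(p + h)))              # True: exact for a cubic
```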


The gradient is not the true analogue of the first derivative, even though many texts treat it as if it were.

The derivative of a function $f:\mathbb{R}^n\to\mathbb{R}$ at a point $p\in\mathbb{R}^n$ is a $1$-tensor, which means that it eats a vector and spits out a number. More specifically, the idea is to obtain a function $I\to\mathbb{R}$, where $I$ is an open interval in $\mathbb{R}$, and then differentiate it. So given a point $p\in\mathbb{R}^n$ and a "direction" $v\in\mathbb{R}^n$ we define $g_{p,v}:(-\epsilon,\epsilon)\to\mathbb{R}$ by $t\mapsto f(p+tv)$, and then: $$df_p(v)=g_{p,v}'(0).$$ So as claimed above, the $1$-tensor $df_p$ eats a vector ($v$, the direction) and returns a number (the corresponding directional derivative). Since $df_p$ is a linear transformation $\mathbb{R}^n\to\mathbb{R}$, the appropriate matrix to represent it is a row vector (rather than a column vector).
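Numerically, this definition is easy to test. A sketch (the function $f(x,y)=x^2y$ and the points are chosen arbitrarily) comparing $df_p(v)$, computed from the row vector of partials, against a finite-difference estimate of $g_{p,v}'(0)$:

```python
import numpy as np

# Sketch (my own choice of f): df_p eats a direction v and returns
# g'(0) where g(t) = f(p + t*v), for f(x, y) = x**2 * y.

def f(p):
    x, y = p
    return x**2 * y

def df(p):
    # the 1-tensor df_p as a row vector (f_x, f_y)
    x, y = p
    return np.array([2*x*y, x**2])

p = np.array([1.0, 3.0])
v = np.array([0.5, -1.0])

t = 1e-6
g_prime = (f(p + t*v) - f(p - t*v)) / (2*t)   # central difference for g'(0)
print(abs(df(p) @ v - g_prime))               # tiny: agreement up to O(t^2)
```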

The second derivative is a $2$-tensor, i.e. it eats two vectors. The procedure is as follows: given a point $p\in\mathbb{R}^n$ and two directions $u,v\in\mathbb{R}^n$ we first define $h_{p,u,v}:(-\epsilon,\epsilon)\to\mathbb{R}$ by $t\mapsto df_{p+tu}(v)$, and then (using $d^2f_p$ for the second derivative tensor) $$d^2f_p(u,v)=h_{p,u,v}'(0).$$ Since $d^2f_p$ is nothing but a bilinear form, it can be represented by an $n\times n$ matrix.
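The same check works one level up. A sketch (again with the arbitrarily chosen $f(x,y)=x^2y$) verifying that $h_{p,u,v}'(0)$ agrees with the bilinear form $u^\mathsf{T} H_f(p)\, v$:

```python
import numpy as np

# Sketch (my own choice of f): d^2 f_p(u, v) = h'(0), where
# h(t) = df_{p+tu}(v), should equal u^T H_f(p) v for f(x, y) = x**2 * y.

def df(p):
    x, y = p
    return np.array([2*x*y, x**2])   # gradient as a row vector

def hess(p):
    x, y = p
    return np.array([[2*y, 2*x],
                     [2*x, 0.0]])    # Hessian of x**2 * y

p = np.array([1.0, 3.0])
u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])

t = 1e-6
h_prime = (df(p + t*u) @ v - df(p - t*u) @ v) / (2*t)   # h'(0)
print(abs(h_prime - u @ hess(p) @ v))                   # tiny
```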

The higher derivatives are defined analogously, and yes, the $m$th derivative is an $m$-tensor, which means one gives it $m$ directions and it returns a number. It can be represented (if we insist) by an $m$-dimensional array of size $n\times n\times\cdots\times n$.