What are the higher derivatives of a multivariate function?

I have recently realized that I am not sure what it means to consider $F''(x)$ and, more generally, higher derivatives of $F: \mathbb R ^n \rightarrow \mathbb R^m$. I am clear that the first derivative is the matrix of partial derivatives $ \left( \begin{array}{ccc} \frac{\partial f_1}{\partial x_1}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \cdots & \frac{\partial f_m}{\partial x_n}(x) \end{array} \right) $, but I am not sure how to differentiate this matrix. Do I want to treat it as a vector and then take the first derivative of the function $G: \mathbb R^n \rightarrow \mathbb R^{m \times n}$ which sends $x$ to this matrix of partial derivatives?


To understand it, you have to treat derivatives as linear operators. If $f:\mathbb{R}^n\to\mathbb{R}^m$ then $$f':\mathbb{R}^n\to L(\mathbb{R}^n,\mathbb{R}^m)$$

where $ L(\mathbb{R}^n,\mathbb{R}^m)$ is the set of linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$. It can be identified with the space $M_{m\times n}$ of $m\times n$ matrices, or with $\mathbb{R}^{nm}$. If you identify it with $\mathbb{R}^{nm}$, you see that differentiating the matrix is the same as differentiating the function $f':\mathbb{R}^n\to\mathbb{R}^{nm}$, and the latter you already know how to differentiate. Moreover, because $f'(x)$ is a linear transformation for each $x$, you have to understand how this transformation acts, and it acts according to the formula $$f'(x)u=A_x u$$

where $A_x$ is the matrix of partial derivatives written in your question.
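To see this concretely, here is a minimal numerical sketch using JAX (my own illustration, not part of the original answer): `jax.jacobian` produces the matrix $A_x$, while `jax.jvp` evaluates $f'(x)u$ directly as a linear map.

```python
import jax
import jax.numpy as jnp

# A sample f: R^2 -> R^2 (the same one used in the worked example below)
f = lambda p: jnp.array([p[0]**2 - p[1], p[0] + p[1]**2])

x = jnp.array([1.0, 2.0])
u = jnp.array([3.0, 4.0])

A_x = jax.jacobian(f)(x)           # the matrix of partial derivatives A_x
_, fpu = jax.jvp(f, (x,), (u,))    # f'(x)u, evaluated as a linear map at u
print(jnp.allclose(A_x @ u, fpu))  # True: f'(x)u = A_x u
```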

To proceed we have that $$f'':\mathbb{R}^n\to L(\mathbb{R}^n, L(\mathbb{R}^n,\mathbb{R}^m))$$

Now $f''$ is a function which sends $x$ to a linear transformation $f''(x)$ from $\mathbb{R}^n$ to $ L(\mathbb{R}^n,\mathbb{R}^m)$. But such a linear operator can be identified with a bilinear map $g(x):\mathbb{R}^n\times \mathbb{R}^n\to\mathbb{R}^m$ defined by $$g(x)uv=f''(x)uv$$

Moreover, note that $$f''(x)uv=[f'(x)u]'v,$$ where on the right we differentiate the map $x\mapsto f'(x)u$ with $u$ held fixed;

hence $f''(x)$ is the bilinear map defined by $$f''(x)uv=[A_xu]'v$$

where $$[A_xu]'=\left( \begin{array}{ccc} \frac{\partial}{\partial x_1}\sum_{i=1}^n\frac{\partial f_1}{\partial x_i}u_i & \cdots & \frac{\partial}{\partial x_n}\sum_{i=1}^n\frac{\partial f_1}{\partial x_i}u_i \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial x_1}\sum_{i=1}^n\frac{\partial f_m}{\partial x_i}u_i & \cdots & \frac{\partial}{\partial x_n}\sum_{i=1}^n\frac{\partial f_m}{\partial x_i}u_i \end{array} \right)$$
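As a sanity check on this formula, one can compute $[A_xu]'$ numerically as the Jacobian of the map $x\mapsto f'(x)u$ with $u$ frozen. A sketch using JAX (my own illustration; the function is the example worked out below):

```python
import jax
import jax.numpy as jnp

f = lambda p: jnp.array([p[0]**2 - p[1], p[0] + p[1]**2])
x = jnp.array([1.0, 2.0])
u = jnp.array([3.0, 4.0])
v = jnp.array([5.0, 6.0])

f_prime_u = lambda x: jax.jvp(f, (x,), (u,))[1]  # x |-> f'(x)u, u held fixed
Axu_prime = jax.jacobian(f_prime_u)(x)           # the matrix [A_x u]'
print(Axu_prime @ v)                             # f''(x)uv = [A_x u]' v
```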

For example, consider the function $f(x,y)=(x^2-y,\,x+y^2)$. We can identify $$f'(x,y)= \left( \begin{array}{cc} 2x & -1 \\ 1 & 2y \end{array} \right) = (2x,1,-1,2y),$$ listing the entries column by column,

and we know that $$f'(x,y)(u,v)=\left( \begin{array}{cc} 2x & -1 \\ 1 & 2y \end{array} \right)\left( \begin{array}{c} u \\ v \end{array} \right)$$

Now, $f''(x,y)(u,v)(z,w)=[f'(x,y)(u,v)]'(z,w)$, and $$f'(x,y)(u,v)=(2xu-v,\,u+2yv)$$

Now we treat $(u,v)$ in the last expression as constants and differentiate with respect to $(x,y)$: $$f''(x,y)(u,v)=\left( \begin{array}{cc} 2u & 0 \\ 0 & 2v \end{array} \right)$$

which implies that $$f''(x,y)(u,v)(z,w)=(2uz,2vw)$$
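We can verify this closed form numerically; a sketch using JAX (not part of the original answer), where the full second-derivative tensor is built by differentiating the Jacobian and then contracted with the two direction vectors:

```python
import jax
import jax.numpy as jnp

f = lambda p: jnp.array([p[0]**2 - p[1], p[0] + p[1]**2])

p  = jnp.array([1.0, 2.0])   # the point (x, y)
uv = jnp.array([3.0, 4.0])   # the direction (u, v)
zw = jnp.array([5.0, 6.0])   # the direction (z, w)

# H[i, j, k] = d^2 f_i / (dx_j dx_k); shape (2, 2, 2)
H = jax.jacobian(jax.jacobian(f))(p)

# f''(x,y)(u,v)(z,w): contract the two trailing indices with (u,v) and (z,w)
print(jnp.einsum('ijk,j,k->i', H, uv, zw))  # [30. 48.] = (2uz, 2vw)
```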

The case $f^{(n)}$ is similar.


Treating the matrix of $n$th derivatives as a vector and writing the result of the $(n+1)$th derivatives in matrix form is indeed one way to accomplish this (one can also use Kronecker products, as some authors do).

However, one can also just use extra levels of indices, since expressing the $N$th derivative of a vector-to-vector function requires an $(N+1)$th-order tensor.

For your example, the third-order tensor $G(x) =\nabla^2 f(x)$ requires three indices: $G_{ijk} = \frac{\partial^2 f_i}{\partial x_j\,\partial x_k}(x)$.
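For instance, a quick sketch with JAX (my own illustration; the function below is arbitrary) builds $G$ by differentiating the Jacobian once more, which yields exactly this three-index object:

```python
import jax
import jax.numpy as jnp

# An arbitrary f: R^3 -> R^3, just for illustration
f = lambda x: jnp.array([x[0]**2 * x[1], jnp.sin(x[2]), x[0] * x[1] * x[2]])

x = jnp.array([1.0, 2.0, 3.0])
G = jax.jacobian(jax.jacobian(f))(x)  # third-order tensor, shape (3, 3, 3)

# G[i, j, k] = d^2 f_i / (dx_j dx_k), e.g. for f_0 = x0^2 * x1:
print(G[0, 0, 1])  # d^2(x0^2 x1)/(dx0 dx1) = 2*x0 = 2.0
```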


Dieudonné said it best: this is the introduction to his chapter on differentiation in Foundations of Modern Analysis, Chapter VIII.

The subject matter of this Chapter is nothing else but the elementary theorems of Calculus, which however are presented in a way which will probably be new to most students. That presentation, which throughout adheres strictly to our general "geometric" outlook on Analysis, aims at keeping as close as possible to the fundamental idea of Calculus, namely the "local" approximation of functions by linear functions. In the classical teaching of Calculus, the idea is immediately obscured by the accidental fact that, on a one-dimensional vector space, there is a one-to-one correspondence between linear forms and numbers, and therefore the derivative at a point is defined as a number instead of a linear form. This slavish subservience to the shibboleth of numerical interpretation at any cost becomes much worse when dealing with functions of several variables...

In other words, the proper way to look at differentiation is not simply to partially differentiate an object over and over. To understand higher derivatives in higher dimensions, we should think about successive differentials, taken one after the other. Sadly I have to go to class now.