Second derivative: how should one think about it?

Solution 1:

Suppose $f : \mathbb R^m \to \mathbb R^k$ is differentiable everywhere, in which case the map $f' : \mathbb R^m \to \mathbb R^{k\times m}$ exists and is defined everywhere. This is a matrix-valued function for which $f(x+h) \approx f(x) + f'(x)h$ for vectors $x$, $h \in \mathbb R^m$ with $\|h\| \approx 0$.
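A quick numerical sketch of this first-order approximation, using a hypothetical map $f : \mathbb R^2 \to \mathbb R^3$ (the particular $f$ and its hand-computed Jacobian are illustrative choices, not from the text):

```python
import numpy as np

# A hypothetical map f : R^2 -> R^3 and its hand-computed Jacobian,
# illustrating f(x+h) ≈ f(x) + f'(x) h for small h.
def f(x):
    return np.array([x[0]**2, x[0]*x[1], np.sin(x[1])])

def J(x):  # f'(x), a 3x2 matrix of partial derivatives
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    np.cos(x[1])]])

x = np.array([1.0, 2.0])
h = np.array([1e-4, -2e-4])
exact  = f(x + h)
linear = f(x) + J(x) @ h
print(np.abs(exact - linear).max())  # small: on the order of ||h||^2
```

The error of the linear approximation shrinks quadratically in $\|h\|$, which is what differentiability promises.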

There is a norm we can assign to matrices, usually defined by

$$\|A\| = \sup\{\|Ax\| : \|x\| \leq 1\},$$
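As a sanity check on this definition: the operator norm above equals the largest singular value of $A$, which is what `np.linalg.norm(A, 2)` computes. A sketch comparing it against a brute-force search over unit vectors:

```python
import numpy as np

# The operator norm sup{||Ax|| : ||x|| <= 1} equals the largest singular
# value of A; np.linalg.norm(A, 2) computes exactly that.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))

op_norm = np.linalg.norm(A, 2)  # largest singular value

# Monte-Carlo lower bound: evaluate ||Ax|| at many random unit vectors x.
xs = rng.standard_normal((2, 10000))
xs /= np.linalg.norm(xs, axis=0)
sampled = np.linalg.norm(A @ xs, axis=0).max()

print(op_norm, sampled)  # sampled <= op_norm, and close to it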

in which case $\mathbb R^{k\times m}$ is a normed linear space, and we can apply the definition of differentiability to $f'$. The result is a function $f'' : \mathbb R^m \to \mathcal L(\mathbb R^m, \mathbb R^{k\times m})$ where $\mathcal L(V,W)$ is the collection of (bounded) linear transforms from $V$ to $W$. The function $f''$ has the property that

$$f'(x+h) \approx f'(x) + f''(x)(h)$$

for vectors $x$, $h \in \mathbb R^m$ with $\|h\| \approx 0$. Interpreting this: $f'(x)$ is a $k\times m$ matrix, and $f''(x)$ is a linear transform whose input is the vector $h \in \mathbb R^m$ and whose output is a $k \times m$ matrix. The transform $f''(x)$ encodes how the derivative matrix changes with the input (with the distance between matrices measured by the matrix norm above).
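To make this concrete, here is a sketch with a hypothetical $f : \mathbb R^2 \to \mathbb R^3$ (an illustrative choice, not from the text). Its second derivative at $x$ can be stored as a $3\times2\times2$ array $T$ of second partials, so that $f''(x)$ applied to $h$ is the $3\times2$ matrix with entries $\sum_l T[i,j,l]\,h[l]$; we then check $f'(x+h) \approx f'(x) + f''(x)(h)$ numerically:

```python
import numpy as np

# Hypothetical f : R^2 -> R^3 with hand-computed first and second derivatives.
def f(x):
    return np.array([x[0]**2, x[0]*x[1], np.sin(x[1])])

def J(x):  # f'(x), a 3x2 matrix
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    np.cos(x[1])]])

def T(x):  # f''(x) stored as a 3x2x2 array: T[i,j,l] = d^2 f_i / dx_j dx_l
    t = np.zeros((3, 2, 2))
    t[0, 0, 0] = 2.0                 # d^2(x0^2)/dx0^2
    t[1, 0, 1] = t[1, 1, 0] = 1.0    # d^2(x0*x1)/dx0 dx1
    t[2, 1, 1] = -np.sin(x[1])       # d^2(sin x1)/dx1^2
    return t

x = np.array([1.0, 2.0])
h = np.array([1e-4, -2e-4])
exact  = J(x + h)
# f''(x)(h): contract the last index of T against h, yielding a 3x2 matrix.
linear = J(x) + np.tensordot(T(x), h, axes=([2], [0]))
print(np.abs(exact - linear).max())  # small: on the order of ||h||^2
```

Note that the output of $f''(x)$ applied to $h$ really is a $3\times2$ matrix, not a vector, matching the description above.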


Edit: If you want to interpret the linear transform $f''(x)$ as a matrix-like object, note that there is no matrix $B$ such that the product $Bh$ is a $k\times m$ matrix when $h$ is an $m$-vector. But you can identify $f''(x)$ with a sum of tensor products of matrices with vectors, an object of the form $\sum_{i,j} c_{i,j} (M_i \otimes e_j)$, where the sum ranges over a basis $(M_i)_i$ for $\mathbb R^{k\times m}$ and a basis $(e_j)_j$ for $\mathbb R^m$.

Imagine an $m$-long row vector each of whose components is a whole $k\times m$ matrix. This is how the tensor product can be interpreted: when you apply it to an $m$-vector, the result is a $k\times m$ matrix (a linear combination of the row vector's "matrix-components"), which is exactly the desired property of the object $f''(x)$.
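A sketch of this "row vector of matrices" picture (the dimensions $k=3$, $m=2$ and the random matrix-components are illustrative assumptions): applying the object to an $m$-vector $h$ means taking the linear combination $\sum_l h_l M_l$ of the matrix-components, which is the same as stacking the $M_l$ into a $k\times m\times m$ array and contracting its last index against $h$:

```python
import numpy as np

# "Row vector whose entries are matrices": m matrix-components M_0..M_{m-1},
# each k x m; applying the object to an m-vector h gives sum_l h[l] * M_l.
k, m = 3, 2
rng = np.random.default_rng(1)
Ms = [rng.standard_normal((k, m)) for _ in range(m)]  # the matrix-components

h = rng.standard_normal(m)
result = sum(h[l] * Ms[l] for l in range(m))          # a k x m matrix

# Equivalently: stack into a k x m x m array and contract the last index.
T = np.stack(Ms, axis=-1)
assert np.allclose(result, np.tensordot(T, h, axes=([2], [0])))
print(result.shape)  # (3, 2)
```

The contraction view is the same computation as in the $f''(x)$ example: the linear map eats a vector and emits a matrix.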

This is not exactly the same as identifying $f''(x)$ with a block matrix, since block matrix multiplication does not behave quite as outlined above.