Constructing the Hessian for a numerical ODE solver
I'm working on a project where I need to find the Hessian $\boldsymbol{H}=\frac{\partial^2 \boldsymbol{q}_m}{\partial \boldsymbol{q}_0^2}\in \mathbb{R}^{n\times n \times n}$ of the function $\boldsymbol{q}_m=\boldsymbol{f}(\boldsymbol{q}_0)$, where $\boldsymbol{q}_i \in \mathbb{R}^n$. In the simplest case, the following scheme can be used to calculate $\boldsymbol{q}_{i+1}$:
$\boldsymbol{q}_{i+1} = \boldsymbol{f}_i(\boldsymbol{q}_{i}) = \boldsymbol{q}_{i} + h\boldsymbol{g}(\boldsymbol{q}_{i})$
where $\boldsymbol{q}_{i}$ is the state vector after the $i$th step, $h$ a constant step size, and $\boldsymbol{g}:\mathbb{R}^n \to \mathbb{R}^n$ an analytic function.
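For concreteness, here is a minimal sketch of this scheme in NumPy. The right-hand side `g` (a pendulum-like system) is a hypothetical stand-in, not my actual application:

```python
import numpy as np

def g(q):
    # Hypothetical analytic right-hand side g: R^n -> R^n (stand-in example).
    return np.array([q[1], -np.sin(q[0])])

def euler_step(q, h):
    # One explicit Euler step: q_{i+1} = q_i + h * g(q_i).
    return q + h * g(q)

h = 0.01
q = np.array([1.0, 0.0])  # q_0
for _ in range(100):      # m = 100 steps
    q = euler_step(q, h)  # afterwards q holds q_m = f(q_0)
```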
In my application it is relatively easy to find $\boldsymbol{H}_i=\frac{\partial^2 \boldsymbol{q}_{i+1}}{\partial \boldsymbol{q}_i^2}$ (for a single solver step), but calculating the full Hessian $\boldsymbol{H} \in \mathbb{R}^{n\times n \times n}$ obviously becomes rather cumbersome for large $m$.
I've already made several attempts to construct the full Hessian $\boldsymbol{H}$ from the intermediate Hessians $\boldsymbol{H}_i$. As I'm not a mathematician, though, I feel I lack some important insights to fully understand what's happening. In the next paragraphs I'll write down how I want to assemble the full Hessian, but some parts are still vague and not fully understood by myself.
Step 1: Assemble the Jacobian $\boldsymbol{J}=\frac{\partial \boldsymbol{q}_m}{\partial \boldsymbol{q}_0}\in \mathbb{R}^{n\times n}$, by using the chain rule:
$\boldsymbol{J}=\frac{\partial \boldsymbol{q}_m}{\partial \boldsymbol{q}_{m-1}}\frac{\partial \boldsymbol{q}_{m-1}}{\partial \boldsymbol{q}_{m-2}}...\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}}\frac{\partial \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}}$
or
$\boldsymbol{J}=\boldsymbol{J}_{m-1}\boldsymbol{J}_{m-2}...\boldsymbol{J}_{1}\boldsymbol{J}_{0}$
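Continuing the sketch above, this product can be accumulated while integrating. For the explicit Euler scheme the per-step Jacobian is $\boldsymbol{J}_i = \boldsymbol{I} + h\,\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{q}}(\boldsymbol{q}_i)$; the function `dg` below is the (assumed known) Jacobian of the stand-in `g`:

```python
def dg(q):
    # Jacobian of the stand-in g: dg[k, m] = d g_k / d q_m.
    return np.array([[0.0, 1.0],
                     [-np.cos(q[0]), 0.0]])

def jacobian_step(q, h):
    # Per-step Jacobian J_i = I + h * Dg(q_i) of the Euler map.
    return np.eye(len(q)) + h * dg(q)

q = np.array([1.0, 0.0])
J = np.eye(2)                     # dq_0/dq_0 = I
for _ in range(100):
    J = jacobian_step(q, h) @ J   # J <- J_i J_{i-1} ... J_0
    q = euler_step(q, h)
```

Note that $\boldsymbol{J}_i$ must be evaluated at $\boldsymbol{q}_i$ before the state is advanced.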
Step 2: In the case of the Hessian, it gets much more complicated. I'll write down my derivation for two solver steps and then generalize the result to $m$ steps:
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\left(\frac{\partial }{\partial \boldsymbol{q}_{0}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}}\frac{\partial \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}}\right)$
Applying the chain rule leads to:
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\left(\frac{\partial }{\partial \boldsymbol{q}_{0}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}}\right)\frac{\partial \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}}+\left(\frac{\partial }{\partial \boldsymbol{q}_{0}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}}\right)\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}}$
Inserting the intermediate Jacobian definitions leads to:
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\left(\frac{\partial }{\partial \boldsymbol{q}_{0}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}}\right)\boldsymbol{J}_0+\frac{\partial^2 \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}^2}\boldsymbol{J}_1$
Changing the order of differentiation in the first term yields:
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\left[\left(\frac{\partial }{\partial \boldsymbol{q}_{1}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}}\right)\right]^{\text{T}}\boldsymbol{J}_0+\frac{\partial^2 \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}^2}\boldsymbol{J}_1$
Expanding $\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}}$ results in:
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\left[\left(\frac{\partial }{\partial \boldsymbol{q}_{1}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}}\frac{\partial \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}}\right)\right]^{\text{T}}\boldsymbol{J}_0+\frac{\partial^2 \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}^2}\boldsymbol{J}_1$
Applying the chain rule a second time and rewriting the first term between the square brackets gives:
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\left[\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}^2}\boldsymbol{J}_0+\left(\frac{\partial }{\partial \boldsymbol{q}_{1}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}}\right)\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}}\right]^{\text{T}}\boldsymbol{J}_0+\frac{\partial^2 \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}^2}\boldsymbol{J}_1$
Rewriting the second term between the square brackets leads to:
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\left[\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}^2}\boldsymbol{J}_0+\left[\left(\frac{\partial }{\partial \boldsymbol{q}_{0}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{1}}\right)\right]^{\text{T}}\frac{\partial \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}}\right]^{\text{T}}\boldsymbol{J}_0+\frac{\partial^2 \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}^2}\boldsymbol{J}_1$
Obviously, this term vanishes because $\left(\frac{\partial }{\partial \boldsymbol{q}_{0}}\right)^{\text{T}}\left(\frac{\partial \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{1}}\right)=0$, which results in:
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\left[\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}^2}\boldsymbol{J}_0\right]^{\text{T}}\boldsymbol{J}_0+\frac{\partial^2 \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}^2}\boldsymbol{J}_1$
or
$\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{0}^2}=\boldsymbol{J}_0^{\text{T}}\frac{\partial^2 \boldsymbol{q}_{2}}{\partial \boldsymbol{q}_{1}^2}\boldsymbol{J}_0+\frac{\partial^2 \boldsymbol{q}_{1}}{\partial \boldsymbol{q}_{0}^2}\boldsymbol{J}_1$
Step 3: For $k$ steps, the equation becomes:
$\frac{\partial^2 \boldsymbol{q}_{k}}{\partial \boldsymbol{q}_{0}^2}=\boldsymbol{G}_k^{\text{T}}\frac{\partial^2 \boldsymbol{q}_{k}}{\partial \boldsymbol{q}_{k-1}^2}\boldsymbol{G}_k+\frac{\partial^2 \boldsymbol{q}_{k-1}}{\partial \boldsymbol{q}_{0}^2}\boldsymbol{J}_{k-1}$
where $\boldsymbol{G}_k = \boldsymbol{J}_{k-2}\boldsymbol{J}_{k-3}...\boldsymbol{J}_{1}\boldsymbol{J}_{0}$
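Independent of the notation issue, any assembled Hessian can be sanity-checked against central finite differences of the solver itself. A rough sketch, reusing `euler_step` and the stand-in `g` from above:

```python
def solve(q0, m=100, h=0.01):
    # Run the full solver: q_m = f(q_0).
    q = q0.copy()
    for _ in range(m):
        q = euler_step(q, h)
    return q

def fd_hessian(q0, eps=1e-4):
    # Central differences: H[k, a, b] ~ d^2 (q_m)_k / d(q_0)_a d(q_0)_b.
    n = len(q0)
    H = np.zeros((n, n, n))
    for a in range(n):
        for b in range(n):
            ea, eb = eps * np.eye(n)[a], eps * np.eye(n)[b]
            H[:, a, b] = (solve(q0 + ea + eb) - solve(q0 + ea - eb)
                          - solve(q0 - ea + eb) + solve(q0 - ea - eb)) / (4 * eps**2)
    return H
```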
My questions:
- Is the above derivation correct? I suspect not everything is well written (in a mathematical sense), but I hope the general concept is okay.
- How should I interpret the tensor–matrix multiplications (the two terms in the equation of Step 3)? I have some difficulties understanding when/where/why to use the $n$-mode multiplication. Some clarification would be very helpful!
PS: I'm sorry if some of the above equations are not written well. Feel free to change the notation where required, as I can always learn more!
Edit: replaced the number of steps $n$ by $m$, as I had already used the symbol $n$ for the vector/matrix/tensor dimensions.
With tensors of order 3 or higher, the application of matrix-vector products becomes ambiguous. There are several approaches in tensor or Ricci calculus to get the formulas into a compact notation. For Taylor formulas I prefer to write the occurring vector-valued multi-linear forms as directional derivatives in the notation $C(x)[v_1,...,v_k]$ for a tensor that has coefficients depending on $x$, transforming $k$ input vectors into a result vector.
So start with $q_0$ and a perturbation $\delta q_0^{[1]}$, $\delta\approx 0$. Then
$$ q_1+ \delta q_1^{[1]}+ \tfrac12\delta^2q_1^{[2]}+\dots=f_1(q_0+\delta q_0^{[1]})=f_1(q_0)+\delta f_1'(q_0)[q_0^{[1]}]+\tfrac12\delta^2f_1''(q_0)[q_0^{[1]},q_0^{[1]}]+\dots $$
Next insert the quadratic approximation into the next step:
\begin{align} q_2+ \delta q_2^{[1]}+ \tfrac12\delta^2q_2^{[2]}+\dots&=f_2(q_1+\delta q_1^{[1]}+ \tfrac12\delta^2q_1^{[2]}+\dots)\\ &=f_2(q_1)+f_2'(q_1)[\delta q_1^{[1]}+ \tfrac12 \delta^2q_1^{[2]}]+\tfrac12f_2''(q_1)[\delta q_1^{[1]},\delta q_1^{[1]}]+\dots \end{align}
Comparing the coefficients of $\delta$ and $\tfrac12\delta^2$ gives
\begin{align} q_2^{[1]}&=f_2'(q_1)[q_1^{[1]}]=f_2'(q_1)[f_1'(q_0)[q_0^{[1]}]]\\ q_2^{[2]}&=f_2'(q_1)[q_1^{[2]}]+f_2''(q_1)[q_1^{[1]},q_1^{[1]}]\\ &=f_2'(q_1)[f_1''(q_0)[q_0^{[1]},q_0^{[1]}]]+f_2''(q_1)[f_1'(q_0)[q_0^{[1]}],f_1'(q_0)[q_0^{[1]}]] \end{align}
The recursion pattern remains stable from now on,
\begin{align} q_{k+1}^{[1]}&=f_{k+1}'(q_k)[q_k^{[1]}],\\ q_{k+1}^{[2]}&=f_{k+1}'(q_k)[q_k^{[2]}]+f_{k+1}''(q_k)[q_k^{[1]},q_k^{[1]}], \end{align}
expanding the recursion, however, leads to really long formulas.
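In code, this recursion just propagates two extra vectors alongside the state. A minimal sketch using the stand-in Euler example from the question, where $f'(q)[v]=v+h\,Dg(q)v$ and $f''(q)[v,v]=h\,D^2g(q)[v,v]$, and `d2g` is the (assumed known) second derivative of `g`:

```python
def d2g(q):
    # Second derivative of the stand-in g: d2g[k, m, n] = d^2 g_k / dq_m dq_n.
    T = np.zeros((2, 2, 2))
    T[1, 0, 0] = np.sin(q[0])   # only nonzero entry for g = (q1, -sin(q0))
    return T

h = 0.01
q = np.array([1.0, 0.0])
v1 = np.array([1.0, 0.0])   # q_0^{[1]}: chosen perturbation direction
v2 = np.zeros(2)            # q_0^{[2]} = 0
for _ in range(100):
    v1_new = v1 + h * dg(q) @ v1                                   # f'(q_k)[q_k^{[1]}]
    v2 = v2 + h * (dg(q) @ v2 + np.einsum('kmn,m,n->k', d2g(q), v1, v1))
    v1 = v1_new
    q = euler_step(q, h)
# v1 is the first, v2 the second directional derivative of q_m along q_0^{[1]}
```

This needs only $O(n)$ extra storage per direction, whereas forming the full tensors costs $O(n^3)$.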
Note that in the second derivative the Jacobian comes first in the first term. Denoting by $J_{n:m}$ the Jacobian of $q_n$ relative to $q_m$, and similarly by $H_{n:m}:V\times V\to V$ the Hessian, one can also write the last recursion as \begin{align} J_{k+1:0}&=J_{k+1:k}J_{k:0}\\ H_{k+1:0}&=J_{k+1:k}H_{k:0}+H_{k+1:k}[J_{k:0}\,\cdot,J_{k:0}\,\cdot] \end{align} where the single-step variants are $J_{k+1:k}=f_{k+1}'(q_k)$ and $H_{k+1:k}=f_{k+1}''(q_k)$.
In the usual Ricci or Einstein tensor notation, every object that transforms with the basis is covariant and has lower indices; everything that transforms dual to the basis (also exchanging domain and range), like the coordinate functionals, is contravariant and has upper indices. Thus a vector $v$ is represented by its coordinate values $v^\mu$, the representation of the Jacobian takes these values and produces new contravariant values, $J^{\mu}{}_\nu$, and the Hessian absorbs two input vectors to produce one output vector, $H^\kappa{}_{\mu\nu}$. The Einstein summation convention says that if the same index appears in upper and lower position in a product, then the product is implicitly summed over this index over all dimensions.
In this notation the last two identities read as \begin{align} (J_{k+1:0})^\mu{}_\nu&=(J_{k+1:k})^\mu{}_\lambda(J_{k:0})^\lambda{}_\nu \\ (H_{k+1:0})^{\kappa}{}_{\mu\nu}&=(J_{k+1:k})^{\kappa}{}_\lambda(H_{k:0})^{\lambda}{}_{\mu\nu}+(H_{k+1:k})^{\kappa}{}_{\alpha\beta}(J_{k:0})^\alpha{}_\mu(J_{k:0})^\beta{}_\nu. \end{align}
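These index identities translate almost verbatim into `np.einsum` calls, with the storage convention `H[k, m, n]` $= H^\kappa{}_{\mu\nu}$ and `J[m, n]` $= J^\mu{}_\nu$. A sketch of one accumulation step (all names are illustrative):

```python
import numpy as np

def accumulate(J, H, Js, Hs):
    # J, H:   accumulated J_{k:0} (n x n) and H_{k:0} (n x n x n)
    # Js, Hs: single-step J_{k+1:k} and H_{k+1:k}
    J_new = np.einsum('ml,ln->mn', Js, J)              # (J_{k+1:k})^m_l (J_{k:0})^l_n
    H_new = (np.einsum('kl,lmn->kmn', Js, H)           # (J_{k+1:k})^k_l (H_{k:0})^l_mn
             + np.einsum('kab,am,bn->kmn', Hs, J, J))  # (H_{k+1:k})^k_ab (J_{k:0})^a_m (J_{k:0})^b_n
    return J_new, H_new
```

Starting from $J_{0:0}=I$ and $H_{0:0}=0$ and calling this once per solver step yields the full $J$ and $H$; the two `einsum` terms are exactly the mode-wise contractions the question asks about.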