The connection between the Jacobian, Hessian and the gradient?

Solution 1:

You did not do anything wrong in your calculation. If you directly compute the Jacobian of the gradient of $f$ with the conventions you used, you will end up with the transpose of the Hessian. This is stated more clearly in the introduction to the Hessian article on Wikipedia (https://en.wikipedia.org/wiki/Hessian_matrix), which says:

The Hessian matrix can be considered related to the Jacobian matrix by $\mathbf{H}(f(\mathbf{x})) = \mathbf{J}(\nabla f(\mathbf{x}))^T$.

The other Wikipedia article should probably be updated to use matching language.
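
To see where the transpose comes from, here is a short index calculation. It is only a sketch under assumed conventions, since the conventions in the question are not reproduced here: the gradient is treated as a vector-valued function with components $(\nabla f)_i = \partial f/\partial x_i$, the Jacobian of a map $F$ has entries $\mathbf{J}(F)_{ij} = \partial F_i/\partial x_j$, and the Hessian has entries $\mathbf{H}(f)_{ij} = \partial^2 f/\partial x_i\,\partial x_j$, read as $\frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right)$.

$$
\mathbf{J}(\nabla f)_{ij}
= \frac{\partial (\nabla f)_i}{\partial x_j}
= \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)
= \frac{\partial^2 f}{\partial x_j\,\partial x_i}
= \mathbf{H}(f)_{ji}
$$

So under these conventions $\mathbf{J}(\nabla f) = \mathbf{H}(f)^T$, which is the same statement as $\mathbf{H}(f) = \mathbf{J}(\nabla f)^T$. When the second partial derivatives of $f$ are continuous, the mixed partials commute, the Hessian is symmetric, and the transpose does not change any entries.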

As for the gradient of $f$ being defined as a row vector, that is the convention I have seen more often, but https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions notes that there are competing conventions for general matrix derivatives. However, I don't think that should change your answer for the Hessian: with the conventions you are using, you are correct that it should be transposed.
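
If you want to check the relationship numerically, the following is a minimal sketch in plain Python. Everything in it is an assumption made for illustration and does not come from the question: the test function $f(x, y) = x^2 y + y^3$, the helper names `gradient`, `jacobian` and `max_abs_diff`, the central-difference step size, and the convention that `J[i][j]` holds $\partial F_i/\partial x_j$.

```python
def gradient(f, x, h=1e-4):
    """Central-difference gradient of a scalar function f at the point x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def jacobian(F, x, h=1e-4):
    """Central-difference Jacobian with J[i][j] = dF_i/dx_j (assumed convention)."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        # cols[j][i] approximates dF_i/dx_j
        cols.append([(a - b) / (2 * h) for a, b in zip(Fp, Fm)])
    n_out = len(cols[0])
    return [[cols[j][i] for j in range(len(x))] for i in range(n_out)]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def f(p):
    # Hypothetical test function: f(x, y) = x^2 * y + y^3
    x, y = p
    return x**2 * y + y**3


def hessian_exact(p):
    # Second partials of the test function, worked out by hand:
    # f_xx = 2y, f_xy = f_yx = 2x, f_yy = 6y
    x, y = p
    return [[2 * y, 2 * x], [2 * x, 6 * y]]


point = [1.0, 2.0]
J = jacobian(lambda p: gradient(f, p), point)  # Jacobian of the gradient
J_T = [list(row) for row in zip(*J)]           # its transpose
H = hessian_exact(point)

print(max_abs_diff(J_T, H) < 1e-4)  # does J(grad f)^T match the Hessian?
print(max_abs_diff(J, J_T) < 1e-4)  # is J(grad f) itself symmetric here?
```

Because this test function is smooth, its Hessian is symmetric, so the check cannot tell $\mathbf{J}(\nabla f)$ apart from its transpose. That is the practical upshot: the transpose is a matter of index bookkeeping under a given layout convention, and for functions with continuous second partials it does not change the resulting matrix.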