Gradient and Jacobian row and column conventions

Say $f$ is a scalar-valued function from $\mathbb{R}^n \to \mathbb{R}$. When I learnt about the gradient $\nabla f(\mathbf{x})$ I always thought of it as a column vector in the same space as $\mathbf{x}$. That way, the dot product $\nabla f \cdot \mathbf{v}$ gives the directional derivative in direction $\mathbf{v}$.

All the definitions I can find of the Jacobian of $\mathbf{y} = \psi(\mathbf{x})$ however define it as:

$$\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}$$

But this would make $\nabla f$ a row vector, which then means the directional derivative is no longer $\nabla f \cdot \mathbf{v}$.

Which way is correct? What are the consequences if I accidentally write the Jacobian the opposite way? I have found some similar questions here, but none that answer my question directly. I'm still learning this stuff, so please explain in simple terms :)


Solution 1:

In general, the derivative of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ at a point $p \in \mathbb{R}^n$, if it exists, is the unique linear transformation $Df(p) \in L(\mathbb{R}^n,\mathbb{R}^m)$ such that $$ \lim_{h \to 0} \frac{\|f(p+h)-f(p)-Df(p)h\|}{\|h\|} = 0; $$ the matrix of $Df(p)$ with respect to the standard orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, called the Jacobian matrix of $f$ at $p$, therefore lies in $M_{m \times n}(\mathbb{R})$.
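A quick NumPy sanity check of the $m \times n$ shape convention, using an example map of my own choosing (the function `f` below is not from the answer, just an illustration of a map $\mathbb{R}^2 \to \mathbb{R}^3$):

```python
import numpy as np

# Hypothetical example map f : R^2 -> R^3 (illustrative choice).
def f(x):
    return np.array([x[0]**2, x[0] * x[1], np.sin(x[1])])

def numerical_jacobian(f, p, eps=1e-6):
    """Approximate Df(p) column by column with forward differences."""
    p = np.asarray(p, dtype=float)
    f0 = f(p)
    J = np.empty((f0.size, p.size))
    for j in range(p.size):
        step = np.zeros_like(p)
        step[j] = eps
        J[:, j] = (f(p + step) - f0) / eps
    return J

p = np.array([1.0, 2.0])
J = numerical_jacobian(f, p)
print(J.shape)  # (3, 2): m rows (one per output) by n columns (one per input)
```

The shape comes out as $m \times n$ exactly because each column of $Df(p)$ records how all $m$ outputs respond to a nudge in one of the $n$ inputs.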

Now, suppose that $m=1$, so that $f : \mathbb{R}^n \to \mathbb{R}$. Then if $f$ is differentiable at $p$, $Df(p) \in L(\mathbb{R}^n,\mathbb{R}) = (\mathbb{R}^n)^\ast$ is a functional, and hence the Jacobian matrix, as you point out, lies in $M_{1 \times n}(\mathbb{R})$, i.e., is a row vector. However, by the Riesz representation theorem, $\mathbb{R}^n \cong (\mathbb{R}^n)^\ast$ via the map that sends a vector $x \in \mathbb{R}^n$ to the functional $y \mapsto \left\langle y,x \right\rangle$. Hence, if $f$ is differentiable at $p$, then the gradient of $f$ at $p$ is the unique (column!) vector $\nabla f(p) \in \mathbb{R}^n$ such that $$ \forall h \in \mathbb{R}^n, \quad Df(p)h = \left\langle \nabla f(p),h\right\rangle; $$ in particular, if you unpack definitions, you'll find that the Jacobian matrix of $f$ at $p$ is precisely $\nabla f(p)^T$.
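To see the $m = 1$ case concretely, here is a small numeric check with an illustrative scalar function of my own choosing: the Jacobian matrix is the $1 \times n$ row vector $\nabla f(p)^T$, and $Df(p)h$ agrees with a finite-difference approximation of the limit definition above.

```python
import numpy as np

# Hypothetical scalar function f : R^3 -> R (illustrative choice).
def f(x):
    return x[0]**2 + 3.0 * x[1] * x[2]

p = np.array([1.0, 2.0, 3.0])

# Gradient of f at p as a column vector: (2x, 3z, 3y) evaluated at p.
grad = np.array([[2.0 * p[0]], [3.0 * p[2]], [3.0 * p[1]]])  # shape (3, 1)

# The Jacobian matrix of this scalar f is the 1 x n row vector grad^T.
J = grad.T                                                    # shape (1, 3)

# Df(p)h computed from the row-vector Jacobian ...
h = np.array([[0.1], [-0.2], [0.3]])
Df_h = (J @ h).item()

# ... matches a finite-difference approximation of the limit definition.
t = 1e-6
approx = (f((p.reshape(-1, 1) + t * h).ravel()) - f(p)) / t
print(Df_h, approx)  # both close to 0.2
```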

Solution 2:

Imagine writing $\nabla f \cdot \mathbf{v}$ out as a matrix product. You would take the transpose of $\nabla f$ and put it next to $\mathbf{v}$ in order to carry out the dot product. So, really, $\nabla f$ is, and should be thought of as, a row vector; in other words, you're quite correct. The fact that most multivariable calculus textbooks write gradients as column vectors is, depending on how you look at it, just a convention. The fancy, differential-geometric way of saying this is that the gradient of a scalar function is a covariant vector (a covector), and despite the informality with which I've said it, this is exactly the understanding behind the formalism of the gradient and its generalizations to manifolds.
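The "transpose to carry out the dot product" point can be made explicit with a tiny NumPy sketch (the gradient and direction values below are hypothetical, chosen just for illustration):

```python
import numpy as np

# Illustrative gradient and direction (hypothetical values, not from the text).
grad_col = np.array([[1.0], [2.0], [3.0]])  # gradient stored as a column vector
v = np.array([[0.5], [0.0], [-1.0]])        # direction, also a column

# With both stored as columns, the dot product needs the transpose:
d1 = (grad_col.T @ v).item()

# Treating the gradient as a row vector (the covector view), no transpose needed:
grad_row = grad_col.T
d2 = (grad_row @ v).item()

print(d1 == d2)  # True: both give the directional derivative 0.5 - 3.0 = -2.5
```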

It might help you out some to look at the Riesz representation theorem, or the bra-ket notation in quantum mechanics.