The gradient as a row versus column vector

Yes, the distinction between row vectors and column vectors is important. On an arbitrary smooth manifold $M$, the derivative of a function $f : M \to \mathbb{R}$ at a point $p$ is a linear transformation $df_p : T_p(M) \to \mathbb{R}$; in other words, it's a cotangent vector. In general the tangent space $T_p(M)$ does not come equipped with an inner product (this is an extra structure: see Riemannian manifold), so in general we cannot identify tangent vectors and cotangent vectors.

So on a general manifold one must distinguish between vector fields (families of tangent vectors) and differential $1$-forms (families of cotangent vectors). While $df$ is a differential form and exists for all $M$, $\nabla f$ can't be sensibly defined unless $M$ has a Riemannian metric, and then it's a vector field (and the identification between differential forms and vector fields now depends on the metric).

If one thinks of tangent vectors as column vectors, then $\nabla f$ ought to be a column vector, but the linear functional $\langle -, \nabla f \rangle$ ought to be a row vector. A major problem with working entirely in bases is that distinctions like these are frequently glossed over, and then when they become important students are very confused.
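To make this concrete, here is a small worked example in coordinates. It assumes notation not used above: $G$ denotes the matrix of the inner product on $T_p(M)$ in the chosen basis, so that $\langle v, w \rangle = v^T G w$. In that notation $df$ is the row vector of partial derivatives, and the requirement $\langle v, \nabla f \rangle = df(v)$ for all $v$ forces
$$ df = \begin{bmatrix} \partial_1 f & \cdots & \partial_n f \end{bmatrix}, \qquad \nabla f = G^{-1} (df)^T. $$
Take $f(x, y) = x + y$ on $\mathbb{R}^2$. Then $df = \begin{bmatrix} 1 & 1 \end{bmatrix}$ regardless of any metric. With the standard inner product, $G = I$ and
$$ \nabla f = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, $$
whereas with $G = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$ one gets
$$ \nabla f = G^{-1} (df)^T = \begin{bmatrix} 1 \\ 1/4 \end{bmatrix}. $$
In both cases $v^T G \, \nabla f = v_1 + v_2 = df(v)$, so the row vector is the same while the column vector changes with the metric.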


Some remarks about non-canonicity. The tangent space $T_p(V)$ to a vector space $V$ at any point $p$ can be canonically identified with $V$ itself, so for vector spaces we don't run into quite the same problems. If $V$ is an inner product space, then via this identification it automatically inherits the structure of a Riemannian manifold. Finally, when people write $V = \mathbb{R}^n$ they frequently intend $\mathbb{R}^n$ to have the standard inner product with respect to the standard basis, and this equips $V$ with the structure of a Riemannian manifold.


Here's a simple heuristic. TL;DR: If the domain of $f: R^n\to R$ is a space of column vectors, then $f'(x)$ needs to be a row vector for the linear approximation property to make sense.

To make the distinction between row and column vectors explicit, I'll write $R^{k\times 1}$ and $R^{1\times k}$ for the spaces of $k$-dimensional column and row vectors of real numbers, respectively.

If $f:R^{n\times 1}\to R^{m\times 1}$, then, for every $x\in R^{n\times 1}$, the derivative $f'(x)$ is characterized by its linear approximation property, $$ f(x + h) \approx f(x) + f'(x)h, $$ for small $h\in R^{n\times 1}$. Now think about the sizes of the matrices involved: $f(x)$ has size $m\times 1$, so $f'(x)h$ also needs to have size $m\times 1$ for the right-hand side to make sense. But $h\in R^{n\times 1}$, so $f'(x)h$ has size $m\times 1$ exactly when $f'(x)$ has size $m\times n$.

In particular, if $f:R^{n\times 1}\to R$ (the case $m = 1$), then $f'(x)$ needs to be a $1\times n$ matrix, i.e., a row vector.
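The size bookkeeping can be checked numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the helper names `jacobian`, `shape`, and `matvec` and the sample functions `f` and `g` are made up for illustration, vectors are stored as flat lists, and the derivative is approximated by forward differences.

```python
def jacobian(func, x, eps=1e-6):
    """Forward-difference approximation of func'(x) as m rows of n entries."""
    fx = func(x)
    m, n = len(fx), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xh = list(x)
        xh[j] += eps              # perturb the j-th input coordinate
        fxh = func(xh)
        for i in range(m):
            J[i][j] = (fxh[i] - fx[i]) / eps
    return J


def shape(A):
    """(rows, columns) of a matrix stored as a list of rows."""
    return (len(A), len(A[0]))


def matvec(A, h):
    """Matrix-vector product A h."""
    return [sum(a * b for a, b in zip(row, h)) for row in A]


# Hypothetical example with n = 3 inputs and m = 2 outputs.
def f(x):
    return [x[0] * x[1], x[1] + x[2] ** 2]


# Hypothetical scalar-valued example (m = 1), output kept as a 1-entry list.
def g(x):
    return [x[0] ** 2 + 3.0 * x[1] + x[2]]


x = [1.0, 2.0, 3.0]
Jf = jacobian(f, x)
Jg = jacobian(g, x)

print(shape(Jf))  # (2, 3): an m x n matrix
print(shape(Jg))  # (1, 3): a single row, i.e. a row vector

# Linear approximation: f(x + h) should be close to f(x) + f'(x) h.
h = [1e-3, -2e-3, 5e-4]
lhs = f([xi + hi for xi, hi in zip(x, h)])
rhs = [a + b for a, b in zip(f(x), matvec(Jf, h))]
approx_error = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Here `approx_error` is of second order in the size of `h`, which is what the linear approximation property predicts, and the scalar-valued `g` produces a derivative with exactly one row.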