Why are matrices indexed by row first?

While I believe that the true reason is purely historical, here’s a possible explanation for why the common notation might be more natural, and it boils down to the following facts:

  1. We like to think of the composition $g ∘ f$ of maps $f \colon X → Y$, $g \colon Y → Z$ as “$g$ after $f$” not the other way around (viz. to ensure that “$(g∘f)(x) = g(f(x))$” holds, which is intuitive).
  2. We like to think of matrices as linear maps in a “covariant way”.
  3. We like to think of vectors as columns (to avoid wasting precious horizontal space, since we write from left to right, not from top to bottom).
  4. We like to cancel stuff out in the middle rather than outside, because that sticks better in my (our?) memory.

So let $R$ be a ring (or a field; take $R = ℚ$ if need be) and read on.

In linear algebra, we often use matrices $A ∈ \operatorname{Mat}_{m×n}(R)$ as maps that are given by multiplication $$f_A \colon R^n → R^m,~x ↦ Ax,$$ for which we actually need to think of vectors in $R^n$ and $R^m$ as columns.
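As a quick sanity check in code (a small `numpy` sketch with made-up entries, purely for illustration), a $2×3$ matrix sends column vectors of length $3$ to column vectors of length $2$:

```python
import numpy as np

# A ∈ Mat_{2×3}(R): it induces the map f_A : R^3 → R^2, x ↦ Ax.
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# x is a vector in R^3, thought of as a column.
x = np.array([1, 0, -1])

# f_A(x) = A x
y = A @ x
print(y)        # [-2 -2]
print(y.shape)  # (2,) -- a vector in R^2
```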

Now, we could also think of them as maps $$f_A^{op} \colon R^m → R^n,~x ↦ xA,$$ but then we would

  • lose the nice “covariant” identity $f_A∘f_B = f_{AB}$ (and would end up instead with the “contravariant” identity $f_B^{op}∘f_A^{op} = f_{AB}^{op}$), and
  • need to think of $R^m$ and $R^n$ as rows rather than columns or
  • redefine the matrix product to multiply columns of the first matrix onto rows of the second (instead of rows of the first onto columns of the second).

While the last two points might not be an issue, the first one is for many people. So we favor the interpretation $f_A$ over the interpretation $f_A^{op}$.
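Both identities are easy to check numerically; here is a small sketch (random matrices via `numpy`, hypothetical sizes chosen just for illustration) verifying the covariant identity for column vectors and the contravariant one for row vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3))  # f_A : R^3 → R^2
B = rng.integers(-3, 4, size=(3, 4))  # f_B : R^4 → R^3

# Covariant: (f_A ∘ f_B)(x) = f_{AB}(x), i.e. A(Bx) = (AB)x
x_col = rng.integers(-3, 4, size=(4,))
assert np.array_equal(A @ (B @ x_col), (A @ B) @ x_col)

# Contravariant: (f_B^op ∘ f_A^op)(x) = f_{AB}^op(x), i.e. (xA)B = x(AB)
x_row = rng.integers(-3, 4, size=(2,))
assert np.array_equal((x_row @ A) @ B, x_row @ (A @ B))

print("both identities hold")
```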

Okay, so now if you write out the product of a matrix by a vector, you really want to preserve horizontal space, so you would rather think of the vector as a column. We would also like to consider this product a special case of a general matrix product, viewing vectors as special matrices.

This forces us to multiply matrices by the scheme “dot product of rows of the first matrix by columns of the second matrix”. In particular, the number of rows of a product is determined by its first factor, the number of columns by its second factor.
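In `numpy` terms (shapes are written `(rows, columns)`; the sizes below are made up for illustration), this is exactly how the shape of a product is determined:

```python
import numpy as np

A = np.ones((5, 7))  # 5 rows, 7 columns: a map R^7 → R^5
B = np.ones((7, 3))  # 7 rows, 3 columns: a map R^3 → R^7

C = A @ B  # the inner 7's must match and disappear
print(C.shape)  # (5, 3): rows from the first factor, columns from the second
```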

Now, if we were to annotate by $(j,i)$ the entry at the $j$-th column and $i$-th row of a matrix, the definition of the matrix product $$(c_{ji})_{n×m} = (a_{ki})_{q×m}·(b_{jk})_{n×q}$$ would be $$c_{ji} = \sum_{k=1}^q a_{ki}·b_{jk},$$ so you wouldn’t be able to memorize it as “the middle index cancels out”, which, to me, sounds way more natural than “the outer indices cancel out”.
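To see the awkwardness concretely, here is a plain-Python sketch (the function name is hypothetical) of the product under the $(j,i)$ = (column, row) convention, storing each matrix as a list of its columns; note that the cancelling index $k$ really does sit in the outer positions of $a_{ki}·b_{jk}$:

```python
def product_column_row(a, b):
    """Matrix product with matrices stored as lists of columns.

    a has q columns of length m, b has n columns of length q;
    the result c has n columns of length m, with
        c[j][i] = sum over k of a[k][i] * b[j][k]
    -- the cancelling index k occupies the *outer* positions.
    """
    q, m = len(a), len(a[0])
    n = len(b)
    return [[sum(a[k][i] * b[j][k] for k in range(q))
             for i in range(m)]
            for j in range(n)]

# A = ((1, 2), (3, 4)) and B = ((5, 6), (7, 8)), stored column by column:
a_cols = [[1, 3], [2, 4]]
b_cols = [[5, 7], [6, 8]]

c_cols = product_column_row(a_cols, b_cols)
# A·B = ((19, 22), (43, 50)), stored as columns:
print(c_cols)  # [[19, 43], [22, 50]]
```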

Of course, you sacrifice the naturality of having the two-dimensional grid, reminiscent of the Cartesian plane, as you suggested. But since we already write matrices top-left to bottom-right (because that’s the way we write), as others have pointed out in the comments, you can’t really think of a matrix in the same way as the Cartesian plane anyway, so why try to force it? The way it is, it at least preserves the mathematically positive orientation, and if you tilt your head by $-π/2$, you again have your Cartesian plane!