Why are we allowed to multiply a 1x1 matrix by any matrix?

So, in order to multiply two matrices, the left matrix must have the same number of columns as the right matrix has rows. So if $A$ is an $m \times n$ matrix and $B$ is a $p \times q$ matrix, mustn't $n = p$ in order for $AB$ to exist, and mustn't $q = m$ in order for $BA$ to exist? By this logic, we should only be allowed to multiply a $1 \times 1$ matrix by either a $1 \times n$ matrix on the right or an $n \times 1$ matrix on the left. However, if $C$ is a $1 \times 1$ matrix and $D$ is an $m \times n$ matrix, where neither $m$ nor $n$ equals $1$, we're allowed to multiply the two matrices simply by multiplying each entry in $D$ by the entry in $C$. Why? Shouldn't the rules for a $1 \times 1$ matrix be the same as for all the other matrices?


Solution 1:

Let $M_{m \times n}(\mathbb{R})$ denote the set of $m \times n$ matrices with coefficients in $\mathbb{R}$; this is a vector space over $\mathbb{R}$, and is isomorphic to $\mathbb{R}^{mn}$. You are right that, technically, matrix multiplication is not well defined between an $m \times n$ matrix and a $1 \times 1$ matrix unless $n = 1$ (for multiplication on the right) or $m = 1$ (on the left), but scalar multiplication is, i.e., it is perfectly legitimate to multiply a matrix by a real number entry-wise. Since $M_{1 \times 1}(\mathbb{R})$ is isomorphic to $\mathbb{R}$, you can identify a $1 \times 1$ matrix with a unique element of $\mathbb{R}$, and define "multiplication" of any other matrix by this $1 \times 1$ matrix as simply multiplication by that scalar. This works over any other field as well.
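To see the distinction concretely, here is a minimal numpy sketch (one possible illustration, using numpy's `@` for the strict matrix product):

```python
import numpy as np

C = np.array([[5.0]])               # a 1x1 matrix
D = np.arange(6.0).reshape(2, 3)    # a 2x3 matrix

# The strict matrix product requires the inner dimensions to match,
# so (1x1) @ (2x3) is rejected:
try:
    C @ D
except ValueError as e:
    print("matmul failed:", e)

# Scalar multiplication is a different, always-defined operation:
c = C.item()    # the unique real number identified with C
print(c * D)    # multiplies every entry of D by 5.0
```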

Solution 2:

Normally, you don't define the product of a $1\times 1$ matrix with a general $m\times n$ matrix. Instead you define the product of a real number with an arbitrary matrix (note that throughout I'm assuming real matrices; otherwise replace "real number" with whatever your matrices are composed of). Note that this is a different product from the product of two matrices.

Note also that, formally, a $1\times 1$ matrix is something different from a single number: a $1\times 1$ matrix is a rectangular arrangement of one number. You may say that's nitpicking, but the point is that this nitpicking is important for understanding what is going on. Once you understand it, you can start being sloppy and stop strictly distinguishing between the two, because wherever you can use either, they behave in basically the same way (I'll get more specific on that below).

So let's have a look at a matrix in a different way: A matrix is a function which accepts two indices, and then tells you the real number that sits at that place. That function is conventionally written with indices instead of parentheses, but that doesn't make it any less of a function. For example, the matrix $$A = \begin{pmatrix} 2 & 3 \\ 5 & 7 \end{pmatrix}$$ is a function that maps the pair $(1,1)$ to the value $2$, the pair $(1,2)$ to the value $3$, the pair $(2,1)$ to the value $5$ and the pair $(2,2)$ to the value $7$. Usually that is written as $A_{11}=2$, $A_{12}=3$, $A_{21}=5$ and $A_{22}=7$. But you could in principle just as well write $A(1,1)=2$, $A(1,2)=3$, $A(2,1)=5$ and $A(2,2)=7$, and it would mathematically not make the slightest difference; it's just a convention that in some cases functions are written in index notation.
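To make the function view concrete, here is a small Python sketch (just an illustration of the idea):

```python
# The matrix A above, viewed as a function of its index pair:
def A(i, j):
    values = {(1, 1): 2, (1, 2): 3,
              (2, 1): 5, (2, 2): 7}
    return values[(i, j)]

print(A(1, 2))  # 3, i.e. what is conventionally written A_{12}
```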

Now on those matrices, you define an addition and two multiplications (a code sketch of all three operations follows the list):

  • The addition of two matrices is given by element-wise addition.
  • The multiplication of a matrix with a number is given by element-wise multiplication.
  • The multiplication of two matrices is given by $C_{ij}=\sum_kA_{ik}B_{kj}$.
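Here is a minimal Python sketch of these three definitions (my own illustration, representing matrices as lists of rows):

```python
def mat_add(A, B):
    # matrix plus matrix: element-wise addition
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def scalar_mul(s, A):
    # number times matrix: element-wise multiplication
    return [[s * A[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def mat_mul(A, B):
    # matrix times matrix: C_ij = sum_k A_ik * B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]
```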

At this point it is important that we are really speaking about two different multiplications. In principle, those operations have nothing to do with each other, but of course they are defined in a way that makes them "compatible" with each other, as well as with yet another multiplication, namely the multiplication of two numbers, and yet another addition, the addition of two numbers.

So when you work with matrices, you are really using five different operations: two different additions (number plus number, matrix plus matrix) and three different multiplications (number times number, number times matrix, matrix times matrix). However, we write both additions and all three multiplications the same way. And we can get away with it because, on one hand, we can always determine which is meant by looking at the arguments, and on the other hand, the operations are compatible with each other, so the notational simplifications we already use for the individual multiplications keep working even when we mix them.

For example, for numbers we can write $abc$ instead of either $(ab)c$ or $a(bc)$ because we know that those expressions give the same result. Now do the same with mixed products (using lowercase letters for numbers and uppercase letters for matrices): in $(ab)C$ you've got a number-number product and a number-matrix product, while in $a(bC)$ you've got two number-matrix products, so the two expressions don't even consist of the same types of product. Yet they are defined in a way that makes the two results equal, and therefore we can still simply write $abC$ despite the fact that different ways of placing parentheses involve different types of product.
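A quick numerical check of that compatibility (a sketch in numpy, where `*` happens to denote both the number-number and the number-matrix product):

```python
import numpy as np

a, b = 2.0, 3.0
C = np.array([[1.0, 4.0], [2.0, 8.0]])

# (ab)C chains number*number with number*matrix; a(bC) chains two
# number*matrix products. Different operations, same result:
assert np.allclose((a * b) * C, a * (b * C))
```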

OK, but what about the $1\times 1$ matrix? Well, that one is a function from the single-element set $\{(1,1)\}$ to the real numbers. Of course, this means there is only one function value, which is a real number. In standard matrix notation, you can write it as one number in parentheses, e.g. $(5)$. Note that since this notation really describes a function, it is not the same as the number $5$. However, when looking at the rules for matrix operations, one finds that they map exactly to number operations on the single entry; that is, matrix addition maps to number addition, and both multiplications map to number multiplication: $$\begin{aligned} (a) + (b) &= (a+b) && \text{matrix addition}\to\text{number addition}\\ a (b) &= (ab) && \text{number-matrix multiplication}\to\text{number multiplication}\\ (a) (b) &= (ab) && \text{matrix-matrix multiplication}\to\text{number multiplication} \end{aligned}$$ Note again that there are still two different multiplications involving matrices, which map to the same multiplication, namely the multiplication of two numbers.

Moreover, if you look at the cases where you can multiply a $1\times 1$ matrix with another matrix (you've correctly identified those), you again get the same result as multiplying that other matrix by the number inside the $1\times 1$ matrix (which, I stress again, is another operation).
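Both observations can be checked numerically; a small numpy sketch (my own illustration):

```python
import numpy as np

a, b = 2.0, 3.0
A, B = np.array([[a]]), np.array([[b]])

# 1x1 matrix operations mirror the corresponding number operations:
assert (A + B).item() == a + b    # matrix addition -> number addition
assert (a * B).item() == a * b    # number*matrix   -> number multiplication
assert (A @ B).item() == a * b    # matrix*matrix   -> number multiplication

# The legal products of a 1x1 matrix with another matrix agree with
# multiplying that other matrix by the number inside:
row = np.array([[1.0, 2.0, 3.0]])        # 1x3
col = np.array([[1.0], [2.0], [3.0]])    # 3x1
assert np.array_equal(A @ row, a * row)  # (1x1)(1xn)
assert np.array_equal(col @ A, a * col)  # (nx1)(1x1)
```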

So wherever we can use $1\times 1$ matrices, if we replace them by the numbers inside them and at the same time replace the operations by the appropriate type (a substitution hidden by the fact that we write all those operations with the same symbol; another point where, strictly speaking, we are sloppy), then we still get the exact same result. And that is why you can get sloppy and not strictly distinguish the two concepts.

However, note the restriction: *where* we can use $1\times 1$ matrices. There are cases where you can use numbers but you cannot use $1\times 1$ matrices. Your question is about one of those cases: you cannot multiply a general $m\times n$ matrix with a $1\times 1$ matrix, but you can multiply it with an arbitrary number. So if you get sloppy and don't strictly distinguish between $1\times 1$ matrices and numbers, a product of a matrix and a number may look like a product of a matrix with a $1\times 1$ matrix.

There are three possible ways to deal with it:

  • Don't be sloppy. Always keep track of whether you've got a number or a $1\times 1$ matrix. This is, of course, the least error-prone way.

  • Be sloppy, but keep in mind that you are being sloppy, and when encountering a context where only numbers make sense but what you have is really a $1\times 1$ matrix, mentally insert the operation "the number inside", $(a)\mapsto a$. For example, formally the product of a $1\times n$ matrix and an $n\times 1$ matrix is a $1\times 1$ matrix. Being sloppy, you may identify that $1\times 1$ matrix with the number inside and use that number in a context that doesn't allow a $1\times 1$ matrix (like multiplying it with a general $m\times n$ matrix). That's formally not allowed, but by implicitly applying the "the number inside" operation, you can, in an unambiguous way, turn it into a well-defined expression. Similarly, if you have a number and need a $1\times 1$ matrix, you can mentally insert the "put it in a $1\times 1$ matrix" operation, $a\mapsto (a)$ (both operations are sketched in code after this list).

  • Make the sloppiness rigorous by defining the "missing" operations. So in your case, you can define yet another multiplication, between a $1\times 1$ matrix and an $m\times n$ matrix, defined as multiplying the $m\times n$ matrix by the number inside the $1\times 1$ matrix.
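A minimal numpy sketch of the last two options (the helper names `to_number`, `to_matrix`, and `mul_1x1` are made up for illustration):

```python
import numpy as np

def to_number(A):
    # "the number inside": (a) -> a
    return A.item()

def to_matrix(a):
    # "put it in a 1x1 matrix": a -> (a)
    return np.array([[a]])

def mul_1x1(C, D):
    # the "missing" operation of the third option: multiply the
    # m x n matrix D by the number inside the 1x1 matrix C
    return to_number(C) * D

row = np.array([[1.0, 2.0, 3.0]])
col = np.array([[4.0], [5.0], [6.0]])
D = np.ones((2, 3))

P = row @ col          # formally a 1x1 matrix
print(mul_1x1(P, D))   # well-defined once the extra operation exists
```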

Obviously the first option is the conceptually simplest one, but it can get tedious in certain contexts, while the second option is the most flexible one, at the cost of requiring the reader to "fill in the gaps". That's why one of those two will generally be used: the first is more likely when teaching, or in contexts where the second option would bring no big advantage or even a disadvantage, while the second is more likely when the audience can be expected to be experienced with matrices and the sloppiness provides significant simplification.

Note that the sloppiness can be justified only because it does not introduce any ambiguity. Whenever sloppiness could create ambiguity, one should be strict in the application of concepts.

Solution 3:

Scalar multiplication can be defined as shorthand for a matrix product with the Kronecker product of the scalar (as a $1\times 1$ matrix) and an identity matrix:

$$s{\bf A} = (s\otimes{\bf I}_m){\bf A} = {\bf A}(s\otimes{\bf I}_n)$$

So the scalar notation is a shorthand that spares us these cumbersome expressions, and Kronecker products would probably be confusing to introduce when you are just starting to learn linear algebra.
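A quick numpy check of the identity above (a sketch; `np.kron` is numpy's Kronecker product):

```python
import numpy as np

s = np.array([[3.0]])               # the scalar as a 1x1 matrix
A = np.arange(6.0).reshape(2, 3)    # m = 2, n = 3

left  = np.kron(s, np.eye(2)) @ A   # (s ⊗ I_m) A
right = A @ np.kron(s, np.eye(3))   # A (s ⊗ I_n)
assert np.allclose(left, right)
assert np.allclose(left, 3.0 * A)   # both equal the scalar product sA
```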


EDIT: One practical feature of this interpretation is that it remains faithful under abstractions of the matrix representation. If we have a 2x2 complex matrix implemented as a 4x4 real matrix (a 2x2 arrangement of 2x2 blocks), then "scalar multiplication" for our representation becomes multiplication with the Kronecker product of the complex scalar (implemented as a 2x2 real block) and ${\bf I}_2$.
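A numpy sketch of that remark (my own illustration, assuming the common convention that $X + iY$ is implemented as the real block matrix $\begin{pmatrix} X & -Y \\ Y & X \end{pmatrix}$; the helper names are made up):

```python
import numpy as np

def realify_scalar(s):
    # a complex number as a 2x2 real block
    return np.array([[s.real, -s.imag],
                     [s.imag,  s.real]])

def realify_matrix(M):
    # a complex matrix M = X + iY as the real block matrix [[X, -Y], [Y, X]]
    X, Y = M.real, M.imag
    return np.block([[X, -Y], [Y, X]])

M = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 2 + 0j]])
s = 2 - 3j

# Scalar multiplication in the representation is a Kronecker product
# of the realified scalar with I_2, followed by a matrix product:
lhs = np.kron(realify_scalar(s), np.eye(2)) @ realify_matrix(M)
rhs = realify_matrix(s * M)
assert np.allclose(lhs, rhs)
```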