Understanding matrix multiplication [duplicate]

I have a hard time understanding, on an intuitive level, what matrix multiplication actually does. I have used it a lot, but I do not really know what it does.

I know that $Ax = y$, where $A$ is a matrix and $x$ is an $n$-tuple, is just another way writing a system of equations. Seeing it this way, matrix addition is quite intuitive. Since every elementary matrix can be seen as representing an elementary row operation, I can understand what matrix multiplication does with invertible matrices. However, that still does not explain anything when we are working with matrices that are not invertible.

What is a good way of thinking about matrix multiplication?


Solution 1:

Although the duplicate question has an excellent top-voted answer, it may be a bit too dense for beginners. Here's a simple but instructive example. Fibonacci numbers! We all know them, don't we? =) $\def\nn{\mathbb{N}}$

Let $F_0 = 0$ and $F_1 = 1$ and $F_{n+2} = F_{n+1} + F_n$ for any $n \in \nn$.

$F$ is of course the Fibonacci sequence. To capture the underlying structure of the sequence, we would like to 'factor out' the "$n$" from the above defining recurrence relation. How do we do that? Matrices are a natural solution, as we shall see.

A matrix can be used to encode a linear transformation. How do we view the recurrence relation as a transformation? Our objective is to pass along enough information so that we can generate the sequence by iterating some transformation. Clearly then we need to maintain a pair of consecutive terms, and the transformation is to go from that pair of terms to the next pair:

$(F_{n+1},F_n) \mapsto (F_{n+2},F_{n+1})$.

That immediately gives us the transformation:

$F_{n+2} = 1F_{n+1} + 1F_n$.

$F_{n+1} = 1F_{n+1} + 0F_n$.

Why have I written it like this? Because now we can 'factor out those "$n$"s by fiat! To do so, define:

$\pmatrix{a&b\\c&d} \pmatrix{x\\y} = \pmatrix{ax+by\\cx+dy}$.

Note that the notation on the left is purely notation (for now), and has absolutely nothing to do with matrices or multiplication or anything at all. It just means what is on the right!

Then we can write our transformation as:

$\pmatrix{F_{n+2}\\F_{n+1}} = \pmatrix{1&1\\1&0} \pmatrix{F_{n+1}\\F_{n}}$.

Great! We seem to have 'factored out' the recurrence structure from the terms in the sequence. But as mentioned this is merely notation so far. For this representation to be useful, we need to interpret the notation better.

Firstly, we can treat $\pmatrix{a&b\\c&d}$ as a function on $2$-vectors (which are those things like $\pmatrix{x\\y}$ that consists of a single column of $2$ numbers), specifically:

The function $\pmatrix{a&b\\c&d}$ applied to input $\pmatrix{x\\y}$ gives output $\pmatrix{ax+by\\cx+dy}$.

For example:

$\pmatrix{F_{n+3}\\F_{n+2}} = \pmatrix{1&1\\1&0} \left( \pmatrix{1&1\\1&0} \pmatrix{F_{n+1}\\F_{n}} \right) = \left( \pmatrix{1&1\\1&0} \circ \pmatrix{1&1\\1&0} \right) \pmatrix{F_{n+1}\\F_{n}}$.

Now we can fully capture the iteration of a transformation, which is just repeated application of a function. As usual, for any function $f$ we use "$f^k$" to mean its $k$-fold composition.

The Fibonacci sequence is now completely expressed by:

$\pmatrix{F_{n+1}\\F_n} = \pmatrix{1&1\\1&0}^n \pmatrix{F_{1}\\F_{0}}$.

Notice two important things. Firstly, this representation cleanly divides into the recurrence and the initial values, and works for any choice of $(F_0,F_1)$. Secondly, all we need to do now is to be able to compute $\pmatrix{1&1\\1&0}^n$ directly before applying the resulting function to $\pmatrix{F_{1}\\F_{0}}$. For this we must find out what is the composition of two functions $\pmatrix{a&b\\c&d}$ and $\pmatrix{p&q\\r&s}$. Completely by our above definition we will get:

$\left( \pmatrix{a&b\\c&d} \circ \pmatrix{p&q\\r&s} \right) \pmatrix{x\\y} = \pmatrix{ap+br&aq+bs\\cp+dr&cq+ds} \pmatrix{x\\y}$. [Work this out yourself!]

By the definition of equality of functions we get:

$\left( \pmatrix{a&b\\c&d} \circ \pmatrix{p&q\\r&s} \right) = \pmatrix{ap+br&aq+bs\\cp+dr&cq+ds}$.

Conventionally, each of these three functions is called a matrix, and we shall now define matrix multiplication to be exactly composition! This is the reason for matrix multiplication.

Finally, one should notice that the most efficient way to compute they $n$-fold composition of $\pmatrix{a&b\\c&d}$ is to use the following facts for any function $f$ and $n \in \nn$:

$f^{2n} = (f^n)^2$.

$f^{2n+1} = f^{2n} \circ f$.

This actually yields one of the fastest algorithms possible to compute the $n$-th Fibonacci number with arbitrary precision. (The closed form formula is of course faster for fixed precision.)

Solution 2:

See my other answer that explains the motivation and interpretation of matrix multiplication. In this follow-up answer I give a brief explanation of elementary row operations, which is a rather different topic actually.

Notice that an elementary row operation does not change the solution set of the represented system of linear equations. What exactly is happening, in terms of the linear transformation viewpoint?

Well, firstly the elementary matrices are invertible transformations, because we can write down explicitly the inverse elementary matrices, which are of course nothing more than the inverse functions. But what really are these functions?

The scale-row operation is a scaling along a coordinate axis.

The swap-row operation is a reflection about a coordinate axis.

The add-multiple-to-another-row operation is a shear along a coordinate axis that moves another coordinate axis but keeps all other coordinate axes in their places.

A lot of properties of matrices fall out from these three facts. For one, the determinant can be easily shown to be changed by scale-row operations (by exactly the same factor), negated by the swap-row operation, but unchanged by the third operation. Notice also that the signed volume is changed in exactly the same way, so this immediately tells us that the determinant of a matrix is the signed volume of the unit cube after it has been transformed by the matrix! This can be used to explain the change of variables theorem for multi-variable integration.

Secondly, those row operations are often performed on the so-called augmented matrix, simply because it corresponds to applying the transformations to both sides of the system of equations. The system starts of as:

$A v = w$.

where $A$ is a matrix and $v,w$ are vectors. If you succeed in applying elementary row operations until $A$ becomes the identity matrix, then the system has a unique solution because you would end up with:

$v = A^{-1} w$.

If that is impossible, it means that $A$ is non-invertible. You can then ask in general about the reduced row-echelon form. That corresponds to breaking $A$ up into $E \circ R$ where $E$ is a composition of elementary matrices and $R$ is in reduced row-echelon form. Then you have:

$(ER)v = w$.

and after the row operations what you would have gotten is:

$Rv = E^{-1}w$.

Depending on the form of $R$ and the vector on the right, you either get no solution or at least one solution. In the latter case, the solution set is obvious from the form of $R$, and corresponds to a subspace (point / line / plane / ...) of the domain of $R$.

In some sense therefore the reduced row-echelon form is the core of the matrix transformation, in that the rest of the transformation is invertible. If $A$ is square, and $R$ is not the identity, then $R$ is geometrically a projection.

Solution 3:

\begin{align} u & = 3x+4y & & & p & = 12u -7v \\ v & =-9x + 5y & & & q & = 3u + 6v \end{align} Now write $p$ and $q$ as functions of $x$ and $y$. You'll get \begin{align} p & = [\bullet] x + [\bullet] y \\ q & = [\bullet] x + [\bullet] y \end{align} The four $[\bullet]$s are the entries in the product of the two matrices.