What is the theory of Matrices?

Matrix theory can be viewed as the calculational side of linear algebra. Linear algebra is the theory of vectors, vector spaces, linear transformations between vector spaces, and so on, but if one wants to calculate particular instances, one uses matrix algebra. In part it is a body of notational conventions for how one represents the abstractions described by linear algebra, and in part a collection of recipes for manipulating these notations.

The boundary between matrix algebra and linear algebra is not crisp, and a case can be made that there are topics in matrix theory that are not really discussed in linear algebra, so my description above is perhaps simplistic. In particular, the Perron-Frobenius theorem seems inherently bound to matrices and doesn't seem to have a clean abstract linear algebra formulation. The same goes for the subject of "total positivity".


Matrices are compact representations of linear systems of equations.

These types of problems are called "linear" because they are closely related to straight lines (and flat planes in higher dimensions).

Note: Matrices can be used in a large variety of ways, but in this answer, I will focus on their historical relationship to simple algebra problems. It should be noted, however, that some properties of matrices are easier to understand from other perspectives (e.g. from their applications in geometry).

All of your questions will be answered by the end, but it will take a little time to motivate and justify those answers. So please bear with me.

THE PROBLEM

Remember in grade school when you first learned to solve problems like the following?

$$ 7x+2y=5 \\ 3x-4y=7 $$

Well, how exactly did you solve a problem like this?

By Graphing

One way was to solve both equations for $y$, plot them as two straight lines on a Cartesian plane, find their intersection point, and list the ordered pair corresponding to that point as the answer. Wolfram Alpha, for example, will do exactly that. The intersection point is $(1,-1)$; therefore, the solution is $x=1$ and $y=-1$.

From this perspective, the "linear" nature of the problem is fairly obvious.
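If you want to reproduce this picture yourself, here is a minimal sketch in Python (assuming numpy and matplotlib are available); it just plots the two lines and marks the intersection found above.

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-1, 3, 200)
y1 = 5/2 - 7/2 * xs        # 7x + 2y = 5 solved for y
y2 = (3 * xs - 7) / 4      # 3x - 4y = 7 solved for y

plt.plot(xs, y1, label="7x + 2y = 5")
plt.plot(xs, y2, label="3x - 4y = 7")
plt.plot(1, -1, "ko", label="intersection (1, -1)")
plt.legend()
plt.show()
```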

By Substitution

Another standard way to solve this problem is by "substitution" - which involves solving one equation for $y$ in terms of $x$ and then plugging it into the other equation to find $x$ (and then $y$). Like this:

$$ 7x+2y=5\\2y=5-7x\\y=\frac{5}{2}-\frac{7}{2}x\\ \ \ \ \\ \ \ \ \\ 3x-4y=7\\3x-4(\frac{5}{2}-\frac{7}{2}x)=7\\3x-10+14x=7\\17x=17\\x=1\\ \ \ \ \\ y=\frac{5}{2}-\frac{7}{2}(1)\\y=-1 $$

This method has the advantage of being less involved than the graphing method, but it is also more abstract. Calculating the answer is more straightforward, but the connection to geometry is much less apparent. This will be a recurring theme from here on: We will continue to trade obviousness and simplicity for elegant calculations.
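The same substitution steps can also be carried out symbolically; here is a minimal sketch assuming the sympy library is available.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Solve the first equation for y, then substitute into the second.
y_in_terms_of_x = sp.solve(sp.Eq(7*x + 2*y, 5), y)[0]       # 5/2 - 7*x/2
x_value = sp.solve(sp.Eq(3*x - 4*y_in_terms_of_x, 7), x)[0]  # 1
y_value = y_in_terms_of_x.subs(x, x_value)                   # -1

print(x_value, y_value)  # 1 -1
```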

By Row Operations

The last method that is normally taught is to perform operations on an entire equation (like multiplying by a number) and then to add it to the other equation. When done thoughtfully, this method dramatically speeds up the problem-solving process. Here's how it could work in this case:

$$ 7x+2y=5 \\ 3x-4y=7\\ \ \ \ \\ 2(7x+2y=5) \rightarrow 14x+4y=10\\ \ \ \\(14x+4y=10)\\+ (3x-4y=7)\\ \rule{4 cm}{0.4pt} \\17x=17\\ \ \ \\ x=1, \text{etc....} $$
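Here is a minimal sketch of that row-operation idea in Python (assuming numpy is available), treating each equation as a row of coefficients $[a, b, c]$ for $ax+by=c$.

```python
import numpy as np

r1 = np.array([7.0, 2.0, 5.0])    # 7x + 2y = 5
r2 = np.array([3.0, -4.0, 7.0])   # 3x - 4y = 7

combined = 2 * r1 + r2            # 17x + 0y = 17
x = combined[2] / combined[0]     # x = 1.0
y = (r1[2] - r1[0] * x) / r1[1]   # back-substitute into the first equation: y = -1.0

print(x, y)  # 1.0 -1.0
```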

Matrices: Gauss-Jordan Elimination

If you have studied a little bit of matrix algebra, then that last method should look familiar. The "Row Operations" method is exactly the same idea as Gauss-Jordan Elimination on an augmented matrix.

Gauss-Jordan Elimination is significantly more abstract than the previous methods because the variables $x$ and $y$ no longer appear in the problem itself. However, all of the coefficients are still there, and that is what matters. The objective in this case is to get the matrix into reduced row echelon form. Here is a quick demonstration:

$$\text{Start:}\ \left(\begin{array}{cc|c}7&2&5\\ 3 & -4 & 7 \end{array}\right)\\ \text{Top Row x2:} \ \left(\begin{array}{cc|c} 14 & 4 & 10 \\ 3 & -4 & 7 \end{array}\right)\\ \text{Add Top to Bottom:} \ \left(\begin{array}{cc|c} 14 & 4 & 10 \\ 17 & 0 & 17 \end{array}\right)\\ \text{Bottom Row $\div$17:} \ \left(\begin{array}{cc|c}14 & 4 & 10 \\ 1 & 0 & 1 \end{array}\right)\\ \text{Bottom Row x14:} \ \left(\begin{array}{cc|c}14 & 4 & 10 \\ 14 & 0 & 14 \end{array}\right)\\ \text{Subtract Bottom from Top:} \ \left(\begin{array}{cc|c} 0 & 4 & -4 \\ 14 & 0 & 14 \end{array}\right)\\ \text{Top Row $\div$4:} \ \left(\begin{array}{cc|c} 0 & 1 & -1 \\ 14 & 0 & 14 \end{array}\right)\\ \text{Bottom Row $\div$14:} \ \left(\begin{array}{cc|c} 0 & 1 & -1 \\ 1 & 0 & 1 \end{array}\right)\\ \text{Switch Rows:} \ \left(\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 1 & -1 \end{array}\right) $$
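For comparison, here is a minimal sketch (assuming the sympy library is available) that produces the same reduced row echelon form automatically; the particular row operations it chooses internally may differ from the ones above.

```python
import sympy as sp

augmented = sp.Matrix([[7, 2, 5],
                       [3, -4, 7]])
rref_form, pivot_columns = augmented.rref()

print(rref_form)      # Matrix([[1, 0, 1], [0, 1, -1]])  ->  x = 1, y = -1
print(pivot_columns)  # (0, 1)
```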

Matrices: By Inversion

Now that we can see some connection between matrices and linear systems of equations, we might naturally ask how to represent this problem as a matrix equation and whether that matrix equation can be easily solved.

First, let's set up the matrix equation:

$$ \textbf{A}\vec{x}=\vec{b}\\ \ \ \\ \text{Let, } \textbf{A}=\left(\begin{array}{cc} 7 & 2 \\ 3 & -4 \end{array}\right), \ \vec{x}=\left(\begin{array}{c} x \\ y \end{array}\right), \ \text{and} \ \vec{b}=\left(\begin{array}{c} 5 \\ 7 \end{array}\right)\\ \ \ \\ \therefore \ \left(\begin{array}{cc} 7 & 2 \\ 3 & -4 \end{array}\right) \left(\begin{array}{c} x \\ y \end{array}\right) = \left(\begin{array}{c} 5 \\ 7 \end{array}\right) $$

From here, it is clear that multiplying the matrix by the vector on the left-hand side of the equation recovers the original system of equations from the very beginning:

$$\left(\begin{array}{cc} 7 & 2 \\ 3 & -4 \end{array}\right) \left(\begin{array}{c} x \\ y \end{array}\right) = \left(\begin{array}{c} 5 \\ 7 \end{array}\right)\\ \left(\begin{array}{c} 7x+2y \\ 3x-4y \end{array}\right) = \left(\begin{array}{c} 5 \\ 7 \end{array}\right) $$

Therefore, one can see that a column vector is just an ordered pair turned on its side: the top entry is the $x$-coordinate, while the bottom entry is the $y$-coordinate. Thus, our goal in this problem is to determine the components of the unknown vector $\vec{x}=\left(\begin{array}{c} x \\ y \end{array}\right)$. To do that, we must isolate $\vec{x}$ (just like we would if it were an ordinary variable rather than a variable vector).
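As a quick sanity check of this matrix form, here is a minimal sketch (assuming numpy is available) that plugs the solution found by the earlier methods into $\textbf{A}\vec{x}$ and recovers $\vec{b}$.

```python
import numpy as np

A = np.array([[7, 2],
              [3, -4]])
b = np.array([5, 7])

x = np.array([1, -1])   # the solution found by the earlier methods
print(A @ x)            # [5 7]  -> matches b, so A x = b holds
```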

Isolating $\vec{x}$ means multiplying both sides by the inverse of $\textbf{A}$. The inverse of a matrix (when it exists) is unique, so I will write down the general form of the inverse of a 2x2 matrix, and that will suffice to cover our example.

$$\textbf{A}=\left(\begin{array}{cc} a & b \\ c & d \end{array}\right)\\ \textbf{A}^{-1}=\frac{1}{ad-bc}\left(\begin{array}{cc} d & -b \\ -c & a \end{array}\right)\\ $$
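As an illustration, here is a minimal sketch of this 2x2 formula as a small Python function (the name `inverse_2x2` is just for this example).

```python
def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via the cofactor formula; assumes ad - bc != 0."""
    det = a * d - b * c
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

print(inverse_2x2(7, 2, 3, -4))
# [[0.1176..., 0.0588...], [0.0882..., -0.2058...]], i.e. (1/-34) * [[-4, -2], [-3, 7]]
```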

I recommend checking for yourself that

$$\textbf{A}\textbf{A}^{-1}=\textbf{A}^{-1}\textbf{A}=\textbf{I}\\ \ \ \\ \frac{1}{ad-bc}\left(\begin{array}{cc} d & -b \\ -c & a \end{array}\right)\left(\begin{array}{cc} a & b \\ c & d \end{array}\right)=\left(\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\right) $$

For convenience (and because it has many other applications), we define $ad-bc$ to be the "determinant of $\textbf{A}$." In this situation, it is relevant because it is the factor by which the cofactor matrix $\left(\begin{array}{cc} d & -b \\ -c & a \end{array}\right)$ must be divided in order to produce the inverse, so that multiplying by $\textbf{A}$ returns the identity matrix.
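If you would rather let a library do the checking, here is a minimal sketch (assuming numpy is available) that confirms both the determinant value and the inverse identity for our matrix.

```python
import numpy as np

A = np.array([[7.0, 2.0],
              [3.0, -4.0]])
A_inv = np.linalg.inv(A)

print(np.linalg.det(A))                   # -34.0 (up to rounding), i.e. ad - bc
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True
```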

Now, finally, to solve our problem we multiply the original equation by the inverse matrix, and we are done.

$$\textbf{A}\vec{x}=\vec{b}\\ \ \ \\ \textbf{A}^{-1}\textbf{A}\vec{x}=\textbf{A}^{-1}\vec{b}\\ \ \ \\ \textbf{I} \vec{x}=\textbf{A}^{-1}\vec{b}\\ \ \ \\ \left(\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\right) \left(\begin{array}{c} x \\ y \end{array}\right)=-\frac{1}{34}\left(\begin{array}{cc} -4 & -2 \\ -3 & 7 \end{array}\right)\left(\begin{array}{c} 5 \\ 7 \end{array}\right)\\ \ \ \\ \left(\begin{array}{c} x \\ y \end{array}\right)=\left(\begin{array}{c} 1 \\ -1 \end{array}\right) $$

As you can see, the algebra is now much more involved than the basic methods you learned in grade school. However, this trade-off comes with the advantage of a much more elegant-looking solution.
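For completeness, here is a minimal sketch (assuming numpy is available) of the same $\vec{x}=\textbf{A}^{-1}\vec{b}$ recipe; note that numerical libraries usually prefer a dedicated solver over explicit inversion.

```python
import numpy as np

A = np.array([[7.0, 2.0],
              [3.0, -4.0]])
b = np.array([5.0, 7.0])

x = np.linalg.inv(A) @ b       # the x = A^{-1} b recipe used above
print(x)                       # [ 1. -1.]

print(np.linalg.solve(A, b))   # same answer; preferred in practice over explicit inversion
```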

But Why?

There are two main reasons for using matrix algebra to solve linear systems of equations. First, the theory of matrices is very broad. It generalizes specific problems into much larger classes of problem types. By generalizing in this way, we can identify similarities between apparently unrelated problems and, therefore, relate their solutions to one another. Second, computers are really good with matrices. If a problem can be solved with a matrix, then a computer can create an approximate answer very, very quickly. And, guess what, many of the world's most pressing problems can be modeled with matrices.
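To make the "computers are really good with matrices" point concrete, here is a minimal sketch (assuming numpy is available) that solves a random 2000-equation system; the size and data are arbitrary, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
A = rng.standard_normal((n, n))   # a random 2000 x 2000 coefficient matrix
b = rng.standard_normal(n)

x = np.linalg.solve(A, b)         # solved essentially instantly on ordinary hardware
print(np.allclose(A @ x, b))      # True
```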


Matrix theory is linear algebra carried out in a chosen coordinate system.

As to why the determinant is calculated that way, try computing the area of a unit square once it has been transformed by a matrix (treating two adjacent sides as vectors). The determinant is an operation that can be applied to any linear operator $L: A\rightarrow A$, where $A$ is a linear space over a field; it yields an element of that field and has exactly the geometric interpretation I just asked you to work out.
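Here is a minimal numeric sketch of that exercise (assuming numpy is available), using the matrix from the earlier example: the columns are the images of the unit square's sides, and their cross product is exactly $ad-bc$.

```python
import numpy as np

A = np.array([[7.0, 2.0],
              [3.0, -4.0]])

# Images of the unit square's two sides (the columns of A).
u, v = A[:, 0], A[:, 1]
signed_area = u[0] * v[1] - u[1] * v[0]   # cross product of the two image vectors

print(signed_area)       # -34.0: the square's area is scaled by |det|; the sign records orientation
print(np.linalg.det(A))  # -34.0 (up to rounding)
```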

"Taking the inverse" is again an operation that can be applied to any linear operator that has a non-null determinant.

Adjugate (classical adjoint): Given a square matrix, the sum of the products of the elements of a row (or column) with their corresponding cofactors equals the determinant, while the sum of the products of the elements of a row (or column) with the cofactors of another row (or column) is zero. That means that $$A \operatorname{adj}(A)=(\det A)\,I,$$ where $(\det A)\,I$ is a diagonal matrix whose entries are all equal to the determinant.
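A quick numeric check of this identity, as a minimal sketch assuming the sympy library is available:

```python
import sympy as sp

A = sp.Matrix([[7, 2],
               [3, -4]])

print(A * A.adjugate())   # Matrix([[-34, 0], [0, -34]])  =  det(A) * I
print(A.det())            # -34
```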

Transpose is an operation that can be applied to any linear map $L: A\rightarrow B$, where $A$ and $B$ are linear spaces over the same field $\Bbb{K}$: it gives another linear map that describes how linear functionals on $B$, $f:B\rightarrow \Bbb{K}$, are mapped to linear functionals on $A$, $g:A\rightarrow \Bbb{K}$, by composition of $f$ with $L$: $g=f\circ L$. In this way, instead of computing $f(L(x))$ one can simply compute $g(x)$, for any $x\in A$.
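Here is a minimal coordinate-level sketch of that statement (assuming numpy is available; the particular functional and vector are arbitrary choices for illustration): the transpose of the matrix of $L$ sends the row vector of $f$ to the row vector of $g=f\circ L$.

```python
import numpy as np

L = np.array([[7.0, 2.0],
              [3.0, -4.0]])   # a linear map from A to B, in coordinates
f = np.array([1.0, 2.0])      # a linear functional on B (a row vector)
x = np.array([5.0, -3.0])     # an arbitrary element of A

g = L.T @ f                   # the pulled-back functional g = f o L on A

print(f @ (L @ x))            # 83.0
print(g @ x)                  # 83.0  -> g(x) == f(L(x))
```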

In addition to linear maps, matrices can also be used to describe bilinear and quadratic forms in a given coordinate system.
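For instance, here is a minimal sketch (assuming numpy is available) of a quadratic form evaluated through its matrix; the particular symmetric matrix is an arbitrary example.

```python
import numpy as np

Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])    # symmetric matrix encoding the form 2*x1^2 + 2*x1*x2 + 3*x2^2
x = np.array([1.0, -2.0])

print(x @ Q @ x)              # 10.0, the value of the quadratic form at (1, -2)
```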

Vectors are the elements of a linear space. Coordinate vectors are their representation once a linear coordinate system has been chosen for that space. With a linear coordinate system, a linear space $A$ over a field $\Bbb{K}$ becomes a homomorphic image of a (coordinate) linear space $\Bbb{K}^n$, $n\in\Bbb{N}$, where $n\ge \dim_\Bbb{K}A$; if the coordinate system is bijective, equality holds. So coordinate vectors are elements of the coordinate linear space. What you have been using up to now are coordinate vectors, even though you have been calling them simply vectors. By extension, the numerical representation of a vector is usually called a coordinate vector even when the chosen coordinate system is not linear, but in that case it cannot be used with the algorithms of matrix theory. Coordinate vectors are also used to represent linear functionals numerically once a linear coordinate system is chosen: those are usually written as row vectors, while the former are written as column vectors.
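As a small illustration of coordinate vectors (a minimal sketch assuming numpy is available; the basis is an arbitrary choice): finding the coordinates of a vector with respect to a basis amounts to solving a linear system whose columns are the basis vectors.

```python
import numpy as np

# Columns are the chosen basis vectors b1 = (1, 1) and b2 = (1, -1).
basis = np.array([[1.0, 1.0],
                  [1.0, -1.0]])

v = np.array([5.0, 7.0])               # a vector in standard coordinates
coords = np.linalg.solve(basis, v)     # its coordinate vector in the chosen basis

print(coords)                          # [ 6. -1.]  ->  v = 6*b1 - 1*b2
```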