What is the origin of the determinant in linear algebra?

Solution 1:

I normally have two ways of viewing determinants without appealing to higher-level math like multilinear forms.

The first is geometric, and I do think that most vector calculus classes nowadays should teach this interpretation. Given vectors $v_1, \ldots, v_n \in \mathbb{R}^n$ dictating the sides of an $n$-dimensional parallelepiped, the (signed) volume of this parallelepiped is given by $\det(A)$, where $A = [v_1 \ldots v_n]$ is the matrix whose columns are those vectors. We can then view the determinant of a square matrix as measuring the volume-scaling factor of the matrix as a linear map on $\mathbb{R}^n$. From here, it is clear why $\det(A) = 0$ is equivalent to $A$ not being invertible: if $A$ takes a set with positive volume and sends it to a set with zero volume, then $A$ has some direction along which it "flattens" points, and that direction is precisely the null space of $A$. Unfortunately, I'm under the impression that this interpretation is at least semi-modern, but I think this is one of the cases where the modern viewpoint might be better to teach new students than the old one.
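If a quick numerical check helps, here is a minimal NumPy sketch (my own illustration, not part of the original answer) of the volume-scaling story:

```python
import numpy as np

# The columns of A span a parallelepiped, and |det(A)| is its volume.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 4.0]])
print(np.linalg.det(A))      # 24.0: volume of the box spanned by the columns

# A shear preserves volume, so composing with one leaves det unchanged...
S = np.array([[1.0, 5.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(np.linalg.det(S @ A))  # still 24.0

# ...while a map that flattens a direction kills all volume.
P = np.diag([1.0, 1.0, 0.0])
print(np.linalg.det(P @ A))  # 0.0: P @ A is not invertible
```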

The old viewpoint is that the determinant is simply what falls out of trying to solve the linear system $Ax = b$ when $A$ is square, and this is most likely how the determinant was first discovered. To derive it this way, write down a generic matrix and proceed by Gaussian elimination: choose a nonzero leading entry in each row (the pivot) and use it to eliminate the entries below it. To avoid fractions, each elimination step multiplies a row through by a common denominator, so after $n$ steps you end up with a signed sum over all ways of picking one entry from each row and each column, merely by virtue of having multiplied out to clear denominators. The $(-1)^k$ sign attached to each term comes from the fact that at each stage of Gaussian elimination you are subtracting: on the first step you subtract, on the second step you subtract a subtraction, and so forth. At the very end you obtain an echelon (upper triangular) form, and one knows that if any diagonal entry is zero, the system is not uniquely solvable; the last diagonal entry will be precisely the determinant times the product of the previously used pivots (up to a sign, perhaps). Since the pivots chosen are always nonzero, they do not affect whether that last entry is zero, and so you can divide them out.

EDIT: It isn't as simple as I thought, though it does work out if you keep track of the nonzero values you multiply your rows by during Gaussian elimination; the sketch below shows the bookkeeping. My apologies if I misled anyone.
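Here is a minimal Python sketch of that bookkeeping (my addition; it uses the standard division-based elimination rather than the fraction-free variant described above): flip a sign on every row swap, and the determinant is that sign times the product of the pivots.

```python
import numpy as np

def det_by_elimination(A):
    """Gaussian elimination bookkeeping: reduce A to upper-triangular form,
    flip the sign on every row swap, then multiply the pivots together."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    sign = 1.0
    for k in range(n):
        # Pick the largest available pivot (partial pivoting); a swap flips the sign.
        p = k + int(np.argmax(np.abs(A[k:, k])))
        if A[p, k] == 0.0:
            return 0.0  # no nonzero pivot in this column: A is singular
        if p != k:
            A[[k, p]] = A[[p, k]]
            sign = -sign
        # Subtracting a multiple of the pivot row leaves the determinant unchanged.
        for i in range(k + 1, n):
            A[i, k:] -= (A[i, k] / A[k, k]) * A[k, k:]
    return sign * float(np.prod(np.diag(A)))

A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
print(det_by_elimination(A), np.linalg.det(A))  # both -16.0 (up to rounding)
```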

Solution 2:

The determinant was originally "discovered" by Cramer while solving the systems of linear equations needed to determine the coefficients of a polynomial curve passing through a given set of points. Cramer's rule, which gives the general solution of a system of linear equations, was a direct result of this.

This appears in Gabriel Cramer, "Introduction à l'analyse des lignes courbes algébriques" (Introduction to the Analysis of Algebraic Curves), Genève: Chez les Frères Cramer & Cl. Philibert, 1750. It is cited in a footnote on p. 60, which reads (translated from the French):

"I think I have found [for solving these equations] a very simple and general rule, when the equations and the unknowns do not pass the first degree [i.e., are linear]. One finds this in Appendix No. 1." Appendix No. 1 appears on p. 657 of the same text. The text is available online, for those who can read French.

The history of the determinant appears in Thomas Muir, "The Theory of Determinants in the Historical Order of Development," Dover, NY, 1923. This is also available online.

Solution 3:

I do not know the actual history of the determinant, but I think it is very well motivated. The way I look at it, it is the properties of the determinant that are natural; you then derive the formula from them.

Let me start by trying to define the "signed volume" of a hyper-parallelepiped whose sides are $(u_1, u_2, \ldots, u_n)$. I'll call this function $\det$. (I have no idea why it is named "determinant"; Wikipedia says Cauchy was the one who started using the term in its present sense.) Here are some observations about $\det$ that I consider quite natural:

  1. The unit hypercube whose sides are $(e_1, e_2, \ldots, e_n)$, where the $e_i$ are the standard basis vectors of $\mathbb R^n$, should have volume $1$.
  2. If one of the sides is zero, the volume should be $0$.
  3. If you vary one side and keep all the other sides fixed, how should the signed volume change? You may think about the 3D case: a flat parallelogram defined by vectors $u_1$ and $u_2$ serves as the base of a solid, and the third vector $u_3$ extends it in the "height" direction. What happens to the volume as you scale $u_3$? Also, consider what happens if you have two height vectors $u_3$ and $\hat u_3$: $\det(u_1, u_2, u_3 + \hat u_3)$ should equal $\det(u_1, u_2, u_3) + \det(u_1, u_2, \hat u_3)$. (This is where you need your volume function to be signed.)
  4. If I add a multiple of one side, say $u_i$, to another side $u_j$, replacing $u_j$ by $\hat u_j = u_j + c u_i$, the signed volume should not change, because the addition to $u_j$ is in the direction of $u_i$. (Think about how a rectangle can be sheared into a parallelogram of equal area.)

With these four properties, you get the familiar properties of $\det$:

  1. $\det(e_1, \ldots, e_n) = 1$.
  2. $\det(u_1, \ldots, u_n) = 0$ if $u_i = 0$ for some $i$.
  3. $\det(u_1, \ldots, u_i + c\hat u_i, \ldots, u_n) = \det(u_1, \ldots, u_i, \ldots, u_n) + c\det(u_1, \ldots, \hat u_i, \ldots, u_n)$.
  4. $\det(u_1, \ldots, u_i, \ldots, u_j, \ldots, u_n) = \det(u_1, \ldots, u_i, \ldots, u_j + cu_i, \ldots, u_n)$. (It may happen that $j < i$.)
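These are axioms for a signed volume; if you want to sanity-check that the usual determinant satisfies all four, here is a quick NumPy check (my own addition, not part of the original answer):

```python
import numpy as np

def vol(sides):
    """Signed volume of the parallelepiped whose sides are the given vectors."""
    return np.linalg.det(np.column_stack(sides))

rng = np.random.default_rng(0)
u = list(rng.standard_normal((4, 4)))        # four random sides u_1, ..., u_4 in R^4
uhat = rng.standard_normal(4)
c = 2.5

print(vol(list(np.eye(4))))                  # property 1: exactly 1.0
print(vol([u[0], np.zeros(4), u[2], u[3]]))  # property 2: 0.0
lhs = vol([u[0] + c * uhat, u[1], u[2], u[3]])
rhs = vol(u) + c * vol([uhat, u[1], u[2], u[3]])
print(np.isclose(lhs, rhs))                  # property 3 (linearity in one slot): True
print(np.isclose(vol([u[0], u[1] + c * u[0], u[2], u[3]]), vol(u)))  # property 4: True
```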

You can then derive the formula for $\det$. You can also use these properties to deduce further, easier-to-use (in my opinion) properties:

  • Swapping two columns changes the sign of $\det$.
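One way to see this from properties 3 and 4 alone (a standard shear argument, which the list above leaves implicit): shear column $j$ by column $i$, shear column $i$ by the result, shear back, and pull out a sign,

$$ \begin{align*} \det(\ldots, u_i, \ldots, u_j, \ldots) & = \det(\ldots, u_i, \ldots, u_j + u_i, \ldots) \\ & = \det(\ldots, u_i - (u_j + u_i), \ldots, u_j + u_i, \ldots) = \det(\ldots, -u_j, \ldots, u_j + u_i, \ldots) \\ & = \det(\ldots, -u_j, \ldots, u_i, \ldots) \\ & = -\det(\ldots, u_j, \ldots, u_i, \ldots). \end{align*} $$

The first three equalities are shears (property 4); the last pulls the factor $-1$ out using linearity (properties 2 and 3).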

This should tell you why the oddness and evenness of permutations matter. To actually (and inefficiently) compute the determinant $\det(u_1, u_2, \ldots, u_n)$, write each $u_i$ as $u_i = \sum_{j=1}^n u_{ij}e_j$ and expand by multilinearity (a general code sketch follows at the end of this answer). For example, in the 2D case,

$$ \begin{align*} \det(u, v) & = \det(u_1e_1 + u_2e_2, v_1e_1 + v_2e_2) \\ & = u_1v_1\underbrace{\det(e_1, e_1)}_0 + u_1v_2\underbrace{\det(e_1, e_2)}_1 + u_2v_1\underbrace{\det(e_2, e_1)}_{-1} + u_2v_2\underbrace{\det(e_2, e_2)}_0 \\ & = u_1v_2 - u_2v_1. \end{align*} $$

(If you are not familiar with multilinearity, just think of it as a product: ignore the word $\det$ in the second line and you get a simple expansion of products. Then you evaluate the "unusual product" between the vectors $e_i$ by the definition of $\det$. Note, however, that the order matters, since $\det(u, v) = -\det(v, u)$.)
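Carried out in general, this expansion is exactly the Leibniz formula $\det(A) = \sum_{\sigma} \operatorname{sgn}(\sigma) \prod_{i=1}^n a_{i\sigma(i)}$, summing over all permutations $\sigma$. Here is a small Python sketch of it (my own illustration; at $O(n \cdot n!)$ it is only a toy, but it is literally the multilinear expansion above):

```python
from itertools import permutations
import numpy as np

def sgn(p):
    """Sign of a permutation: -1 raised to the number of inversions."""
    return (-1) ** sum(p[i] > p[j]
                       for i in range(len(p))
                       for j in range(i + 1, len(p)))

def det_by_expansion(A):
    """Leibniz formula: one signed product per permutation, i.e. one
    surviving term per choice of a distinct basis vector in each slot."""
    n = len(A)
    return sum(sgn(p) * np.prod([A[i][p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = [[1, 2], [3, 4]]
print(det_by_expansion(A))  # 1*4 - 2*3 = -2
print(np.linalg.det(A))     # -2.0, for comparison
```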