How to show that $\det(AB) =\det(A) \det(B)$?

Given two square matrices $A$ and $B$, how do you show that $$\det(AB) = \det(A)\det(B)$$ where $\det(\cdot)$ is the determinant of the matrix?


Solution 1:

Let's consider the function $B\mapsto \det(AB)$ as a function of the columns of $B=\left(v_1|\cdots |v_i| \cdots | v_n\right)$. It is straightforward to verify that this map is multilinear, in the sense that $$\det\left(A\left(v_1|\cdots |v_i+av_i'| \cdots | v_n\right)\right)=\det\left(A\left(v_1|\cdots |v_i| \cdots | v_n\right)\right)+a\det\left(A\left(v_1|\cdots |v_i'| \cdots | v_n\right)\right).$$ It is also alternating, in the sense that if you swap two columns of $B$, you multiply your overall result by $-1$. These properties both follow directly from the corresponding properties for the function $A\mapsto \det(A)$.

The determinant is completely characterized by these two properties, and the fact that $\det(I)=1$. Moreover, any function that satisfies these two properties must be a multiple of the determinant. If you have not seen this fact, you should try to prove it. I don't know of a reference online, but I know it is contained in Bretscher's linear algebra book.

In any case, because of this fact, we must have that $\det(AB)=c\det(B)$ for some constant $c$, and setting $B=I$, we see that $c=\det(A)$.
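(As a quick numerical sanity check of this conclusion, here is a minimal NumPy sketch; the random test matrices and the use of `np.isclose` are just illustrative choices, not part of the argument.)

```python
import numpy as np

# Check det(AB) == det(A) * det(B) on a few random matrices.
rng = np.random.default_rng(0)
for n in (2, 3, 5):
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    lhs = np.linalg.det(A @ B)
    rhs = np.linalg.det(A) * np.linalg.det(B)
    assert np.isclose(lhs, rhs), (n, lhs, rhs)
print("det(AB) == det(A) * det(B) numerically for the sampled matrices")
```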


For completeness, here is a proof of the necessary lemma that any multilinear, alternating function is a multiple of the determinant.

We will let $f:(\mathbb F^n)^n\to \mathbb F$ be a multilinear, alternating function, where, to allow this proof to work in characteristic 2, we will say that a multilinear function is alternating if it is zero whenever two of its inputs are equal (this is equivalent to picking up a sign when you swap two inputs, except in characteristic 2). Let $e_1, \ldots, e_n$ be the standard basis vectors. Then $f(e_{i_1},e_{i_2}, \ldots, e_{i_n})=0$ if any index occurs twice, and otherwise, if $\sigma\in S_n$ is a permutation, then $f(e_{\sigma(1)}, e_{\sigma(2)},\ldots, e_{\sigma(n)})=(-1)^\sigma f(e_1,\ldots,e_n)$, where $(-1)^\sigma$ denotes the sign of the permutation $\sigma$.

Using multilinearity, one can expand out evaluating $f$ on a collection of vectors written in terms of the basis:

$$f\left(\sum_{j_1=1}^n a_{1j_1}e_{j_1}, \sum_{j_2=1}^n a_{2j_2}e_{j_2},\ldots, \sum_{j_n=1}^n a_{nj_n}e_{j_n}\right) = \sum_{j_1=1}^n\sum_{j_2=1}^n\cdots \sum_{j_n=1}^n \left(\prod_{k=1}^n a_{kj_k}\right)f(e_{j_1},e_{j_2},\ldots, e_{j_n}).$$

All the terms with $j_{\ell}=j_{\ell'}$ for some $\ell\neq \ell'$ vanish because the $f$ factor is zero, and the remaining terms can be written in terms of permutations. If $j_{\ell}\neq j_{\ell'}$ whenever $\ell\neq \ell'$, then there is a unique permutation $\sigma$ with $j_k=\sigma(k)$ for every $k$. This yields:

$$\begin{align}\sum_{j_1=1}^n\sum_{j_2=1}^n\cdots \sum_{j_n=1}^n \left(\prod_{k=1}^n a_{kj_k}\right)f(e_{j_1},e_{j_2},\ldots, e_{j_n}) &= \sum_{\sigma\in S_n} \left(\prod_{k=1}^n a_{k\sigma(k)}\right)f(e_{\sigma(1)},e_{\sigma(2)},\ldots, e_{\sigma(n)}) \\ &= \sum_{\sigma\in S_n} (-1)^{\sigma}\left(\prod_{k=1}^n a_{k\sigma(k)}\right)f(e_{1},e_{2},\ldots, e_{n}) \\ &= f(e_{1},e_{2},\ldots, e_{n}) \sum_{\sigma\in S_n} (-1)^{\sigma}\left(\prod_{k=1}^n a_{k\sigma(k)}\right). \end{align} $$

In the last line, the remaining sum is precisely the determinant of the matrix $(a_{ij})$, although one does not need to realize this fact: we have shown that $f$ is completely determined by $f(e_1,\ldots, e_n)$, and we simply define $\det$ to be such a function with $\det(e_1,\ldots, e_n)=1$.
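If it helps to see the resulting sum $\sum_{\sigma\in S_n}(-1)^\sigma\prod_k a_{k\sigma(k)}$ in action, here is a small Python sketch (the helper names `sign` and `leibniz_det` are mine) that evaluates it directly and compares it with `numpy.linalg.det`:

```python
import itertools
import numpy as np

def sign(perm):
    """Sign of a permutation given as a tuple of 0..n-1, via inversion counting."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def leibniz_det(a):
    """Evaluate sum over sigma of sign(sigma) * prod_k a[k, sigma(k)] directly."""
    n = a.shape[0]
    return sum(sign(p) * np.prod([a[k, p[k]] for k in range(n)])
               for p in itertools.permutations(range(n)))

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 4))
assert np.isclose(leibniz_det(a), np.linalg.det(a))
```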

Solution 2:

The proof using elementary matrices can be found e.g. on proofwiki. It's basically the same proof as the one given in Jyrki Lahtonen's comment and Chandrasekhar's link.

There is also a proof using block matrices; I googled a bit and was only able to find it in this book and this paper.


I like the approach which I learned from Sheldon Axler's Linear Algebra Done Right, Theorem 10.31. Let me try to reproduce the proof here.

We will use several results in the proof; one of them is, as far as I can tell, a little less known. It is the theorem which says that if two matrices $A$ and $B$ differ only in their $k$-th row, and $C$ is the matrix whose $k$-th row is the sum of the $k$-th rows of $A$ and $B$ and whose remaining rows are the same as in $A$ and $B$, then $|C|=|A|+|B|$.

Geometrically, this corresponds to adding two parallelepipeds with the same base.


Proof. Let us denote the rows of $A$ by $\vec\alpha_1,\ldots,\vec\alpha_n$. Thus $$A= \begin{pmatrix} a_{11} & a_{12}& \ldots & a_{1n}\\ a_{21} & a_{22}& \ldots & a_{2n}\\ \vdots & \vdots& \ddots & \vdots \\ a_{n1} & a_{n2}& \ldots & a_{nn} \end{pmatrix}= \begin{pmatrix} \vec\alpha_1 \\ \vec\alpha_2 \\ \vdots \\ \vec\alpha_n \end{pmatrix}$$

Directly from the definition of matrix product we can see that the rows of $A\cdot B$ are of the form $\vec\alpha_kB$, i.e., $$A\cdot B=\begin{pmatrix} \vec\alpha_1B \\ \vec\alpha_2B \\ \vdots \\ \vec\alpha_nB \end{pmatrix}$$ Since $\vec\alpha_k=\sum_{i=1}^n a_{ki}\vec e_i$, we can rewrite this equality as $$A\cdot B=\begin{pmatrix} \sum_{i_1=1}^n a_{1i_1}\vec e_{i_1} B\\ \vdots\\ \sum_{i_n=1}^n a_{ni_n}\vec e_{i_n} B \end{pmatrix}$$ Using the theorem on the sum of determinants multiple times we get $$ |{A\cdot B}|= \sum_{i_1=1}^n a_{1i_1} \begin{vmatrix} \vec e_{i_1}B\\ \sum_{i_2=1}^n a_{2i_2}\vec e_{i_2} B\\ \vdots\\ \sum_{i_n=1}^n a_{ni_n}\vec e_{i_n} B \end{vmatrix}= \ldots = \sum_{i_1=1}^n \ldots \sum_{i_n=1}^n a_{1i_1} a_{2i_2} \dots a_{ni_n} \begin{vmatrix} \vec e_{i_1} B \\ \vec e_{i_2} B \\ \vdots \\ \vec e_{i_n} B \end{vmatrix} $$

Now notice that if $i_j=i_k$ for some $j\ne k$, then the corresponding determinant in the above sum is zero (it has two identical rows). Thus the only nonzero summands are those for which the $n$-tuple $(i_1,i_2,\dots,i_n)$ is a permutation of the numbers $1,\ldots,n$. Hence we get $$|{A\cdot B}|=\sum_{\varphi\in S_n} a_{1\varphi(1)} a_{2\varphi(2)} \dots a_{n\varphi(n)} \begin{vmatrix} \vec e_{\varphi(1)} B \\ \vec e_{\varphi(2)} B \\ \vdots \\ \vec e_{\varphi(n)} B \end{vmatrix}$$ (Here $S_n$ denotes the set of all permutations of $\{1,2,\dots,n\}$.) The matrix on the RHS of the above equality is the matrix $B$ with permuted rows. Using several transpositions of rows we can recover the matrix $B$. We will show that this can be done using $i(\varphi)$ transpositions, where $i(\varphi)$ denotes the number of inversions of $\varphi$. Using this fact we get $$|{A\cdot B}|=\sum_{\varphi\in S_n} a_{1\varphi(1)} a_{2\varphi(2)} \dots a_{n\varphi(n)} (-1)^{i(\varphi)} |{B}| =|A|\cdot |B|.$$

It remains to show that we need $i(\varphi)$ transpositions. We can transform the "permuted matrix" into the matrix $B$ as follows: we first move the first row of $B$ to the first place by exchanging it with the preceding row until it is in the correct position. (If it already is in the first position, we make no exchanges at all.) The number of transpositions we have used is exactly the number of inversions of $\varphi$ that involve the number 1. Now we can move the second row to the second place in the same way. We will use the same number of transpositions as the number of inversions of $\varphi$ that involve 2 but not 1. (Since the first row is already in place.) We continue in the same way, and we see that this procedure yields the matrix $B$ after $i(\varphi)$ row transpositions.
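Here is a small Python sketch of this bookkeeping (the helper names are mine; it is a numerical illustration, not part of the proof): it counts the inversions of $\varphi$ and then undoes the row permutation by the procedure above, checking that exactly $i(\varphi)$ adjacent transpositions are used.

```python
from itertools import permutations

def inversions(phi):
    """i(phi): number of pairs (j, k) with j < k but phi[j] > phi[k]."""
    return sum(1 for j in range(len(phi)) for k in range(j + 1, len(phi))
               if phi[j] > phi[k])

def swaps_to_sort(phi):
    """Restore the natural row order by adjacent transpositions, counting swaps."""
    rows, count = list(phi), 0
    for target in range(len(rows)):
        pos = rows.index(target)          # current position of the row labelled `target`
        while pos > target:               # bubble it up to its place
            rows[pos - 1], rows[pos] = rows[pos], rows[pos - 1]
            pos -= 1
            count += 1
    return count

for phi in permutations(range(5)):
    assert swaps_to_sort(phi) == inversions(phi)
```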

Solution 3:

Let $K$ be the ground ring. The statement holds

(a) when $B$ is diagonal,

(b) when $B$ is strictly triangular,

(c) when $B$ is triangular (by (a) and (b)),

(d) when $A$ and $B$ have rational entries and $K$ is an extension of $\mathbb Q$ containing the eigenvalues of $B$ (by (c)),

(e) when $K=\mathbb Q$ (by (d)),

(f) when $K=\mathbb Z[a_{11},\dots,a_{nn},b_{11},\dots,b_{nn}]$, where the $a_{ij}$ and $b_{ij}$ are respectively the entries of $A$ and $B$, and are indeterminate (by (e)),

(g) always (by (f)).

The reader who knows what the discriminant of a polynomial in $\mathbb Q[X]$ is, can skip (b) and (c).

Reference: this MathOverflow answer of Bill Dubuque.

EDIT 1. The principle underlying the above argument has various names. Bill Dubuque calls it the "universality" principle. Michael Artin calls it "The Principle of Permanence of Identities". The section of Algebra with this title can be viewed here. I strongly suggest that those who are not familiar with this read that section. It is an interesting coincidence that the illustration chosen by Artin is precisely the multiplicativity of determinants.

Another highly important application is the proof of the Cayley-Hamilton Theorem. I will not give it here, but I will digress on another point. That is, I will try to explain why

(*) it suffices to prove Cayley-Hamilton or the multiplicativity of determinants in the diagonal case.

Suppose we have a polynomial map $f:M_n(\mathbb Z)\to\mathbb Z$. Then $f$ is given by a unique element, again denoted $f$, of $\mathbb Z[a_{11},\dots,a_{nn}]$, where the $a_{ij}$ are indeterminates (because $\mathbb Z$ is an infinite domain). As a result, given any $A$ in $M_n(K)$ for any commutative ring $K$, we can define $f_K(A)$ by mapping the indeterminate $a_{ij}$ to the corresponding entry of $A$. That is the Principle of Permanence of Identities. The key to prove (*) will be:

LEMMA 1. Let $f:M_n(\mathbb Z)\to\mathbb Z$ be a polynomial map vanishing on the diagonalizable matrices. Then $f$ vanishes on all matrices.

There are at least two ways to prove this. The reader will perhaps prefer the first one, but (IMHO) the second one is better.

First way: It suffices to prove that the polynomial map $f_{\mathbb C}:M_n(\mathbb C)\to\mathbb C$ is zero. Thus it suffices to prove that the diagonalizable matrices are dense in $M_n(\mathbb C)$. But this is clear since any $A\in M_n(\mathbb C)$ is similar to a triangular matrix $T$, and the diagonal entries of $T$ (which are the eigenvalues of $A$) can be made all distinct by adding an arbitrarily small diagonal matrix.
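(A quick NumPy illustration of the perturbation step, with an arbitrary Jordan block and an arbitrarily small diagonal perturbation: the matrix below is not diagonalizable, but the perturbed matrix has distinct eigenvalues.)

```python
import numpy as np

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # a non-diagonalizable Jordan block
eps = 1e-8
D = np.diag([0.0, eps])             # a tiny diagonal perturbation

print(np.linalg.eigvals(J))         # repeated eigenvalue: [1. 1.]
print(np.linalg.eigvals(J + D))     # two distinct eigenvalues, eps apart
```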

Second way. Consider again the ring $R:=\mathbb Z[a_{11},\dots,a_{nn}]$, where the $a_{ij}$ are indeterminates. Let $A$ in $M_n(R)$ be the matrix whose $(i,j)$ entry is $a_{ij}$. Let $\chi\in R[X]$ be the characteristic polynomial of $A$, and let $u_1,\dots,u_n$ be the roots of $\chi$ (in some extension of the fraction field of $R$).

LEMMA 2. The expression $$\prod_{i < j}\ (u_i-u_j)^2$$ defines a unique nonzero element $d\in R$, called the discriminant of $\chi$.

Lemma 2 implies Lemma 1 because $R$ is a domain and because we have $fd=0$ since $f$ vanishes on the diagonalizable matrices, whereas $d$ vanishes on the non-diagonalizable matrices.
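For $n=2$ the element $d$ can be written down explicitly. Here is a SymPy sketch (the variable names are mine) that computes the discriminant of the characteristic polynomial of the generic $2\times2$ matrix and checks that it is the expected polynomial in the entries:

```python
import sympy as sp

a11, a12, a21, a22, x = sp.symbols('a11 a12 a21 a22 x')
A = sp.Matrix([[a11, a12], [a21, a22]])

chi = A.charpoly(x).as_expr()   # characteristic polynomial of the generic matrix
d = sp.discriminant(chi, x)     # (u1 - u2)**2, expressed in the entries of A

# d is a genuine nonzero polynomial in Z[a11, a12, a21, a22]:
assert sp.expand(d - ((a11 - a22)**2 + 4*a12*a21)) == 0
```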

Lemma 2 is a particular case of a theorem which says that, given any monic polynomial $g$ in one indeterminate and coefficients in a field, any polynomial in the roots of $g$ which is invariant under permutation is a polynomial in the coefficients of $g$. More precisely:

Let $A$ be a commutative ring, let $X_1,\dots,X_n,T$ be indeterminates, and let $s_i$ be the degree $i$ elementary symmetric polynomial in $X_1,\dots,X_n$. Recall that the $s_i$ are defined by $$ f(T):=(T-X_1)\cdots(T-X_n)=T^n+\sum_{i=1}^n\ (-1)^i\ s_i\ T^{n-i}. $$ We abbreviate $X_1,\dots,X_n$ by $X_\bullet$, and $s_1,\dots,s_n$ by $s_\bullet$. Let $G$ be the group of permutations of the $X_i$, and $A[X_\bullet]^G\subset A[X_\bullet]$ the fixed ring. For $\alpha\in\mathbb N^n$ put $$ X^\alpha:=X_1^{\alpha_1}\cdots X_n^{\alpha_n},\quad s^\alpha:=s_1^{\alpha_1}\cdots s_n^{\alpha_n}. $$ Write $\Gamma$ for the set of those $\alpha\in\mathbb N^n$ which satisfy $\alpha_i<i$ for all $i$, and put $$ X^\Gamma:=\{X^\alpha\ |\ \alpha\in\Gamma\}. $$

FUNDAMENTAL THEOREM OF SYMMETRIC POLYNOMIALS. The $s_i$ generate the $A$-algebra $A[X_\bullet]^G$.

PROOF. Observe that the map $u:\mathbb N^n\to\mathbb N^n$ defined by $$ u(\alpha)_i:=\alpha_i+\cdots+\alpha_n $$ is injective. Order $\mathbb N^n$ lexicographically, note that the leading term of $s^\alpha$ is $X^{u(\alpha)}$, and argue by induction on the lexicographical ordering of $\mathbb N^n$.
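As a tiny concrete instance of the theorem (a SymPy sketch for $n=2$): the symmetric polynomial $X_1^2+X_2^2$ is $s_1^2-2s_2$.

```python
import sympy as sp

X1, X2 = sp.symbols('X1 X2')
s1, s2 = X1 + X2, X1 * X2          # elementary symmetric polynomials for n = 2

symmetric = X1**2 + X2**2          # invariant under swapping X1 and X2
assert sp.expand(symmetric - (s1**2 - 2*s2)) == 0
```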

EDIT 2.

Polynomial Identities

Michael Artin writes:

It is possible to formalize the above discussion and to prove a precise theorem concerning the validity of identities in an arbitrary ring. However, even mathematicians occasionally feel that it isn't worthwhile making a precise formulation---that it is easier to consider each case as it comes along. This is one of those occasions.

I'll disobey and make a precise formulation (taken from Bourbaki). If $A$ is a commutative ring and $T_1,\dots,T_k$ are indeterminates, let us denote the obvious morphism from $\mathbb Z[T_1,\dots,T_k]$ to $A[T_1,\dots,T_k]$ by $f\mapsto\overline f$.

Let $X_1,\dots,X_m,Y_1,\dots,Y_n$ be indeterminates.

Let $f_1,\dots,f_n$ be in $\mathbb Z[X_1,\dots,X_m]$.

Let $g$ be in $\mathbb Z[Y_1,\dots,Y_n]$.

The expression $g(f_1,\dots,f_n)$ then denotes a well-defined polynomial in $\mathbb Z[X_1,\dots,X_m]$.

If this polynomial is the zero polynomial, we say that $(f_1,\dots,f_n,g)$ is an $(m,n)$-polynomial identity.

The "theorem" is this:

If $(f_1,\dots,f_n,g)$ is an $(m,n)$-polynomial identity, and if $x_1,\dots,x_m$ are in $A$, where $A$ is any commutative ring, then $$g(f_1(x_1,\dots,x_m),\dots,f_n(x_1,\dots,x_m))=0.$$

Exercise: Is $$(X_1^3-X_2^3,X_1-X_2,X_1^2+X_1X_2+X_2^2,Y_1-Y_2Y_3)$$ a $(2,3)$-polynomial identity?

Clearly, the multiplicativity of determinants and the Cayley-Hamilton Theorem can be expressed in terms of polynomial identities in the above sense.
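Here is a SymPy sketch of that reformulation for $n=2$ (the variable names are mine): treating the entries of $A$ and $B$ as indeterminates, the polynomial $\det(AB)-\det(A)\det(B)$ expands to the zero polynomial, which is exactly the identity that then holds over every commutative ring.

```python
import sympy as sp

n = 2
a = sp.symbols(f'a0:{n*n}')                 # indeterminate entries of A
b = sp.symbols(f'b0:{n*n}')                 # indeterminate entries of B
A = sp.Matrix(n, n, list(a))
B = sp.Matrix(n, n, list(b))

identity = sp.expand((A * B).det() - A.det() * B.det())
assert identity == 0                        # the zero polynomial in Z[a_ij, b_ij]
```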

Exterior Algebras

To prove the multiplicativity of determinants, one can also proceed as follows.

Let $A$ be a commutative ring and $M$ an $A$-module. One can show that there is an $A$-algebra $\wedge(M)$, called the exterior algebra of $M$ [here "algebra" means "not necessarily commutative algebra"], and an $A$-linear map $e_M$ from $M$ to $\wedge(M)$ having the following property:

For every $A$-linear map $f$ from $M$ to an $A$-algebra $B$ satisfying $f(x)^2=0$ for all $x$ in $M$, there is a unique $A$-algebra morphism $F$ from $\wedge(M)$ to $B$ such that $F\circ e_M=f$.

One can prove $e_M(x)^2=0$ for all $x$ in $M$. This easily implies that $\wedge$ is a functor from $A$-modules to $A$-algebras.

Let $\wedge^n(M)$ be the submodule of $\wedge(M)$ generated by the $e_M(x_1)\cdots e_M(x_n)$, where the $x_i$ run over $M$. Then $\wedge^n$ is a functor from $A$-modules to $A$-modules.

One can show that the $A$-module $\wedge^n(A^n)$ is isomorphic to $A$. For any endomorphism $f$ of $A^n$, one defines $\det(f)$ as the scalar by which $\wedge^n(f)$ acts on $\wedge^n(A^n)\cong A$. The multiplicativity is then obvious, since $\wedge^n$ is a functor.

Solution 4:

There are a lot of answers already posted, but I like this one, which is based on the permutation definition of the determinant. This definition is equivalent to the other definitions, and depending on your book/background, you can prove the equivalence yourself. For an $n\times n$ matrix $A$, define $\det(A)$ by:

\begin{align*} \det(A) & = \sum_{\sigma\in S_n}(-1)^{\sigma}\prod_{i=1}^nA_{i,\sigma(i)} \end{align*}

where

  • $S_n$ is the permutation group on $n$ objects
  • $(-1)^{\sigma}$ is $1$ when $\sigma$ is an even permutation and $-1$ for an odd permutation.

Just apply this to $2\times2$ and $3\times3$ matrices, and you will get familiar formulas.
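For instance, the following SymPy sketch (the helpers `sign` and `perm_det` are mine) expands this definition for symbolic $2\times2$ and $3\times3$ matrices and recovers the familiar formulas:

```python
import itertools
import sympy as sp

def sign(sigma):
    """Sign of a permutation (tuple of 0..n-1), via inversion counting."""
    n = len(sigma)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def perm_det(M):
    """Expand det(M) directly from the permutation-based definition."""
    n = M.rows
    total = sp.Integer(0)
    for s in itertools.permutations(range(n)):
        term = sp.Integer(sign(s))
        for i in range(n):
            term = term * M[i, s[i]]
        total += term
    return sp.expand(total)

A2 = sp.Matrix(2, 2, list(sp.symbols('a b c d')))
print(perm_det(A2))                              # a*d - b*c

A3 = sp.Matrix(3, 3, list(sp.symbols('m1:10')))
assert sp.expand(perm_det(A3) - A3.det()) == 0   # agrees with SymPy's determinant
```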

Now the proof below is a lot of symbol pushing and reindexing, and then a big subset of terms, grouped together in the right way, is seen to sum to zero. I would generally prefer one of the more geometric proofs already offered for this specific question. But at the same time, as an algebraist, I like to raise awareness of the permutations-based definition.

\begin{align*} \det(AB) & = \sum_{\sigma\in S_n}(-1)^\sigma\prod_{l=1}^n(AB)_{l,\sigma(l)}\\ & = \sum_{\sigma\in S_n}(-1)^\sigma\prod_{l=1}^n\left(\sum_{k=1}^nA_{l,k}B_{k,\sigma(l)}\right) \end{align*}

We'd like to swap the inner sum and product. In general, $\prod_{l=1}^n\left(\sum_{k=1}^mc_{l,k}\right) = \sum_{\bar{k}}\left(\prod_{l=1}^nc_{l,k_l}\right)$, where the second sum is over all $\bar{k}=(k_1,k_2,\ldots,k_n)$ with each $k_l$ in $\left\{1,2,\ldots ,m\right\}$. Here we have a product of sums with $m=n$. Therefore,

\begin{align} \det(AB) & = \sum_{\sigma\in S_n}(-1)^\sigma\sum_{\bar{k}}\left(\prod_{l=1}^nA_{l,k_l}B_{k_l,\sigma(l)}\right)\\ & = \sum_{\bar{k}}\sum_{\sigma\in S_n}(-1)^\sigma\left(\prod_{l=1}^nA_{l,k_l}B_{k_l,\sigma(l)}\right) \\ \end{align}

At this point, there are two types of $\bar{k}$ to consider. Remember, each $\bar{k}$ is an $n$-tuple of integers between $1$ and $n$. Some $n$-tuples have repeated entries, and some don't. If $\bar{k}$ has no repeated entries, it defines a permutation $\tau:\{1,2,\ldots , n\}\to\{1,2,\ldots , n\}$ which sends each $l$ to $k_l$.

Suppose $\bar{k}$ has a repeated entry: $k_p=k_q$. Then we can pair up terms in the inner sum to cancel each other out. Specifically, pair up each $\sigma$ with $\sigma\cdot(p\;q)$, where $(p\;q)$ is the transposition that swaps position $p$ with $q$. The contribution of these two terms to the inner sum is

\begin{align*} & \phantom{{}={}}\pm\left(\left(\prod_{l=1}^nA_{l,k_l}B_{k_l,\sigma(l)}\right)-\left(\prod_{l=1}^nA_{l,k_l}B_{k_l,\sigma((p\;q)l)}\right)\right)\\ &= \pm\left(\left(\prod_{l=1}^nA_{l,k_l}\right)\left(\prod_{l=1}^nB_{k_l,\sigma(l)}\right)-\left(\prod_{l=1}^nA_{l,k_l}\right)\left(\prod_{l=1}^nB_{k_l,\sigma((p\;q)l)}\right)\right)\\ &= \pm\left(\left(\prod_{l=1}^nA_{l,k_l}\right)\left(\prod_{l=1}^nB_{k_l,\sigma(l)}\right)-\left(\prod_{l=1}^nA_{l,k_l}\right)\left(\prod_{l'=1}^nB_{k_{l'},\sigma(l')}\right)\right) \end{align*}

where the final product has been reindexed with $l'=(p\;q)l$, and we have made use of the fact that $k_l=k_{l'}$ for all $l$. The overall difference is clearly zero. So in the earlier equation for $\det(AB)$, the only terms in the inner sum that need be considered are those where $\bar{k}$ defines a permutation $\tau$.
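Before continuing, here is a quick NumPy sketch of this cancellation (the particular repeated-index tuple is an arbitrary choice): for a tuple $\bar{k}$ with a repeated entry, the inner sum over $\sigma$ comes out as zero.

```python
import itertools
import numpy as np

def sign(sigma):
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
              if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

rng = np.random.default_rng(2)
n = 3
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

k = (0, 2, 0)   # a k-tuple with a repeated entry (0-indexed)
inner = sum(sign(s) * np.prod([A[l, k[l]] * B[k[l], s[l]] for l in range(n)])
            for s in itertools.permutations(range(n)))
print(inner)    # ~0, up to floating-point roundoff
```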

\begin{align*} \det(AB) & = \sum_{\tau\in S_n}\sum_{\sigma\in S_n}(-1)^\sigma\left(\prod_{l=1}^nA_{l,\tau(l)}B_{\tau(l),\sigma(l)}\right) \end{align*}

Reindexing the inner sum with $\sigma = \sigma'\tau$,

\begin{align*} \det(AB) & = \sum_{\tau\in S_n}\sum_{\sigma'\in S_n}(-1)^{\sigma'\tau}\left(\prod_{l=1}^nA_{l,\tau(l)}B_{\tau(l),\sigma'\tau(l)}\right) \\ & = \sum_{\tau\in S_n}\sum_{\sigma'\in S_n}(-1)^{\sigma'\tau}\left(\prod_{l=1}^nA_{l,\tau(l)}\right)\left(\prod_{l=1}^nB_{\tau(l),\sigma'\tau(l)}\right) \end{align*}

Reindexing the final product with $l'=\tau(l)$,

\begin{align*} & = \sum_{\tau\in S_n}\sum_{\sigma'\in S_n}(-1)^{\sigma'\tau}\left(\prod_{l=1}^nA_{l,\tau(l)}\right)\left(\prod_{l'=1}^nB_{l',\sigma'(l')}\right)\\ & = \left(\sum_{\tau\in S_n}(-1)^{\tau}\prod_{l=1}^nA_{l,\tau(l)}\right)\left(\sum_{\sigma'\in S_n}(-1)^{\sigma'}\prod_{l'=1}^nB_{l',\sigma'(l')}\right)\\ & = \det(A)\det(B) \end{align*}

Solution 5:

This isn't strictly an answer to the question because it is not a rigorous argument that $\det(AB)=\det(A)\det(B)$. But for me the idea I will share carries a lot of useful insight so I offer it in that spirit. It is based on the geometric interpretation of the determinant:

Interpreting $A$ as a linear transformation of $n$-dimensional space, $\det(A)$ is the effect of $A$ on $n$-volumes. More precisely, if a set $S$ has $n$-dimensional measure $k$, then the image set $A(S)$ has $n$-dimensional measure $\left|\det(A)\right|k$, i.e. $\left|\det(A)\right|$ times as big. The sign of $\det(A)$ tells you whether $A$ preserves or reverses orientation.

Examples:

Let $n=2$ so we are dealing with areas in the plane.

If $A$ is a rotation matrix, then its effect on the plane is a rotation. Here $\det(A)$ is $+1$ because $A$ preserves all areas (so the absolute value is $1$) and preserves orientation (so the sign is positive).

If $A$ has the form $kI$, $k$ positive, then $\det(A)$ is $k^2$. This is because the geometric effect of $A$ is a dilation by a factor of $k$, so $A$'s effect on area is to multiply it by $k^2$.

If $A$ has $1, -1$ on the main diagonal and zero elsewhere, then it corresponds to reflection in the $x$-axis. Here the determinant is $-1$ because though $A$ preserves areas, it reverses the orientation of the plane.

Once you buy this interpretation of the determinant, $\det(AB)=\det(A)\det(B)$ follows immediately because the whole point of matrix multiplication is that $AB$ corresponds to the composed linear transformation $A \circ B$. Looking at the magnitudes and the signs separately: $A \circ B$ scales volumes by $\left|\det(B)\right|$ and then again by $\left|\det(A)\right|$, so in total by $\left|\det(A)\right|\left|\det(B)\right|=\left|\det(A)\det(B)\right|$. I'll let you think about the signs and orientations.
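To make the volume-scaling claim concrete numerically, here is a Monte Carlo sketch in Python (the region, sample size, and test matrices are arbitrary choices): it estimates the area of the image of the unit disc under a linear map without using determinants, and compares the scaling factor with $|\det|$, both for a single map and for a composition.

```python
import numpy as np

rng = np.random.default_rng(3)

def image_area_of_unit_disc(M, samples=200_000):
    """Monte Carlo estimate of the area of M(unit disc), without using det."""
    s = np.linalg.norm(M, 2)                  # the image fits inside the box [-s, s]^2
    pts = rng.uniform(-s, s, size=(samples, 2))
    preimages = np.linalg.solve(M, pts.T).T   # p lies in M(disc) iff M^{-1} p lies in the disc
    inside = np.sum(preimages**2, axis=1) <= 1.0
    return 4 * s * s * inside.mean()

A = np.array([[2.0, 1.0], [0.0, 1.5]])
B = np.array([[0.5, -1.0], [1.0, 0.3]])

disc_area = np.pi
print(image_area_of_unit_disc(A) / disc_area)       # ~ |det A| = 3
print(image_area_of_unit_disc(A @ B) / disc_area)   # ~ |det A| * |det B| = 3.45
print(abs(np.linalg.det(A) * np.linalg.det(B)))     # 3.45, for comparison
```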

This argument becomes a rigorous proof via a proof of the geometric interpretation of the determinant. How to prove this would depend on what definition is being used for the determinant. If we defined the determinant as the effect of $A$ on $n$-volumes (in the sense above), we would skip the need for this step. (We'd still have to prove that for a linear transformation the effect on $n$-volume doesn't depend on the set; and to avoid circularity we'd need a way to define orientation that didn't depend on the determinant - in my experience many definitions of orientation do depend on it.) On the other hand, if we define the determinant in any of the more usual algebraic ways, we have something left to prove here. But I hope this way of looking at things is useful in any case.