Why Is $\sqrt{\det(A^TA)}$ A Volume / Volume Factor?
The determinant of an $n\times n$ matrix is a volume / volume factor. So far, I'm good in my understanding. You take a linear map, encode it as a matrix, compute the volume of the parallelepiped (or whatever the proper name is) spanned by the column vectors, and look at the factor by which this transformation scaled the unit $n$-dimensional volume from before the transformation to the new one. That scaling is the determinant. There are many ways to view the determinant, but this is the most interesting to me, because I can visualize it.
Now, what if I have a transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, encode it as an $m\times n$ matrix, and want the $n$-dimensional volume of the parallelepiped spanned by the column vectors of my matrix? This is a well-grounded question (think of a 2-d parallelogram embedded arbitrarily in 3-space: what is its area?), but it is pretty much never addressed in linear algebra courses or books. Apparently (see e.g. the Wikipedia entry on determinants) I'm supposed to compute $\sqrt{\det(A^TA)}$ now.
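To make the 2-d-parallelogram-in-3-space example concrete, here is a small numpy sketch (my own illustration, not part of the question) comparing $\sqrt{\det(A^TA)}$ against the classical cross-product area:

```python
import numpy as np

# Two edge vectors of a parallelogram embedded in R^3,
# stored as the columns of a 3x2 matrix A.
A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])

# Area via the Gram determinant: sqrt(det(A^T A)).
gram_area = np.sqrt(np.linalg.det(A.T @ A))

# Area via the norm of the cross product of the two edge vectors.
cross_area = np.linalg.norm(np.cross(A[:, 0], A[:, 1]))

print(np.isclose(gram_area, cross_area))  # True (both are sqrt(46))
```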
This makes sense in the $m=n$ scenario (except that orientation information is lost due to the square root?), since $|\det(A)|=\sqrt{\det(A)^2}=\sqrt{\det(A^T)\det(A)}=\sqrt{\det(A^TA)}$, but I just can't visualize it in the case $n\neq m$. I see that the product $A^TA$ is an $n\times n$ matrix, so its determinant is at least plausibly an $n$-dimensional volume / volume factor, but I can't see why I get the correct volume. Any help?
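The square case can be checked numerically as well; a quick sketch (an illustration of my own, using a random matrix), confirming that only the sign of the determinant is lost:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # a random square (m = n) matrix

# In the square case the formula reduces to |det A|:
# the magnitude agrees, only the orientation (sign) is lost.
lhs = abs(np.linalg.det(A))
rhs = np.sqrt(np.linalg.det(A.T @ A))
print(np.isclose(lhs, rhs))  # True
```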
Let $A$ be the matrix in question, writing $D(A) = \sqrt{\det(A^T A)}$ and $V(A)$ for the desired $n$-volume; we want to see that $D(A) = V(A)$. Note that the claim is easy if $m < n$: the rank of $A^T A$ is bounded above by $\min(\operatorname{rank} A, \operatorname{rank} A^T) \le m < n$, so $\det(A^T A) = 0$; and the $n$ columns, being linearly dependent in $\mathbb{R}^m$, span a degenerate parallelepiped of zero volume, so both sides vanish. We may therefore assume $m \ge n$.
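The degenerate $m < n$ case is easy to see numerically (a small sketch of my own; the tiny residual is floating-point noise):

```python
import numpy as np

# With m < n the columns are n vectors living in R^m, hence
# linearly dependent: the parallelepiped is flat and both
# sides of the claim vanish.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # m = 2 rows, n = 3 columns

g = np.linalg.det(A.T @ A)  # A^T A is 3x3 but has rank <= 2
print(np.isclose(g, 0.0))  # True
```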
Choose orthogonal matrices $P$ and $Q$ of sizes $m \times m$ and $n \times n$. Since $P^T P = I$, $$D(PAQ) = \sqrt{\det((PAQ)^T (PAQ))} = \sqrt{\det(Q^T A^TA Q)} = \sqrt{\det(A^T A)} = D(A).$$ I claim that $V(PA) = V(A)$. Indeed, write $$A = (v_1 \; v_2 \; \dots \; v_n)$$ where the $v_i$ are columns. It is possible to choose $v_i$ for $n+1 \le i \le m$ in such a way that $V(A) = |\det (v_1 \; v_2 \; \dots \; v_m)|$: take $v_{n+1}$ to be of unit length and orthogonal to the span of $v_1, \dots, v_n$, and repeat. Since $P$ preserves inner products (and hence norms and orthogonality), the vectors $Pv_{n+1}, \dots, Pv_m$ are again unit vectors orthogonal to the span of the columns of $PA$, so $$V(A) = \left |\det (v_1 \; v_2 \; \dots \; v_m)\right| = \left|\det (Pv_1 \; Pv_2 \; \dots \; Pv_m)\right| = V(PA).$$ Now write $A = URW$ using the singular value decomposition (SVD), where $U, W$ are orthogonal and $R$ is rectangular diagonal with non-negative entries. Then $$V(A) = V(U^T A) = V(RW) = V(R).$$ The last equality holds because $V(W) = 1$ and, if the diagonal entries of $R$ are $d_1, \dots, d_n$, passing from $W$ to $RW$ multiplies the $i$-th coordinate of every column by $d_i$, which scales the volume by $d_1 \cdots d_n$. Thus $V(A) = V(R) = D(R) = D(URW) = D(A)$, where $V(R) = D(R)$ comes down to checking the claim for $n$-cells with edges along the coordinate axes.
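The SVD step above can be tested numerically: the $n$-volume is the product of the singular values $d_1, \dots, d_n$, and that product equals $\sqrt{\det(A^TA)}$. A short numpy sketch (my own check, using a random tall matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))  # m = 5 >= n = 3

# The singular values d_1, ..., d_n are the diagonal entries of R
# in the decomposition A = U R W.
d = np.linalg.svd(A, compute_uv=False)

# V(A) = d_1 * ... * d_n, the volume of the rectangular n-cell R.
vol_from_svd = np.prod(d)

# D(A) = sqrt(det(A^T A)).
D = np.sqrt(np.linalg.det(A.T @ A))

print(np.isclose(vol_from_svd, D))  # True
```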
Similarly to the axiomatic definition of the determinant, you might check that the proposed formula satisfies homogeneity in each column (for positive scalings only, because unlike the determinant it gives us non-oriented volume), shear invariance (adding one column to another doesn't change the volume), and normalization (the unit hypercube has volume 1).
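Each of these three axioms is easy to verify numerically; here is a sketch of my own (numpy, random matrices) checking them for $D(A) = \sqrt{\det(A^TA)}$:

```python
import numpy as np

def D(A):
    """The proposed volume formula sqrt(det(A^T A))."""
    return np.sqrt(np.linalg.det(A.T @ A))

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

# Positive homogeneity in each column: scaling column 0 by c > 0
# scales the volume by c.
c = 2.5
B = A.copy(); B[:, 0] *= c
print(np.isclose(D(B), c * D(A)))  # True

# Shear invariance: adding column 1 to column 0 changes nothing.
S = A.copy(); S[:, 0] += S[:, 1]
print(np.isclose(D(S), D(A)))      # True

# Normalization: orthonormal columns (a "unit cube") have volume 1.
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
print(np.isclose(D(Q), 1.0))       # True
```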
Most textbooks on linear algebra show that these properties suffice to determine a unique function, which must then be the (non-)oriented volume, since the (non-)oriented volume certainly satisfies all of the above axioms.