Measure of Image of Linear Map
Hint 1) It is enough to show this in the case that $A$ is an $n$-dimensional parallelepiped (as John M pointed out).
Hint 2) Recall from linear algebra that any invertible linear mapping can be written as a composition of elementary linear mappings of three types (usually expressed in the language of matrices, so I will do the same here): A) swap two rows, B) multiply a row by a nonzero scalar, C) add a scalar multiple of one row to another.
Hint 3) Swapping two coordinates is geometrically a reflection with respect to a hyperplane, so type A is easy. Type B amounts to stretching one of the coordinates. Type C is geometrically a shearing, i.e. the type of mapping that turns a rectangle into a parallelogram with the same base and height.
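To make the three hints concrete, here is a small sketch (not part of the original argument) that builds one elementary matrix of each type in dimension 3 and computes the factor by which it scales volume, namely $|\det|$. The helper names `det`, `swap`, `scale` and `shear` are my own illustrative choices, and the determinant is computed by naive cofactor expansion, which is only sensible for small matrices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def swap(n, i, j):
    """Type A: swap rows i and j of the identity."""
    e = identity(n)
    e[i], e[j] = e[j], e[i]
    return e


def scale(n, i, c):
    """Type B: multiply row i of the identity by the scalar c."""
    e = identity(n)
    e[i][i] = c
    return e


def shear(n, i, j, c):
    """Type C: add c times row j to row i of the identity (i != j)."""
    e = identity(n)
    e[i][j] = c
    return e


# Volume scaling factor |det E| for one example of each type.
factors = {
    "swap": abs(det(swap(3, 0, 2))),
    "scale": abs(det(scale(3, 1, -5))),
    "shear": abs(det(shear(3, 0, 1, 7))),
}
print(factors)
```

The swap and the shear leave volume unchanged, while scaling a row by $c$ multiplies it by $|c|$, which matches the geometric picture in Hint 3.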
I've taken a quick look at the proof given in the text that you reference. It largely follows Jyrki's approach, but with a small difference. The text (in part (v) of the proof) considers these type C shearing matrices, but with only a multiple of one, rather than a general multiple, and then refers to a rather specific theorem that allows for decomposition into elementary matrices, such that the elementary matrix with addition of one row to the next only requires a multiple of one. This theorem is stronger than the usual decomposition theorem, and I haven't been able to find a convenient reference for it.
Anyway, Jyrki's proof is nicer than your text's: There is no reason to restrict the multiple of your shearing matrix to one - the same argument goes through for any multiple. Once you allow for this general shearing matrix, you can then refer to the more standard proofs of decomposition into elementary matrices. I like Ch 1 of Artin's "Algebra" for this.
For another approach which might be quite illuminating, see Ch 5 of Lax's "Linear Algebra". He starts with the properties of what an operator for "signed volume" must look like, and then he deduces a formula which turns out to be the usual determinant.
The lemma that the question is concerned with is what the whole change of variables formula (a.k.a. u-substitution in several variables) is built on. Without a proper understanding of it, it is impossible to really appreciate any proof of the change of variables formula.
Here are some personal notes that I wrote while trying to understand the area (and coarea) formula. They may supplement some of the other answers and comments.
Let us settle on the following definition of the volume of a parallelepiped. (I can understand if people challenge this as a definition, but it is at least intuitive and generalizes our 3D geometry.)
The case $\det A = 0$ follows from the fact that $A(\mathbb{R}^n)$ is a subspace of dimension at most $n-1$, so every subset of it has $n$-volume zero. So assume in what follows that $A$ is invertible.
Definition: Let $v_1,v_2,\cdots,v_n$ be vectors in $\mathbb{R}^n$. Then the volume of the parallelepiped $P$ spanned by them is $ V(P):= |\det [v_1 , v_2 , \cdots , v_n] | \ ,$ where $[v_1,\cdots,v_n]$ stands for the square matrix whose $i$'th column is $v_i$. (The absolute value is taken to guarantee non-negativity.)
Easy fact: Writing $M=[v_1 , \cdots , v_n]$, the matrix $M^tM$ has entries $\langle v_i,v_j\rangle$, so it follows from $\det(M^tM)=(\det M)^2$ that $V(P)=|\det [v_1 , \cdots , v_n] | = \sqrt{\det[\langle v_i,v_j\rangle]_{i,j}}$.
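As a sanity check of this identity, the following sketch compares $|\det [v_1,\cdots,v_n]|$ with the square root of the Gram determinant for three integer vectors in $\mathbb{R}^3$. The vectors and the helper names (`det`, `dot`) are arbitrary choices for illustration; integers are used so that the comparison is exact.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Three arbitrary integer vectors spanning a parallelepiped in R^3.
vectors = [(1, 2, 0), (0, 1, 3), (2, 0, 1)]
n = len(vectors)

# M has v_i as its i-th column; the Gram matrix has entries <v_i, v_j>.
M = [[vectors[j][i] for j in range(n)] for i in range(n)]
gram = [[dot(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

volume_from_det = abs(det(M))
volume_from_gram = math.isqrt(det(gram))  # exact here: det(gram) is a perfect square

print(volume_from_det, volume_from_gram)
```

The Gram form has the advantage that it still makes sense for $k$ vectors in $\mathbb{R}^n$ with $k<n$, where $[v_1,\cdots,v_k]$ is no longer square.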
Lemma 1: Let $A: \mathbb{R}^n \to \mathbb{R}^n$ be a linear map. Then $$ \frac{V(A(P))}{V(P)} = \left| \frac{\det [Av_1 , Av_2 , \cdots , Av_n]}{\det [v_1 , v_2 , \cdots , v_n]} \right| $$ is independent of the choice of linearly independent vectors $v_1,v_2,\cdots,v_n$.
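As far as I recall, the proof uses linear algebra facts about the determinant and how it changes (or remains unchanged) under certain operations on rows/columns. Alternatively, since $[Av_1 , \cdots , Av_n] = A\,[v_1 , \cdots , v_n]$, multiplicativity of the determinant gives the claim directly.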
Corollary: Choosing the standard unit vectors $e_i$ for the $v_i$'s, it follows that this common value equals $|\det A|$, i.e. for any non-degenerate parallelepiped $P$, $$ \frac{V(A(P))}{V(P)} = |\det A| \ . $$ Proof: $Ae_i$ is the $i$'th column of $A$, so $[Ae_1 , \cdots , Ae_n] = A$, while $[e_1 , \cdots , e_n]$ is the identity matrix.
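A quick numerical illustration of the lemma and corollary: for a fixed matrix $A$ and a few different parallelepipeds $P$, the ratio $V(A(P))/V(P)$ comes out the same and equals $|\det A|$. The matrix, the parallelepipeds and the helper names below are arbitrary illustrative choices; each $P$ is stored as the matrix whose columns are its edge vectors.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def matmul(a, b):
    """Matrix product; the columns of a*b are the images of the columns of b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def volume(p):
    """V(P) = |det [v_1, ..., v_n]| for the edge-vector matrix p."""
    return abs(det(p))


A = [[2, 1, 0],
     [0, 1, 3],
     [1, 0, 1]]

parallelepipeds = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # the unit cube
    [[1, 0, 2], [2, 1, 0], [0, 3, 1]],
    [[1, 1, 0], [0, 1, 1], [0, 0, 2]],
]

# Exact ratios V(A(P)) / V(P), one per parallelepiped.
ratios = [Fraction(volume(matmul(A, p)), volume(p)) for p in parallelepipeds]
print(ratios, abs(det(A)))
```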
If we take a parallelepiped $P$ with "vertex" at a point other than the origin, then by linearity of $A$ the image is just a translate of the image of the corresponding parallelepiped at the origin. So the results still hold: no matter where in $\mathbb{R}^n$ a parallelepiped is located, $A(P)$ has volume $|\det A|$ times that of $P$. This is a fairly obvious observation, but one that is needed to generalize the claim to measurable sets.
I am not going into precise details here, but measurable sets $S$ are those that can be "well estimated" by unions of cubes. Therefore $A(S)$ is well approximated, to any desired precision, by images of cubes. By the facts above, the volume of the image of a union of cubes is $|\det A|$ times the volume of that union of cubes in the domain. Taking the limit proves that the same holds for the volume of $A(S)$ versus the volume of $S$.
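The approximation idea can be tried out numerically. The sketch below is only an illustration, not a proof: it takes $S$ to be the unit disc in $\mathbb{R}^2$ and a sample matrix $A$ with $\det A = 2$ (both my own choices), estimates the areas of $S$ and $A(S)$ by counting small grid squares whose centres lie in the set, and compares the two estimates. Membership in $A(S)$ is tested by pulling a point back through $A^{-1}$.

```python
import math

# Sample invertible matrix (an arbitrary choice); det A = 2.
A = [[2.0, 1.0],
     [0.0, 1.0]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]


def in_S(x, y):
    """Indicator of S, here the closed unit disc."""
    return x * x + y * y <= 1.0


def apply_inverse(y1, y2):
    """Return A^{-1}(y1, y2) using the explicit 2x2 inverse formula."""
    x1 = (A[1][1] * y1 - A[0][1] * y2) / det_A
    x2 = (-A[1][0] * y1 + A[0][0] * y2) / det_A
    return x1, x2


def in_image(y1, y2):
    """A point lies in A(S) exactly when its preimage lies in S."""
    return in_S(*apply_inverse(y1, y2))


def grid_area(indicator, xmin, xmax, ymin, ymax, h):
    """Sum the areas of the h-by-h grid squares whose centre satisfies indicator."""
    nx = round((xmax - xmin) / h)
    ny = round((ymax - ymin) / h)
    count = 0
    for i in range(nx):
        cx = xmin + (i + 0.5) * h
        for j in range(ny):
            cy = ymin + (j + 0.5) * h
            if indicator(cx, cy):
                count += 1
    return count * h * h


h = 0.01
area_S = grid_area(in_S, -1.5, 1.5, -1.5, 1.5, h)
area_image = grid_area(in_image, -3.0, 3.0, -1.5, 1.5, h)

print(area_S, area_image, abs(det_A) * area_S)
```

Shrinking `h` plays the role of "taking the limit" in the paragraph above: both estimates improve, and the estimated area of $A(S)$ stays close to $|\det A|$ times the estimated area of $S$.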