dropping injectivity from multivariable change of variables

The change of variables for multivariable integration in Euclidean space is almost always stated for a $C^1$ diffeomorphism $\phi$, giving the familiar equation (for continuous $f$, say)

$$\boxed{\int_{\phi(U)}f=\int_U(f\circ\phi)\cdot|\det D\phi|}$$

Of course, this result by itself is not very useful in practice because a diffeomorphism is usually hard to come by. The better advanced calculus and multivariable analysis texts explain explicitly how the hypothesis that $\phi$ is injective with $\det D\phi\neq0$ can be relaxed to handle problems along sets of measure zero -- a result which is necessary for almost all practical applications of the theorem, starting with polar coordinates.
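To see why the measure-zero relaxation suffices in practice, consider polar coordinates: $\phi(r,\theta)=(r\cos\theta,r\sin\theta)$ on $U=[0,R]\times[0,2\pi]$ fails to be injective only on the set $\{r=0\}\cup\{\theta\in\{0,2\pi\}\}$, and $\det D\phi=r$ vanishes only on $\{r=0\}$. Both are sets of measure zero, so the formula still gives

$$\int_{x^2+y^2\le R^2}f(x,y)\,dx\,dy=\int_0^{2\pi}\!\!\int_0^R f(r\cos\theta,r\sin\theta)\,r\,dr\,d\theta.$$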

Despite offering this slight generalization, very few of the standard texts state that the situation can be improved further still: there is an analogous theorem for arbitrary $C^1$ mappings $\phi$, not just those that are injective everywhere except on a set of measure zero. We simply account for how many times a point in the image gets hit by $\phi$, giving

$$\boxed{\int_{\phi(U)}f\cdot\,\text{card}(\phi^{-1})=\int_U(f\circ\phi)\cdot|\det D\phi|}$$

where $\text{card}(\phi^{-1})$ denotes the function $x\mapsto\text{card}(\phi^{-1}(x))$, the number of preimages of $x$ under $\phi$.
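As a sanity check (my own illustration, not from any of the sources above), one can verify the multiplicity formula numerically for the double-covering polar map $\phi(r,\theta)=(r\cos\theta,r\sin\theta)$ on $U=(0,2)\times(0,4\pi)$, which covers the punctured disk of radius $2$ exactly twice, so $\text{card}(\phi^{-1}(x))=2$ for a.e. $x$ in the image. Taking $f\equiv 1$, both sides should equal $2\cdot\pi\cdot 2^2=8\pi$:

```python
import math

# phi(r, theta) = (r cos theta, r sin theta) on U = (0, 2) x (0, 4*pi).
# Theta runs twice around the circle, so card(phi^{-1}) = 2 a.e. on the
# image (the punctured disk of radius 2).  Take f identically 1.

R = 2.0

# Left side: integral over phi(U) of f * card(phi^{-1}) = 2 * area(disk).
lhs = 2 * math.pi * R**2

# Right side: integral over U of |det D phi| = r, i.e.
# int_0^{4 pi} int_0^R r dr dtheta, by a midpoint rule in r
# (the integrand does not depend on theta).
n = 400
dr = R / n
rhs = sum((i + 0.5) * dr * dr for i in range(n)) * 4 * math.pi

assert abs(lhs - rhs) < 1e-9  # both sides equal 8*pi
```

The midpoint rule is exact here because the integrand $r$ is linear, so the two sides agree to floating-point precision.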

I think this theorem is a lot more natural and satisfying than the first, for many reasons. For one thing, it removes a huge restriction, bringing the theorem closer to the standard one-variable change of variables for which injectivity is not required (though of course the one-variable theorem is really a theorem about differential forms). It emphasizes that a certain degree of regularity is what's important here, not injectivity.

For another thing, it's not a big step from here to degree theory for smooth maps between closed manifolds or to the "area formula" in geometric measure theory. (Indeed, the factor $\text{card}(\phi^{-1})$ is a special case of what old references in geometric measure theory called the "multiplicity function" or the "Banach indicatrix.") It's also used in multivariate probability to write down densities of non-injective transformations of random variables.

And last, it's in the spirit of modern approaches to at least gesture at the most general possible result. The traditional statement is really just a special case; injectivity only becomes essential when we define the integral over a manifold (rather than a parametrized manifold), which we want to be independent of parametrization. I think teaching the more general result would greatly clarify these matters, which are a constant source of confusion to beginners.
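The probability application can be made concrete with a minimal one-variable sketch (my own example, not drawn from any particular text): for $Y=X^2$ with $X$ standard normal, the map $x\mapsto x^2$ is 2-to-1 away from $0$, and summing the density over both preimages $\pm\sqrt{y}$ recovers the known chi-square density with one degree of freedom:

```python
import math

# For Y = X^2 with X ~ N(0,1), the change of variables with
# multiplicity sums over the two preimages of y > 0:
#   f_Y(y) = (f_X(sqrt(y)) + f_X(-sqrt(y))) / (2 * sqrt(y)),
# where 2*sqrt(y) = |d/dx x^2| at the preimages.

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def density_via_preimages(y):
    r = math.sqrt(y)
    return (normal_pdf(r) + normal_pdf(-r)) / (2 * r)

def chi2_1_pdf(y):
    # Closed-form chi-square density with 1 degree of freedom.
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)

for y in (0.1, 1.0, 3.7):
    assert abs(density_via_preimages(y) - chi2_1_pdf(y)) < 1e-9
```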

Yet many otherwise excellent multivariable analysis texts (Spivak, Rudin PMA and RCA, Folland, Loomis/Sternberg, Munkres, Duistermaat/Kolk, Burkill) don't mention this result, even in passing, as far as I can tell. I've had to hunt for discussions of it, and I've found it here:

  • Zorich, Mathematical Analysis II (page 150, exercise 9, for the Riemann integral)
  • Kuttler, Modern Analysis (page 258, for the Lebesgue integral)
  • Csikós, Differential Geometry (page 72, for the Lebesgue integral)
  • Ciarlet, Linear and Nonlinear Functional Analysis with Applications (page 34, for the Lebesgue integral)
  • Bogachev, Measure Theory I (page 381, for the Lebesgue integral)
  • the Planet Math page on multivariable change of variables (Theorem 2)

I'm also confident I've seen it in some multivariable probability books, but I can't remember which. In any case, none of these is a standard textbook, except perhaps Zorich.

My question: are there standard references with nice discussions of this extension of the more familiar result? Probability references are fine, but I'm especially curious whether I've missed some definitive treatment in one of the classic analysis texts.

(Also feel free to speculate why so few texts mention it.)


The short answer is: When designing a course (and in the sequel, a textbook) you have to cover a lot of indispensable material, like "A continuous function on a compact set is uniformly continuous". But you also have to make hundreds of larger or smaller decisions about, e.g., the order of presentation, which "equally important" topics to include, which topics to sacrifice, or to "remove to the exercises", etc.

Concerning the change of variables formula: We absolutely need this formula for the computation of volumes, moments of inertia, heat content, etc., of "geometrically complicated", or else particularly symmetric, bodies $B$. To this end an essentially injective parametrization of $B$ is completely sufficient. On the other hand, the proof of this formula (even in the vanilla variant) is quite time-consuming. Unfortunately its essential part, namely the geometric meaning of the determinant, tends to be obscured by the work necessary to effectively nullify measure-zero effects. In one of the sources quoted above it is claimed that the general version of the formula (as well as its proof) includes a special case of Sard's Theorem. The latter is definitely out of bounds for a first real analysis course.

It is forgivable, then, if we leave it at that and just teach what the student will certainly need to handle standard arguments and situations in differential geometry, mathematical physics, and the like. In my own mathematical practice I have used the vanilla variant of the formula a thousand times, but the more general formula involving the "covering number" maybe five times, e.g., in a course on integral geometry. Note that, once you have understood the vanilla variant, the general formula is intuitively obvious, so you can work with it in probability theory or dynamical systems without much ado.