How can we think and/or write rigorously about integration by substitution?

Define a function $I:\mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}$ as follows.

$$I(a,b)=\int_a^b \sin t \cos t \,d t$$

Then we can find a more explicit description of $I$ using integration by substitution. So let $u = \sin t$. Then $d u = \cos t \,dt$. Therefore:

$$I(a,b) = \int_a^b \sin t \cos t \,d t = \int_{t=a}^{t=b}udu = \left[\frac{1}{2}u^2\right]_{t=a}^{t=b} = \left[\frac{1}{2} \sin^2 t\right]_{t=a}^{t=b} = \frac{1}{2}\left(\sin^2b-\sin^2 a\right)$$

I'm not confident in our final answer, though; there are just too many dodgy things going on. These include both general issues with integration by substitution and issues more specific to this problem.

General Issues.

  • I've always been a bit uncomfortable with this "let $u=\sin t$" stuff, since we never said anything like "let $t$ denote a fixed but arbitrary real number," so the meaning of $t$ is ambiguous. This isn't easily fixed though; we don't want to say "let $t$ denote a fixed but arbitrary real number" because moments later, we're going to quantify over $t$ by integrating, so clearly it wasn't fixed.

  • Another general issue is that I don't really know what expressions like $du = \cos t \,dt$ mean. Under the usual semantics for equations, we would think of this as being true for some pairs $(u,t)$ and false for others. Here, that semantics doesn't work, so it's not at all clear to me what is being asserted.

Particular Issues.

  • In this particular case, since the function $t \in [a,b] \mapsto \sin t \in \mathbb{R}$ isn't injective for a sufficiently large gap between $a$ and $b$, I'm not even sure we're allowed to perform integration by substitution here. (Are we?)

  • The notation $\int_{t=a}^{t=b}u\,du$ and $\left[\frac{1}{2}u^2\right]_{t=a}^{t=b}$ seems ambiguous to me. Does this really make sense? If so, how does one formalize the meanings of these expressions?

Question. Suppose we want to conceptualize integration by substitution rigorously, and to apply it rigorously (using unambiguous notation) to find $I(a,b)$ explicitly. How can we do this?

Please don't post answers that cleverly avoid using integration by substitution. I want to understand it, not avoid it.


Solution 1:

From the perspective of an elementary calculus student (by which I mean that it can supposedly be made rigorous later on, but isn't in introductory classes), the $du$, $dx$ stuff is absolute nonsense and I will never understand why it continues to be used by so many professors. It really is the one glaring hole in most otherwise rigorous calculus courses. Really bizarre.

Anyway, the real story is told using composition of functions. Where $f$ is an integrable function on $[a, b]$, I'll write $\int_a^b f$ for the integral of $f$ over $[a, b]$, since it really is something determined by the function itself; there are no "variables" (whatever that could mean) anywhere.

Then we have:

$$\int^b_a(f\circ\phi)\phi'=\int^{\phi(b)}_{\phi(a)}f$$

There are appropriate assumptions that need to be made about $f$ and $\phi$, which are better explained on Wikipedia.

Thus for example let's say we want to integrate

$$\int^2_1\frac {2x} {1+x^2}$$

Well, defining $f(x)=\frac 1 x$ and $\phi(x)=1+x^2$, the integrand is precisely $(f\circ\phi)\phi'$, therefore the integral is equal to

$$\int^{\phi(2)}_{\phi(1)}\frac 1 x=\int^{5}_{2}\frac 1 x$$
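If you want to convince yourself numerically that the two sides agree, here is a minimal sketch. The helper `simpson` and the names `f`, `phi`, `dphi` are my own choices for illustration; it uses a plain composite Simpson's rule rather than any particular library routine.

```python
import math

def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        # Interior nodes get weight 4 (odd index) or 2 (even index).
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def f(u):
    return 1.0 / u            # outer function

def phi(x):
    return 1.0 + x * x        # inner function

def dphi(x):
    return 2.0 * x            # derivative of the inner function

# Left side: integral of (f∘phi)·phi' over [1, 2].
lhs = simpson(lambda x: f(phi(x)) * dphi(x), 1.0, 2.0)
# Right side: integral of f over [phi(1), phi(2)] = [2, 5].
rhs = simpson(f, phi(1.0), phi(2.0))
# Closed form of the right side, since log is an antiderivative of 1/u on (0, ∞).
exact = math.log(5.0 / 2.0)

print(lhs, rhs, exact)
```

The three printed numbers should agree up to the quadrature error, which is only a sanity check and of course not a proof.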

Note: I can say from personal experience that thinking with this approach is much slower than the "multiply both sides by $dx$" approach that most of your classmates will be using. I recommend practicing thinking with function composition a lot until you can do it quickly and fluently.
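As one more practice instance, here is a sketch of how the integral from the question might be handled in this notation, assuming the choices $f(u)=u$ and $\phi=\sin$ (my labels, not fixed by the question). Since $\phi$ is continuously differentiable and $f$ is continuous on all of $\mathbb{R}$, the formula applies for every $a$ and $b$; injectivity of $\phi$ is not among its hypotheses. Writing $F(u)=\frac 1 2 u^2$ for an antiderivative of $f$:

$$I(a,b)=\int_a^b(f\circ\phi)\phi'=\int_{\phi(a)}^{\phi(b)}f=\int_{\sin a}^{\sin b}f=F(\sin b)-F(\sin a)=\frac{1}{2}\left(\sin^2 b-\sin^2 a\right)$$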

Solution 2:

This baffled me too when I first came across it; Jack M gave a good answer explaining what is really behind it. For the practicing mathematician, the "symbolic approach" via variables, differentials and so on is just a mental shorthand that works thanks to cleverly chosen notation. Bear in mind that the chain rule can be read in two directions: one where you see the functions $\varphi$ and $f$ at once, as in $\int_a^b x\cos(x^2+2) \,dx$, and the other where you need to "compute" your $\varphi$ in some way, as in $\int_a^b \cos(x^2+2) \,dx$ (here you cannot apply the chain rule directly by reading it off the formula; you need to rearrange a little).

So what are you doing "mathematically" here? You just compute your substitution function $\varphi$! This can be done with all this "magical" differential quotient stuff. Suppose you have an integral of the form $$ \int_a^b f(\varphi(x)) \,dx, $$ which, as Jack M pointed out, has nothing to do with the variable $x$ but is a function of functions (sometimes called a functional); the variables are in this sense just "notational conventions" that provide these mental shorthands for the chain rule. Okay, suppose $\varphi$ is invertible with a differentiable inverse, and let $\psi := \varphi^{-1}$. Then $$ \int_a^b f(\varphi(x)) \,dx = \int_{\varphi(a)}^{\varphi(b)} \psi'(x) f(\varphi(\psi(x))) \,dx = \int_{\varphi(a)}^{\varphi(b)} \psi'(x) f(x) \,dx. $$ This is just the chain rule: it is Jack M's formula with $\psi$ as the substitution function, read from right to left, so that the limits $\psi(\varphi(a)) = a$ and $\psi(\varphi(b)) = b$ appear on the left. You can easily state this in Jack M's language, and what I call the integration variable ($x$, $t$, or whatever) doesn't matter!

But how do you compute $\psi$? The same way you would compute the inverse of $\varphi$: write $y = \varphi(x)$, solve for $x$, then rename. You will see that these are exactly the steps that are "hidden" in the "symbolic-differential application" of the chain rule. For my example, $$ t = x^2 + 2 \mbox{ which has inverse } x = \sqrt{t - 2} $$ (this works, for example, if $0 < a < b$, where the function is indeed invertible!). Now compute its derivative and plug in, and you get $$ \int_a^b \cos(x^2+2) \,dx = \int_{\varphi(a)}^{\varphi(b)} \psi'(t)\cdot \cos(t) \,dt = \int_{a^2+2}^{b^2+2} \left( \frac{1}{2\sqrt{t-2}} \right) \cdot \cos(t) \,dt = \int_{a^2+2}^{b^2+2} \frac{\cos(x)}{2\sqrt{x-2}} \,dx. $$ But what do you do when you apply the symbolic method? You put $t = x^2 + 2$, then compute $dt / dx = 2x$ to get $dx = dt / (2x)$. If you plug this in you still have $x$ in it, so you solve for $x$ to get $x = \sqrt{t - 2}$. Do you see how the computation of the inverse function, and the plugging in of its derivative, is contained in these steps? Of course, the precise assumptions are hidden. To make this more rigorous, first your substitution function must be invertible, and then what is at work here is the rule for differentiating an inverse function, $(\varphi^{-1})' = 1/(\varphi' \circ \varphi^{-1})$ (do you see how this formula is hidden in the above steps?), which fits nicely with this "differential shorthand notation" too, making this "mental shorthand" work.
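A quick numerical sanity check of this example might look like the sketch below. It assumes $a = 1$ and $b = 2$ (my choice, any $0 < a < b$ should do), so that the substituted integral runs from $\varphi(a) = a^2 + 2 = 3$ to $\varphi(b) = b^2 + 2 = 6$. The helper `simpson` and the names `phi`, `psi`, `dpsi` are hypothetical names introduced only for illustration.

```python
import math

def simpson(g, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        # Interior nodes get weight 4 (odd index) or 2 (even index).
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def phi(x):
    return x * x + 2.0                        # t = x^2 + 2

def psi(t):
    return math.sqrt(t - 2.0)                 # inverse of phi on x > 0

def dpsi(t):
    return 1.0 / (2.0 * math.sqrt(t - 2.0))   # derivative of psi

a, b = 1.0, 2.0  # assumed limits with 0 < a < b, where phi is invertible

# Original integral: cos(x^2 + 2) over [a, b].
lhs = simpson(lambda x: math.cos(phi(x)), a, b)
# Substituted integral: psi'(t) * cos(t) over [phi(a), phi(b)].
rhs = simpson(lambda t: dpsi(t) * math.cos(t), phi(a), phi(b))

print(lhs, rhs)
```

Both printed values should coincide up to the quadrature error. The factor `dpsi(t)` is exactly the $1/(2x)$ from the symbolic method with $x = \sqrt{t-2}$ substituted in.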