The formalism behind integration by substitution

When you integrate by substitution, you write out the following working. $$\begin{align*} u&=f(x)\\ \Rightarrow\frac{du}{dx}&=f^{\prime}(x)\\ \Rightarrow du&=f^{\prime}(x)dx&(1)\\ \Rightarrow dx&=\frac{du}{f^{\prime}(x)} \end{align*}$$

My question is: what on earth is going on at line $(1)$?!?

This has been bugging me for, like, forever! You see, when I was taught this in my undergrad I was told something along the lines of the following:

You just treat $\frac{du}{dx}$ like a fraction. Similarly, when you are doing the chain rule $\frac{dy}{dx}=\frac{dy}{dv}\times\frac{dv}{dx}$ you "cancel" the $dv$ terms. They are just like fractions. However, never, ever say this to a pure mathematician.

Now, I am a pure mathematician. And quite frankly I don't care if people think of these as fractions or not. I know that they are not fractions (rather, $\frac{du}{dx}$ is the limit of the difference quotients as the difference tends to zero). But I figure I should start caring now... So, more precisely:

$\frac{du}{dx}$ has a meaning, but so far as I know $du$ and $dx$ do not have a meaning. Therefore, why can we treat $\frac{du}{dx}$ as a fraction when we are doing integration by substitution? What is actually going on at line $(1)$?


Solution 1:

Consider evaluating $\int (3x^2 + 2x) e^{x^3 + x^2} \, dx$ (as in this Khan Academy video).

Often teachers will say, let $u = x^3 + x^2$, and note that "$du = (3x^2 + 2x) dx$". Therefore, they say, \begin{align} \int (3x^2 + 2x) e^{x^3 + x^2} \, dx &= \int e^u du \\ &= e^u + C \\ &= e^{x^3 + x^2} + C. \end{align}

However, this explanation is confusing because there's no such thing as $du$ or $dx$.

A clearer (in my opinion) and perfectly rigorous explanation is simply to notice that our integral has the form $\int f(g(x)) g'(x) dx$, and use the rule \begin{equation} \int f(g(x)) g'(x) dx = F(g(x)) + C \end{equation} where $F$ is an antiderivative of $f$. This rule is clearly true, because it's nothing more than the chain rule in reverse: differentiating $F(g(x))$ gives $F'(g(x))g'(x) = f(g(x))g'(x)$. There's no need to use any "infinitesimals" or anything.
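If you want to convince yourself numerically, here is a minimal sketch (not part of the argument itself) that compares a Simpson's-rule estimate of $\int_0^1 (3x^2+2x)e^{x^3+x^2}\,dx$ with $F(g(1)) - F(g(0))$. The interval $[0,1]$, the helper name `simpson`, and the number of subintervals are my own choices for illustration.

```python
import math

def integrand(x):
    # f(g(x)) * g'(x) with f = exp and g(x) = x**3 + x**2
    return (3 * x**2 + 2 * x) * math.exp(x**3 + x**2)

def antiderivative(x):
    # F(g(x)) with F = exp, an antiderivative of f = exp
    return math.exp(x**3 + x**2)

def simpson(func, a, b, n=1000):
    # Composite Simpson's rule; n must be even.
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

a, b = 0.0, 1.0  # illustrative interval, chosen arbitrarily
numeric = simpson(integrand, a, b)
exact = antiderivative(b) - antiderivative(a)
print(numeric, exact)
```

The two printed values should agree to many decimal places, which is what the reverse chain rule predicts.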

Solution 2:

Recall that $u$-substitution is really the inverse rule of the chain rule, just like integration by parts is the inverse rule of the product rule. The essence of the chain rule is that

$$ \frac{\mathrm{d}y}{\mathrm{d}x} = \frac{\mathrm{d}y}{\mathrm{d}u}\frac{\mathrm{d}u}{\mathrm{d}x},$$

which is why we like to write derivatives as ratios - often, when they look like they cancel, they really "do cancel," so to speak.

A better way of writing $u$-substitution is to say that $\dfrac{\mathrm{d}u}{\mathrm{d}x} = f'(x)$, though we might as well write this as $u'(x)$, since that's what it really is. Then

$$ \int g(u(x))u'(x) \mathrm{d}x = \color{#F01C2C}{\int g(u(x)) \frac{\mathrm{d}u}{\mathrm{d}x}\mathrm{d}x = \int g(u) \mathrm{d}u},$$

where I've marked the important equality in red. The step in red is visibly related to the chain rule: the part that looks like it cancels really does cancel. $\diamondsuit$

The theme here is that this is valid because of the chain rule, and the notation is chosen to support the cancellation effects. When people go further and split this very convenient notation into separate $\mathrm{d}u$ and $\mathrm{d}x$, they are usually either using it as shorthand or assuming a good amount of knowledge of "differentials."

We can even more directly relate this to the chain rule by giving a proof. Consider the function

$$ F(x) = \int_{0}^x g(t)\mathrm{d}t.$$

Consider the function $F(u(x))$ and differentiate it:

$$ \begin{align} F(u(x))' &= F'(u(x)) u'(x) = \frac{\mathrm{d}F}{\mathrm{d}u}\frac{\mathrm{d}u}{\mathrm{d}x}\\ &=\left.\left(\frac{\mathrm{d}}{\mathrm{d}u}\int_{0}^{u} g(t)\mathrm{d}t\right)\right|_{u=u(x)} \cdot u'(x)\\ &= g(u(x))u'(x). \end{align}$$
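As a quick sanity check of this derivative, here is a small sketch with the illustrative choices $g = \cos$ (so that $F = \sin$) and $u(x) = x^2$. These functions, the sample point `x0`, and the helper `central_difference` are my own assumptions, not part of the proof.

```python
import math

def g(t):
    return math.cos(t)

def F(x):
    # F(x) = integral of g from 0 to x; for g = cos this is sin(x).
    return math.sin(x)

def u(x):
    return x**2

def u_prime(x):
    return 2 * x

def central_difference(func, x, h=1e-5):
    # Symmetric finite-difference approximation to func'(x).
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.3  # arbitrary sample point
lhs = central_difference(lambda x: F(u(x)), x0)  # approximates (F(u(x)))'
rhs = g(u(x0)) * u_prime(x0)                     # g(u(x)) * u'(x)
print(lhs, rhs)
```

The finite-difference value and $g(u(x_0))u'(x_0)$ should agree up to the error of the approximation.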

Then the second fundamental theorem of calculus says that

$$\begin{align} \int_a^b g(u(x))u'(x)\mathrm{d}x &= F(u(b)) - F(u(a)) \\ &= \int_{a}^{b} g(u(t))u'(t)\mathrm{d}t \\ &=\int_{a}^{b}g(u(t))\frac{\mathrm{d}u}{\mathrm{d}t}\mathrm{d}t. \end{align}$$

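Of course, we also know that $\displaystyle F(u(b)) - F(u(a)) = \int_{u(a)}^{u(b)} g(t) \mathrm{d}t = \int_{u(a)}^{u(b)} g(u) \mathrm{d}u$. Equating the two expressions for $F(u(b)) - F(u(a))$ gives the substitution rule for definite integrals: $\displaystyle \int_a^b g(u(x))u'(x)\mathrm{d}x = \int_{u(a)}^{u(b)} g(u) \mathrm{d}u$.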
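To see the change of limits in action, here is a minimal numerical sketch that evaluates $\int_a^b g(u(x))u'(x)\,\mathrm{d}x$ and $\int_{u(a)}^{u(b)} g(u)\,\mathrm{d}u$ separately with Simpson's rule. The choices $g = \cos$, $u(x) = x^2$, $a = 0.5$, $b = 1.5$ and the helper `simpson` are assumptions made only for illustration.

```python
import math

def g(t):
    return math.cos(t)

def u(x):
    return x**2

def u_prime(x):
    return 2 * x

def simpson(func, a, b, n=1000):
    # Composite Simpson's rule; n must be even.
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

a, b = 0.5, 1.5  # illustrative limits in the x variable

# Left side: integrate g(u(x)) * u'(x) over [a, b] in x.
lhs = simpson(lambda x: g(u(x)) * u_prime(x), a, b)

# Right side: integrate g over [u(a), u(b)] in the new variable.
rhs = simpson(g, u(a), u(b))

print(lhs, rhs)
```

Both values approximate $\sin(2.25) - \sin(0.25)$, so the integrand and the limits change together exactly as the formula says.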