Why does substitution work in antiderivatives?

I'm not entirely sure what I want to ask here, so please bear with me!

I think the explanation we are given in school for how finding an antiderivative by substitution works is: $$\int f(g(t))g'(t)\,dt=(F\circ g)(t)=F(g(t))=F(u)=\int f(u)\,du$$

But I never really understood the equality $F(u)=\int f(u)\,du$. Why can we behave as if the identity $u=g(t)$ doesn't exist, compute the integral $\int f(u)\,du$, and then 'substitute' for $u$ in the result, yielding the correct answer? In a sense, it feels like the variable $u$ switches from being 'meaningful' in $F(u)$ (where it stands for $g(t)$) to being insignificant in $\int f(u)\,du$, since as I perceive it the "variable name" here is meaningless and we could say $ \int f(u)\,du = \int f(k)\,dk = \int f(x)\,dx = \dots $ all the same. Another way to ask the same thing: why is $(F\circ g)(t)=\int f(u)\,du$ and not, say, $(F\circ g)(t)=\int f(x)\,dx$? Where am I getting confused?

Maybe someone can explain this better than my high school teacher? Is there a 'formal' explanation for this? Thank you a lot!


Solution 1:

I will try to deal with indefinite integrals. Justifications for the definite integral are more complicated.

Here is the standard argument, somewhat extended. We want to find $$\int f(g(x)) g'(x)\;dx.$$

Note that if we can find an antiderivative $F(x)$ of $f(x)$, then one antiderivative of $f(g(x)) g'(x)$ is $F(g(x))$. We can check this by differentiating: by the Chain Rule, $(F(g(x)))'= g'(x)F'(g(x))$, and this equals $f(g(x))g'(x)$ since $F'(x)=f(x)$. It follows that $$\int f(g(x)) g'(x)\;dx=F(g(x))+C.$$
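As a numerical sanity check of this differentiation step, here is a minimal Python sketch. The particular choices $f=\cos$, $F=\sin$, $g(x)=x^2$ and the helper `derivative` are illustrative assumptions, not part of the argument; any $f$ with a known antiderivative would do.

```python
import math

def derivative(h, x, eps=1e-6):
    """Central-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

# Illustrative choices (assumptions made only for this check):
f = math.cos                 # the outer integrand f
F = math.sin                 # an antiderivative of f, since sin' = cos
g = lambda x: x ** 2         # the inner function g
g_prime = lambda x: 2 * x    # its derivative g'

def composite(x):
    """The claimed antiderivative F(g(x))."""
    return F(g(x))

def integrand(x):
    """The function we want to integrate, f(g(x)) * g'(x)."""
    return f(g(x)) * g_prime(x)

# Compare the numerical derivative of F(g(x)) with f(g(x)) g'(x) at a few points.
max_error = max(
    abs(derivative(composite, x) - integrand(x))
    for x in [-1.5, -0.3, 0.0, 0.7, 2.0]
)
print(max_error)  # should be tiny, limited only by finite-difference error
```

The two sides agree up to finite-difference error, which is exactly what the Chain Rule predicts.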

There was a deliberately unfortunate choice of variable when I wrote "an antiderivative $F(x)$ of $f(x)$." The $x$ here is playing a different role than the $x$ in the original integration question. It would have been better to choose a letter different than $x$, say $u$ for the sake of tradition, or $w$, or $z$, and then to write that we want to find an antiderivative $F(u)$ of $f(u)$. Then our integral is $F(g(x))+C$.

As a shortcut to this, note that the collection of all antiderivatives of $f(u)$ is by definition $\int f(u)\;du$. This is $F(u)+C$. Now substitute $g(x)$ for $u$.

As a shortcut to the shortcut, imagine like before that $u$ is a symbol, but at the same time it is, mysteriously, $g(x)$. Then the substitution step of the previous paragraph is unnecessary, and we simply get $$\int f(g(x)) g'(x)\;dx =\int f(u)\,du.$$

How can $u$ be treated as a variable for the sake of the formal manipulation by which we find $\int f(u)\;du$, and simultaneously as the function $g(x)$? One might, as you do, legitimately wonder about this. One point to be made in favour of it is that the above argument proves that the procedure will always give the right answer.

In the integration by substitution process, instead of saying that we will find an antiderivative of $f(u)$ and then substitute $g(x)$ for $u$, we write instead at the beginning "Let $u=g(x)$," then find $\int f(u)\;du$. But it all amounts to the same thing.
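To see both versions side by side, here is a small worked example (the integrand is chosen only for illustration). Suppose we want $$\int 3x^2\,(x^3+1)^4\;dx.$$ Here $g(x)=x^3+1$, $g'(x)=3x^2$, and $f(u)=u^4$. An antiderivative of $f(u)$ is $F(u)=\dfrac{u^5}{5}$, so $$\int f(u)\;du=\frac{u^5}{5}+C.$$ Substituting $g(x)$ for $u$ gives $$\int 3x^2\,(x^3+1)^4\;dx=\frac{(x^3+1)^5}{5}+C.$$ Differentiating the right-hand side with the Chain Rule returns $(x^3+1)^4\cdot 3x^2$, as it should. Writing "Let $u=x^3+1$" at the start and then computing $\int u^4\;du$ is the same calculation in a different order.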

It gets more complicated. Soon we will be treating $du$ (whatever that means) as an abbreviation for $g'(x)\;dx$. But we can check that the symbolic manipulations that we do are still the Chain Rule in disguise, and we can certainly always check whether our symbolic manipulations give the right answer.

Comment: The following general idea is useful. Differentiation is ordinarily easy, integration not so much. If after some work one has calculated an indefinite integral, one can quickly check whether the answer is right, by differentiating. This can save us from errors both major and minor. As a simple example, suppose we want $\int e^{-3x}\,dx$. I will write down a wrong answer, $3e^{-3x}+C$. Let's check whether this is right. Differentiate, using the Chain Rule. We get $-9e^{-3x}$. Oops, wrong answer! But we can see how to fix our wrong answer, by dividing that answer by $-9$.
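The same check can be done numerically. The sketch below is only an illustration: the helper `derivative` and the test point `x0` are assumptions of mine. It differentiates the wrong guess and the repaired guess by finite differences and compares each with the integrand.

```python
import math

def derivative(h, x, eps=1e-6):
    """Central-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

integrand = lambda x: math.exp(-3 * x)
wrong_guess = lambda x: 3 * math.exp(-3 * x)   # the deliberately wrong answer
fixed_guess = lambda x: wrong_guess(x) / -9    # divide by -9: equals -exp(-3x)/3

x0 = 0.4  # arbitrary test point
ratio_wrong = derivative(wrong_guess, x0) / integrand(x0)
ratio_fixed = derivative(fixed_guess, x0) / integrand(x0)

# ratio_wrong is close to -9 (off by that constant factor),
# ratio_fixed is close to 1 (the repaired guess is a genuine antiderivative).
print(ratio_wrong, ratio_fixed)
```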

Solution 2:

What you are looking for is an intuitive interpretation of the formula for integration by substitution. I will try to provide it in rather simple words. Consider the example $\displaystyle F(x)=\int 2xe^{x^2}\, dx $.

You can easily infer, by reversing the chain rule, that $\displaystyle F(x)=\int 2xe^{x^2}\, dx=e^{x^2}+c $. However, substituting $u=x^2$ (or more specifically $x=\sqrt{u}$) gives $\displaystyle F(u)=\int 2\sqrt{u}e^{u}\, du$, which does not lead to the correct result. So substituting alone is not enough.

Actually, the difference between the function in terms of $x$ and the substituted (but invalid) one in terms of $u$ is shown in a plot:

[Figure not reproduced: the original function in terms of $x$ on top, and the substituted function in terms of $u$.]

It can be seen from the plot that the areas under the curves differ a lot and, hence, the antiderivatives will differ as well. If $x$ moves by $\Delta x$ then $u$ moves by $\Delta u$, where for most corresponding intervals $\Delta x\neq\Delta u$. In the plot, the density reflects this fact: for example, some intervals of the function on the top map to smaller intervals on the substituted function, indicating "a higher density". Let's think of the density as $ \Delta x/ \Delta u$.

A high density here also means that some area below the graph gets lost, because the graph becomes "squeezed". For low densities it gets "stretched" (resulting in too large an area below the graph). You can fix this behaviour by multiplying by the density, which virtually adds area below the graph for high densities (density > 1) and virtually removes area at low densities (density < 1). In calculus, we need to define the infinitesimal density as $dx/du$ in order to represent it at a single point. But $dx/du$ is a differential quotient:

$$dx/du=x'(u)=\frac{1}{2\sqrt{u}}$$

And so,

$$ \displaystyle \int 2xe^{x^2}\, dx=\int 2\sqrt{u}e^{u} \left ( \frac{dx}{du}\right ) \, du $$

$$ \displaystyle \int 2\sqrt{u}e^{u} \left ( \frac{dx}{du}\right ) \, du=\int 2\sqrt{u}e^{u} \frac{1}{2\sqrt{u}}\, du=\int e^{u} \, du$$

$$ \displaystyle \int e^{u} \, du=e^{u}+c=e^{x^2}+c$$
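The area argument can also be checked numerically. The following is a minimal sketch under assumptions of mine: the helper `midpoint_sum`, the interval $[0.5, 1.5]$ for $x$, and the number of subintervals are all arbitrary. It compares the area under the original integrand with the area under the naively substituted one and under the density-corrected one, where $u=x^2$ runs over $[0.25, 2.25]$.

```python
import math

def midpoint_sum(h, a, b, n=100_000):
    """Midpoint Riemann sum approximating the area under h on [a, b]."""
    width = (b - a) / n
    return width * sum(h(a + (i + 0.5) * width) for i in range(n))

a, b = 0.5, 1.5  # arbitrary x-interval; u = x**2 then runs over [a**2, b**2]

# Area under the original integrand, in terms of x.
original = midpoint_sum(lambda x: 2 * x * math.exp(x ** 2), a, b)

# Naive substitution: x replaced by sqrt(u), but no density factor dx/du.
naive = midpoint_sum(lambda u: 2 * math.sqrt(u) * math.exp(u), a ** 2, b ** 2)

# With the density dx/du = 1 / (2 sqrt(u)) the integrand collapses to e**u.
corrected = midpoint_sum(lambda u: math.exp(u), a ** 2, b ** 2)

# Value predicted by the antiderivative e**(x**2).
exact = math.exp(b ** 2) - math.exp(a ** 2)

print(original, corrected, exact)  # these three agree closely
print(naive)                       # this one is noticeably different
```

Only the version that includes the factor $dx/du$ reproduces the area under the original curve.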

I also struggled with this topic when I was younger and hope that the explanation helps a lot of people. You might also want to have a look into the full article on Insight Things.