If we have some differentiable function $y = f(x)$, then by the definition of the differential, we have:

$$\mathrm{d}y = f'(x) \mathrm{d}x$$

This makes sense; the differential $\mathrm{d}y$ can be thought of as a function that expresses the linear change in $y$ given a finite change $\mathrm{d}x$ in $x$. Now, what confuses me is why we can simply do this:

$$\int \mathrm{d}y = \int f'(x) \mathrm{d}x$$

and just magically append integrals to both sides of the equation. Some people tell me that the "differentials" in the integrand aren't really being multiplied by the integrand, and others tell me that they are. If they are not multiplied by the integrand, then I don't see why the above step is justified. $\mathrm{d}y$ has changed from a finite quantity into a "closing parenthesis" on the integral once we performed the integration. But somehow, this method works, and it is essentially how most students (including myself) are taught to solve very simple differential equations.
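For example, this is exactly how we are taught to solve the simplest separable differential equations: given $\frac{\mathrm{d}y}{\mathrm{d}x} = 2x$, we write

$$\mathrm{d}y = 2x \, \mathrm{d}x \implies \int \mathrm{d}y = \int 2x \, \mathrm{d}x \implies y = x^2 + C,$$

which produces the right answer even though I can't justify the middle step.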

To complicate matters further, my physics teacher explained that $\frac{dy}{dx}$ is a quotient of infinitesimal changes, and therefore $\mathrm{d}y = f'(x) \mathrm{d} x$ is quite easy to see. The integration symbols are just summing these infinitesimals over a given region (whether it be an area, a volume, or, in the above case, a simple one-dimensional segment). From this viewpoint, we really are multiplying the integrand by $\mathrm{d}x$.

Although this "definition" of differentials disagrees with their definition as finite quantities (as described at the top of the question), it does seem to satisfy my intuition quite nicely. For instance, it solves the mystery of "magically appending" integrals to both sides of an equation. For example, consider the integral:

$$\int x \cos\left(x^2\right) \mathrm{d}x = \frac{1}{2} \int \cos(u) \mathrm{d}u$$

where I have simply substituted $u = x^2$, so that $\mathrm{d}u = 2x \, \mathrm{d}x$ and $x \, \mathrm{d}x$ is replaced by $\frac{1}{2} \mathrm{d}u$. If the differentials weren't part of the product, and were just "closing parentheses" (as some people have told me), then why does this work? Viewing the differential as an infinitesimal factor in a product therefore has nice properties; not only is it dimensionally accurate, but it also makes these $u$-substitutions quite intuitive. If $\mathrm{d}x$ were just a closing parenthesis on the integral, I don't see how we can substitute $x$ times a closing parenthesis with another closing parenthesis.

But recently, in my multivariable class, we learned that:

$$\mathrm{d}x \, \mathrm{d}y = \left| \frac{\partial(x, y)}{\partial(u, v)}\right| \mathrm{d}u \, \mathrm{d}v$$

and we learned that the differential element $\mathrm{d}A = \mathrm{d}x \, \mathrm{d}y$ is the area of an infinitesimally small rectangle. Under the change of variables, the area of an infinitesimally small rectangle in $uv$-space differs from that of the corresponding region in $xy$-space by a factor of the absolute value of the Jacobian determinant, which essentially converts $uv$-space areas into $xy$-space areas. This makes sense, but when coupled with the definition of a differential, I see an apparent contradiction. Consider the polar coordinate transformation, in which we have $x = r \cos{\theta}$ and $y = r\sin{\theta}$. By the above equation, we have that $\mathrm{d}x \, \mathrm{d}y = r \, \mathrm{d}r \, \mathrm{d}\theta$. But if we were to compute the differentials of $x$ and $y$ and multiply them together, expanding the product would not give us $r \, \mathrm{d}r \, \mathrm{d}\theta$; it would give us a bunch of ugly stuff, and I'm not even sure this "stuff" is meaningful.
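To show what I mean, computing the differentials from $x = r\cos\theta$ and $y = r\sin\theta$ gives

$$\mathrm{d}x = \cos\theta \, \mathrm{d}r - r\sin\theta \, \mathrm{d}\theta, \qquad \mathrm{d}y = \sin\theta \, \mathrm{d}r + r\cos\theta \, \mathrm{d}\theta,$$

and naively multiplying these out gives

$$\mathrm{d}x \, \mathrm{d}y = \sin\theta\cos\theta \, (\mathrm{d}r)^2 + r\cos^2\theta \, \mathrm{d}r \, \mathrm{d}\theta - r\sin^2\theta \, \mathrm{d}\theta \, \mathrm{d}r - r^2\sin\theta\cos\theta \, (\mathrm{d}\theta)^2.$$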

If the differentials were truly being multiplied by one another, then after expanding their product, the result would be $r \, \mathrm{d}r \, \mathrm{d}\theta$. But as the expansion above shows, this is not the case. So in the end, neither the viewpoint of differentials as "infinitesimals" multiplied by the integrand, nor the viewpoint of differentials as finite linear changes, makes sense to me.

If differentials are infinitesimals multiplied by the integrand, then why does expanding the product $\mathrm{d} x(r, \theta) \times \mathrm{d}y(r, \theta)$ not yield $r \, \mathrm{d}r \, \mathrm{d}\theta$?

If differentials are finite, linear changes, why can we just magically "slap" integrals onto both sides of the equation $\mathrm{d}y = f'(x) \, \mathrm{d}x$?

And probably most importantly, are differentials just closing parentheses, or are they actually multiplied by the integrand? If we consider them as part of an infinitesimal product, then appending integrals to both sides is justified. But if they are just "closing parentheses", then why is appending integrals to both sides of an equation justified? A finite quantity ($\mathrm{d}x$) cannot magically "transform" into a piece of notation simply because we write an integral symbol on both sides of the equation.

I am very confused about differentials, and I would appreciate it if someone could clarify exactly why multiplying them doesn't work in the intuitive way described above, and why we are allowed to just append integrals to both sides of an equation.


Solution 1:

$\newcommand{\d}{\mathrm{d}}$ I think what confuses you is that there are two different approaches to calculus.

In most schools they teach standard calculus, where $\frac{\d}{\d x}(\cdot)$ is the operator that gives the derivative and $\int(\cdot)\,\d x$ is the operator that gives the antiderivative. They are usually defined in terms of limits.

So these are just operators and a notational convention. This notational convention, with its apparent cancellation of the $\d x$ terms, makes it easy to apply the chain rule and to do $u$-substitution.

In this approach the $\d x$ alone doesn't have a meaning. You must have the $\int$ symbol on the left, as it is part of the operator.

This way, standard calculus avoids the philosophical problems of infinitesimals by not working with them at all.
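For instance, the $u$-substitution from the question is, in this approach, not a cancellation of $\d x$'s but the chain rule for antiderivatives read in reverse; a sketch with $u = x^2$:

$$\int x\cos\left(x^2\right)\d x = \frac{1}{2}\int 2x\cos\left(x^2\right)\d x = \left.\frac{1}{2}\int \cos(u)\,\d u\,\right|_{u = x^2} = \frac{1}{2}\sin\left(x^2\right) + C.$$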


The other approach is non-standard calculus, which defines non-zero infinitesimals (NZIs) in a rigorous way. The following is based on what I understand about it so far. It may not be entirely precise, but it should give you the idea.

Roughly speaking, polynomials in NZIs are examples of hyperreal numbers (the hyperreals contain more than these, but they are enough to give the idea).

An NZI (call it $\epsilon$) has these properties:

  • $0 < | \epsilon | < x$ for any positive real $x$.
  • $k\epsilon$ is an NZI for any non-zero real $k$.
  • $0 < | \epsilon^n | < x | \epsilon^{n-1}|$ for any positive real $x$. So each power of an NZI is infinitely smaller than any lower power of it. I will call the exponent of the lowest-order term in a hyperreal number its order.

  • $k < k + |\epsilon| < k + x$. Adding an NZI to a number causes a smaller change than adding any positive real number, so the change is negligible. It can be shown that no addition, multiplication, or division can make it affect the real part, so higher-order infinitesimal terms can be ignored, as long as doing so doesn't change the order of the result. For example, you can ignore $\epsilon$ in $x + \epsilon$, but not in $f(x + \epsilon) - f(x)$ for a differentiable function $f$ (as that would make the result zero). The standard part function is defined along these lines.
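Roughly, the standard part function keeps the real part and drops all the infinitesimal terms; for example:

$$\operatorname{st}\left(3 + 2\epsilon + 5\epsilon^2\right) = 3, \qquad \operatorname{st}\left(\frac{(x+\epsilon)^2 - x^2}{\epsilon}\right) = \operatorname{st}(2x + \epsilon) = 2x.$$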

There are, however, several ways infinitesimals can produce real numbers:

  • Dividing two infinitesimals: $\frac{k\epsilon}{l\epsilon} = \frac{k}{l}$.
  • Infinite sums: $\sum_{i=0}^{\frac{a}{\epsilon}} b \epsilon = ab$ (up to an infinitesimal, i.e. after taking the standard part).
  • Infinite powers: $(1 + \epsilon)^{1/\epsilon} = e$ (again, after taking the standard part).

In non-standard calculus the differentials are first-order infinitesimals, and the differential of a function can be defined like this:

$$\d (f(a,b,c,\dots)) = f(a+\d a, b+\d b, c+\d c, \dots) - f(a,b,c,\dots)$$

Example:

$$\d (x^2) = \\ (x + \d x)^2 - x^2 = \\ x^2 +2x \d x + (\d x)^2 - x^2 = \\ 2x \d x + (\d x)^2 $$

Then the higher-order terms can be ignored to get $2x\,\d x$. Thus $\frac{\d(x^2)}{\d x} = 2x$.
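The same recipe gives, for example, the product rule:

$$\d(uv) = (u + \d u)(v + \d v) - uv = u\,\d v + v\,\d u + \d u\,\d v,$$

and dropping the higher-order term $\d u\,\d v$ leaves $\d(uv) = u\,\d v + v\,\d u$.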

Similarly, definite integrals can be written as infinite sums:

$$\int_a^b f(x) \d x = \sum_{i=0}^{\frac{b-a}{\epsilon}} f(a + i \epsilon) \epsilon$$

Note that the $\d x$ in the integral notation also tells you which variable is being summed over, while in the sum you need to write that explicitly.

Example:

$$ \int_a^b x^2 \,\d x = \sum_{i=0}^{\frac{b-a}{\epsilon}} (a + i \epsilon)^2 \epsilon = \\ \sum_{i=0}^{\frac{b-a}{\epsilon}} (a^2 + 2ai\epsilon + i^2 \epsilon^2) \epsilon = \\ a^2(b-a) + 2a\epsilon^2\sum_{i=0}^{\frac{b-a}{\epsilon}}i + \epsilon^3 \sum_{i=0}^{\frac{b-a}{\epsilon}}i^2 = \\ a^2(b-a) + 2a\epsilon^2\left(\frac{(b-a)^2}{2 \epsilon^2} + \frac{b-a}{2\epsilon}\right) + \epsilon^3\left(\frac{(b-a)^3}{3 \epsilon^3} + \frac{(b-a)^2}{2 \epsilon^2} + \frac{b-a}{6 \epsilon}\right) = \\ a^2(b-a) + a(b-a)^2 + \frac{(b-a)^3}{3} = \\ a^2b - a^3 + a(b^2 - 2ab + a^2) + \frac{b^3 - 3b^2 a + 3ba^2 - a^3}{3} = \\ \frac{3a^2b - 3a^3 + 3ab^2 - 6a^2b + 3a^3 + b^3 - 3b^2 a + 3ba^2 - a^3}{3} = \\ \frac{b^3 - a^3}{3} = \\ \frac{b^3}{3} - \frac{a^3}{3} $$

(Along the way, the left-over infinitesimal terms have been dropped, i.e. the standard part has been taken.)