How is it that treating Leibniz notation as a fraction is fundamentally incorrect but at the same time useful?

I have long struggled with the idea of Leibniz notation and the way it is used, especially in integration.

These threads discuss why treating Leibniz notation as a fraction and cancelling differentials is incorrect, but also go on to say that the notation is suggestive and we use it because it simplifies things:

What is the practical difference between a differential and a derivative?

If dy/dt * dt doesn't cancel, then what do you call it?

In them they say to treat the differential at the end of an integration expression as a "right parenthesis". This throws me off a bit because we can so easily do something like: $$\int\cos(3x) \, dx\\ u=3x\\[2ex] du=3\,dx\\[2ex] \frac{1}{3}du=dx$$ and then proceed to integrate: $$\frac{1}{3}\int\cos(u) \, du$$ and arrive at the correct answer with "incorrect" notation. I am supposed to treat the differential as a parenthesis but using this notation the differential seems to have a value.

How does this incorrect notation do such a good job of ensuring that we do not disobey the "reverse chain rule" and that our integrand is in the form $f'(g(x))\,g'(x)$?

People often say that it is very suggestive and I am wondering how. Excuse the LaTeX if it looks weird. This is my first time using it.


Solution 1:

It is not "fundamentally incorrect." It is informal and intuitive. It is true that Leibniz notation does not rigorously prove anything, but often it is more important (in my opinion) to have an intuitive understanding of why things make sense than of how to prove them formally.

A derivative is a measure of how much one variable changes in response to a change in another variable. So the symbol "$dy$" is the change in y which - taking y as a dependent variable - has been caused by a change in, say, $x$. That change we will call $\Delta x$ or $dx$. The ratio of the two changes $$\frac{dy}{dx}$$ represents how much y is altered in proportion to alterations in x (for small changes).

In this light, the chain rule should be obvious. Say z depends on y, and y in turn depends on x. $$z=z(y),\hspace{2mm} y=y(x)$$ If we change x, what happens to z? Well, when y changes by $\Delta y$, z will change by $$\frac{dz}{dy}\Delta y$$ So how much has y changed? It has changed by $$\Delta y=\frac{dy}{dx}\Delta x$$ In total then, the change in z is $$\Delta z=\frac{dz}{dy}\frac{dy}{dx}\Delta x$$ The ratio of these changes is just $$\frac{\Delta z}{\Delta x}=\frac{dz}{dy}\frac{dy}{dx}$$ In other words, we find the rate of change of z by using its derivative with respect to y, and then incorporating how much y has actually been affected by the independent variable x.
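If you want to see the "small changes" argument with actual numbers, here is a minimal sketch. The functions $z=y^3$ and $y=\sin(x)$, the point $x=0.7$, and the step size are my own choices for illustration, not anything special. It nudges $x$ by a small $\Delta x$, follows the change through $y$ into $z$, and compares $\Delta z/\Delta x$ with the product of the two derivatives.

```python
import math

def z_of_y(y):
    return y ** 3          # assumed outer function z(y), chosen for illustration

def y_of_x(x):
    return math.sin(x)     # assumed inner function y(x), chosen for illustration

def chain_rule_check(x, dx=1e-6):
    """Return (ratio of small changes dz/dx, product (dz/dy)*(dy/dx))."""
    y = y_of_x(x)
    dy = y_of_x(x + dx) - y            # change in y caused by the change in x
    dz = z_of_y(y + dy) - z_of_y(y)    # change in z caused by that change in y
    ratio = dz / dx
    product = (3 * y ** 2) * math.cos(x)   # exact derivatives of the two pieces
    return ratio, product

ratio, product = chain_rule_check(0.7)
print(ratio, product)
```

The two printed numbers should agree to several decimal places, and the agreement should improve as `dx` shrinks (until floating-point rounding takes over).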

We could even make things more fancy. What would $$\frac{d\big(\sin(x)\big)}{d\big(\sin(x)\big)}$$ equal (assuming that you accept it as coherent)? Clearly if $\sin(x)$ changes by some amount, the dependent function $\sin(x)$ will change by an equal amount. So the derivative is $1$. It is not a mistake that $$\frac{dx}{dx}=1$$ The changes are equivalent and so "cancel." You can also interpret this as saying the "line $y=x$ has a slope of $1$ at all points," but the Leibniz notation is designed for another interpretation.

Inverse functions pose a similar issue. You probably know that $$\frac{d}{dx}\left(f^{-1}(x)\right)=\frac{1}{f'\left(f^{-1}(x)\right)}$$ With differentials, it is easier to see why this makes sense. Clearly $$\frac{dx}{dy}=\frac{1}{\frac{dy}{dx}}$$ (Remember, I don't say "clearly" because that is simple algebra - I say it because if you understand what the derivative really means, you should see why an application of said algebra is logically valid.) The change in x and the change in y, as long as the function is actually invertible, remain the same. The inverse function will have a derivative that is the reciprocal of the original function.
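A quick numerical sanity check of the reciprocal relationship, as a sketch only: I am assuming $f(x)=e^x$ (so the inverse is $\ln$) and the point $x_0=0.5$ purely for illustration. The slope of $f$ at $x_0$ and the slope of $f^{-1}$ at $y_0=f(x_0)$ are both estimated from small changes.

```python
import math

def f(x):
    return math.exp(x)      # assumed invertible function, chosen for illustration

def f_inverse(y):
    return math.log(y)      # its inverse

def slope(func, t, h=1e-6):
    """Estimate the derivative of func at t from a small symmetric change."""
    return (func(t + h) - func(t - h)) / (2 * h)

x0 = 0.5
y0 = f(x0)
dy_dx = slope(f, x0)            # change in y per change in x, at x0
dx_dy = slope(f_inverse, y0)    # change in x per change in y, at the matching y0

print(dx_dy, 1 / dy_dx)
```

Note that the inverse's slope has to be taken at the matching point $y_0=f(x_0)$; that is exactly what the $f^{-1}(x)$ inside the formula is keeping track of.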

-----Are there "counterexamples" to this version of differentials?-----

I would argue that there are not (and bring them on if you think you've found one!). The one which is most commonly exhibited is the following interesting formula. Let $$z=f(x,y)=0$$ Then $$\frac{dy}{dx}=-\frac{\frac{\partial f}{\partial x}}{\frac{\partial f}{\partial y}}$$ If we were to "cancel" the differentials, we would mistakenly deduce the false claim $$\frac{dy}{dx}=-\frac{dy}{dx}$$ where I have changed partials back to normal derivatives.

This example, however, is based on a simple misunderstanding of what the symbols represent. The "$\partial f$" which occurs in $$\frac{\partial f}{\partial x} $$ is the change in $f$ resulting from a change in the first variable $x$. The "$\partial f$" which occurs in $$\frac{\partial f}{\partial y} $$ is the change in $f$ resulting from a completely different change in the second variable $y$. In other words, the two "$\partial f$"s are different quantities. They cannot be canceled. I would add that anti-intuitionists make Leibniz roll over in his grave every time they think he would have fallen for such an elementary error on the basis of what I consider to be the most brilliant notation ever devised.

Moreover, a simple 'proof' of the equation follows from basic reasoning about differentials. We have that $$\Delta f=\frac{\partial f}{\partial x}\Delta x+\frac{\partial f}{\partial y}\Delta y$$ Using the fact that $$\Delta y=\frac{dy}{dx}\Delta x$$ we get $$\Delta f=\Delta x\left(\frac{\partial f}{\partial x}+\frac{dy}{dx}\frac{\partial f}{\partial y}\right)$$ Since $f(x,y)=0$ over the curve, it will not change. Hence $\Delta f=0$ and the equation follows immediately.
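To make the formula concrete, here is a small sketch on an example of my own choosing: the unit circle $f(x,y)=x^2+y^2-1=0$, upper branch, at the point $(0.6, 0.8)$. It estimates the two partials from separate small changes in $x$ and in $y$, and compares $-f_x/f_y$ with the slope you get by actually walking along the curve.

```python
import math

def f(x, y):
    return x * x + y * y - 1.0      # assumed constraint: the unit circle

def y_on_curve(x):
    return math.sqrt(1.0 - x * x)   # upper branch of the circle, where f = 0

def partial_x(x, y, h=1e-6):
    # change in f from a change in x only (y held fixed)
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(x, y, h=1e-6):
    # change in f from a change in y only (x held fixed)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0 = 0.6
y0 = y_on_curve(x0)
h = 1e-6

slope_from_partials = -partial_x(x0, y0) / partial_y(x0, y0)
slope_along_curve = (y_on_curve(x0 + h) - y_on_curve(x0 - h)) / (2 * h)

print(slope_from_partials, slope_along_curve)
```

The two "changes in $f$" in the numerator and denominator come from different nudges, which is why they are computed by two separate functions here and cannot be cancelled.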

A derivative divides two quantities which are both very small. An integral does just the opposite: it multiplies two quantities, one of which is large (the sum), and one of which is small (the "$\Delta x$").

The integral $$\int$$ looks like an elongated "S" on purpose - it is meant to be a "sum." It is a sum of quantities multiplied by the intervals over which they are evaluated: $$\sum{f(x_i)\Delta x_i}$$ Note that the $\Delta x$ still means the same thing; it is how much we have allowed $x$ to change. So what would happen if we integrated (summed) differentials (differences)? In other words, what is $$\int{d\big(f(x)\big)}$$ Well, if we add up all the changes over the sub-intervals, we get the net (or "total") change. Algebraically, $$\sum{\Delta f(x_i)}=\big(f(x_1)-f(x_0)\big)+\big(f(x_2)-f(x_1)\big)+\cdots +\big(f(x_n)-f(x_{n-1})\big)=f(x_n)-f(x_0)$$ because the sum telescopes. In other words, we have that $$\int{d\big(f(x)\big)}=f(x)$$ and if bounds are specified, we take the change in $f$ at the bounds (that is, $f(b)-f(a)$). The more familiar form of this theorem is $$\int_a^b{f'(x)\hspace{1mm}dx}=f(b)-f(a)$$ Once again, $$d\big(f(x)\big)=f'(x)dx$$ It all clicks!
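Here is a rough numerical sketch of the telescoping idea. The choice $f=\sin$ on $[0,2]$ and the number of sub-intervals are assumptions of mine for illustration. It adds up the actual changes $\Delta f$ over each sub-interval, and separately adds up $f'(x_i)\,\Delta x$, so you can compare both with $f(b)-f(a)$.

```python
import math

def telescoping_and_riemann(f, f_prime, a, b, n=100_000):
    """Sum the changes in f, and sum f'(x_i)*dx, over n equal sub-intervals."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    # Sum of differences: everything cancels except the two endpoints.
    total_change = sum(f(xs[i + 1]) - f(xs[i]) for i in range(n))
    # Sum of f'(x_i) * dx: each term only approximates the change on its sub-interval.
    riemann_sum = sum(f_prime(xs[i]) * dx for i in range(n))
    return total_change, riemann_sum

total_change, riemann_sum = telescoping_and_riemann(math.sin, math.cos, 0.0, 2.0)
net_change = math.sin(2.0) - math.sin(0.0)

print(total_change, riemann_sum, net_change)
```

The first sum matches $f(b)-f(a)$ up to rounding no matter how coarse the partition is; the second only approaches it as the sub-intervals shrink, which is the content of $d\big(f(x)\big)=f'(x)\,dx$.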

Now let me re-write your example to see if it makes more sense. We can say that $$\int{\cos(3x)dx}=\frac{1}{3}\int{\cos(3x)3dx}=\frac{1}{3}\int{\cos(3x)d(3x)}=\frac{1}{3}\sin(3x)$$ Or another "fancy" one: We could say that $$\int{2\sin(x)d\big(\sin(x)\big)}=\sin^2(x)$$ Why? Because we are adding up changes in $\sin^2(x)$ - observe that $d(\sin^2(x))=2\sin(x)d(\sin(x))$.
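If it helps, the substitution can be checked numerically. This is only a sketch: I am assuming definite bounds $x\in[0,1]$ (so $u=3x$ runs over $[0,3]$) and a simple midpoint rule, neither of which comes from the question. It compares $\int_0^1\cos(3x)\,dx$, $\frac{1}{3}\int_0^3\cos(u)\,du$, and the closed form $\frac{1}{3}\sin(3)$.

```python
import math

def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

# Integral in the original variable x, over the assumed interval [0, 1].
in_x = midpoint_integral(lambda x: math.cos(3 * x), 0.0, 1.0)

# Same integral after u = 3x: dx = du / 3, and the bounds become [0, 3].
in_u = midpoint_integral(math.cos, 0.0, 3.0) / 3

closed_form = math.sin(3.0) / 3

print(in_x, in_u, closed_form)
```

The factor of $\frac{1}{3}$ is doing real work: each little step $du$ is three times as long as the corresponding $dx$, so the sum in $u$ has to be scaled back down.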

If you have had multi-variable calculus, we can take the examples further. Suppose we are asked to compute a line integral of the form $$\int{xdy+ydx}$$ The "Leibniz" way to solve this problem is as follows (think about the product rule). $$\int{xdy+ydx}=\int{d(xy)}=xy$$ The "normal" way to solve it would go something like this: With $M=y$ and $N=x$, we check that $$N_x-M_y=1-1=0$$ so the field is conservative. We integrate $f_x=y$ to get $$f(x,y)=xy+\phi(y)$$ Then $$f_y=x+\phi'(y)=x$$ so (after some work) for $f(x,y)=xy+c,\hspace{3mm} c\in\mathbb{R}$, we have $\nabla f=(y,x)=(M,N)$ and therefore the integral is $f=xy+c$.
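A small sketch of the "Leibniz" computation with numbers: the path $x=1+2t$, $y=2+3t^2$ for $t\in[0,1]$ is a made-up example of mine, as is the choice to evaluate $x$ and $y$ at the average of each segment's endpoints. It adds up $x\,\Delta y+y\,\Delta x$ along the path and compares the total with the change in $xy$ between the endpoints.

```python
def path(t):
    # Assumed path, chosen for illustration: from (1, 2) at t=0 to (3, 5) at t=1.
    return 1.0 + 2.0 * t, 2.0 + 3.0 * t * t

def line_integral_x_dy_plus_y_dx(n=1_000):
    """Sum x*dy + y*dx over n small segments of the path."""
    total = 0.0
    x_prev, y_prev = path(0.0)
    for i in range(1, n + 1):
        x_next, y_next = path(i / n)
        x_avg = 0.5 * (x_prev + x_next)   # x evaluated mid-segment
        y_avg = 0.5 * (y_prev + y_next)   # y evaluated mid-segment
        total += x_avg * (y_next - y_prev) + y_avg * (x_next - x_prev)
        x_prev, y_prev = x_next, y_next
    return total

x_start, y_start = path(0.0)
x_end, y_end = path(1.0)
change_in_xy = x_end * y_end - x_start * y_start   # 3*5 - 1*2 = 13

integral = line_integral_x_dy_plus_y_dx()
print(integral, change_in_xy)
```

With the endpoint-average choice, each term is algebraically equal to the change in $xy$ over its segment, so the sum telescopes just like $\int d(xy)$ suggests, regardless of which path you pick between the same two points.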

Or take the famous Integration By Parts formula $$\int{udv}=uv-\int{vdu}$$ Most people don't seem to have any intuition for why this is true. However, if we manipulate it slightly we get $$\int{udv}+\int{vdu}=uv$$ Does that not look suspiciously like what we had just above? Can there really be a connection between integration by parts and vector line integrals?? YES!! They are all based on the same intuitive ideas.

Don't get me wrong: it is immensely important to understand the real definition of limits, and to witness how they can be used to systematically build up calculus into a formal and rigorous structure. But this should not be done at the cost of losing an informal grasp of what calculus is really about, and how it was discovered in the first place (hint: Newton did not have an epiphany involving epsilons and deltas when he saw an apple fall). I do not have space to talk about more objects in calculus, but I believe that, with enough thought, they all turn out to be deeply related and surprisingly intuitive.

Solution 2:

I agree with @user138053 that it is not "fundamentally incorrect" and that it is intuitively appealing. Furthermore, Leibniz notation rigorously proves everything that the noninfinitesimal approach proves, once it is formalized in the context of a modern theory of infinitesimals such as the hyperreal numbers. Unless one thinks that infinitesimals are "fundamentally incorrect", Leibniz's notation is not incorrect, either.

Solution 3:

Leibniz notation is useful and suggestive because it is fundamentally correct. Anyone who tells you otherwise simply doesn't know what they're talking about.

It's understandable how this misconception came to be. When Calculus was first developed in the 17th century, it was clear that it didn't have the same level of logical rigour that arithmetic and geometry had had for about 2000 years prior. It wasn't until the 19th century that this changed, and the rigorous development at that time did not include differentials or any other infinitesimal quantity. So at one time, it was fair to say that we did not have a fundamentally correct account of Leibniz notation. But it was never correct to say that the notation itself was fundamentally incorrect, only that if it was correct, then we did not know how.

Since another hundred years or so have passed since then, it should be no surprise that the situation has changed. If you can derive a contradiction from a notation, then that's a sign that it is fundamentally incorrect, but it's also possible that one can figure out rules on its usage that will prevent contradictions. And if a notation is useful and suggestive, then that's a sign that it is fundamentally correct, at least when appropriately used. So there was never a time that anybody could have reasonably believed that they knew that Leibniz notation was fundamentally incorrect, and indeed, we now know that it is fundamentally correct.