The differences between Lagrange and Leibniz's derivative notations

One problem I have found when learning calculus is that there are many different ways to denote the derivative. If $y=f(x)=x^2$, then we could write

\begin{align} f'(x)&=2x \\ y'&=2x \\ \frac{df}{dx}(x)&=2x \\ \frac{df(x)}{dx}&=2x \\ \frac{d}{dx}f(x)&=2x \\ \frac{dy}{dx}&=2x \end{align}

And that is using only Lagrange's and Leibniz's notations. What I find troubling is that they all seem to suggest subtly different things about what the derivative actually is. Is it a function, a limit of a quotient, or both? In the interests of keeping my post brief, I'll focus my attention on $f'(x)=2x$ and $\frac{dy}{dx}=2x$, as these seem to be the most common notations.

$$ f'(x)=2x $$

It does make sense to think of the derivative as the gradient function: $$ f'\colon x\mapsto\lim_{\Delta x \to 0}\frac{f(x+\Delta x)-f(x)}{\Delta x} $$ In this case the limit expression is equal to $2x$, and so we can write $$ f' \colon x \mapsto 2x $$ However, this notation seems a little counter-intuitive when we consider what it means to differentiate a function with respect to a variable other than $x$. If I ask what is the derivative of $f(x)$ with respect to $\frac{x}{2}$, does this question make sense? Is it simply $f'(\frac{x}{2})$? Or do we have to express $x^2$ in terms of $\frac{x}{2}$? And how can we express this derivative using Lagrange's notation?

$$ \frac{dy}{dx}=2x $$

There are many things which are nice about Leibniz's notation, including the fact that it is explicit which variable you are differentiating with respect to. However, in this case, it is unclear whether we are talking about a function, or something else entirely. There are other issues. Some people say they dislike the Leibniz formulation of the chain rule $$ \frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx} $$ saying that they find it to be inaccurate. I don't really understand why this is the case. Could someone please elaborate?

Solution 1:

Derivatives at a point are numbers (and these numbers are calculated as limits of a certain quotient), and if for each point you assign a number which is the derivative at that point, then you of course get a function $\Bbb{R}\to \Bbb{R}$. Leibniz's notation is confusing because it doesn't tell you where the derivatives are being evaluated, and hence blurs the distinction between functions and function values. (It may not seem like such a big deal, especially when doing simple problems, but I guarantee that it will quickly get very confusing in multivariable calculus if all these basic concepts aren't kept straight.)

Writing the chain rule as $\dfrac{dy}{dx} = \dfrac{dy}{du} \dfrac{du}{dx}$ is inaccurate for several reasons:

1. It introduces completely irrelevant letters in the denominator (an unfixable flaw with Leibniz's notation).
2. It doesn't tell you where the derivatives (which are functions, as I explained in my previous paragraph) are being evaluated (you can try to make this more precise, but then you lose the "simplicity" of Leibniz's notation).
3. The $y$ on the LHS has a completely different meaning from the $y$ on the RHS (this wouldn't be a huge deal if there was no chance of confusion... but unfortunately it causes a lot of confusion, especially in several variables; see link below).

The third is I think the biggest problem, and I'll try to explain that now. In Lagrange's notation, the chain rule is expressed as $(y\circ u)'(x) = y'(u(x)) \cdot u'(x)$, or if you want to write a proper equality of functions, it is just $(y\circ u)' = (y'\circ u)\cdot u'$. So, there are actually three functions involved: there is $y$, there is $u$ and there is the composition $y\circ u$. The chain rule tells us how the derivatives of these three functions are related.

However, when you write $\dfrac{dy}{dx} = \dfrac{dy}{du}\cdot \dfrac{du}{dx}$, it gives the incorrect impression that there are only two functions, $y$ and $u$. Well, now you could argue that on the LHS we should "consider $y$ as a function of $x$" while on the RHS "$y$ is a function of $u$" so these are different things. This is of course right, the two things are very different, but this is all covered up in the notation. A perhaps slightly better way of writing it would be $\dfrac{d(y\circ u)}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx}$. But this is also not quite correct. Basically, any attempt to write the chain rule down formally is a huge nightmare. The best I can do is say that for every $x\in \text{domain}(u)$, \begin{align} \dfrac{d(y\circ u)}{dx}\bigg|_x &= \dfrac{dy}{du}\bigg|_{u(x)}\cdot \dfrac{du}{dx}\bigg|_x \end{align} This fixes issues $(2)$ and $(3)$ mentioned above to an extent, but $(1)$ still remains an issue.
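To see concretely why the evaluation points matter, here is a minimal numerical sketch. The functions $y(u)=u^3$ and $u(x)=2x+1$, the point $x_0 = 0.5$, and the helper name `central_difference` are all chosen purely for illustration; derivatives are only approximated by a central difference, so the values are close to, not exactly, the ones stated in the comments.

```python
def central_difference(g, a, h=1e-6):
    """Approximate g'(a) with a symmetric difference quotient."""
    return (g(a + h) - g(a - h)) / (2 * h)

# Three distinct functions are involved: y, u, and the composite y∘u.
def y(u_value):
    return u_value ** 3

def u(x):
    return 2 * x + 1

def y_after_u(x):
    return y(u(x))

x0 = 0.5

# Left-hand side: derivative of the composite, evaluated at x0.
lhs = central_difference(y_after_u, x0)

# Right-hand side: y' evaluated at u(x0), times u' evaluated at x0.
rhs = central_difference(y, u(x0)) * central_difference(u, x0)

# A tempting misreading of dy/du * du/dx: evaluating y' at x0 instead of u(x0).
wrong = central_difference(y, x0) * central_difference(u, x0)

print(lhs, rhs, wrong)  # lhs and rhs are both close to 24; wrong is close to 1.5
```

The first two numbers agree because $y'$ is evaluated at $u(x_0)$; the third differs because it evaluates $y'$ at $x_0$, which is exactly the information the bare formula $\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$ leaves out.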

You said in the comments that

I don't see much of a problem with $y$ depending on both $u$ and $x$, given that $u$ and $x$ are also related.

Well, if originally $y$ "depends on $u$", how can it all of a sudden "depend on $x$"? Of course, I know what you mean, but the proper way to indicate this dependence is not to say that "$y$ depends on $x$", but rather that the composite function $y\circ u$ depends on $x$. Here, you might think that this is just me being pedantic with language; and you're right. However, the reason I'm pedantic is that poor language and notation lead to conceptual misconceptions; this has been my experience when studying, and it is also what I've observed in some questions on this site. For example, in this question, the OP finds that $\frac{\partial F}{\partial y} = 0$ and $\frac{\partial F}{\partial y} = -1$. The reason for this apparent contradiction is that the two $F$'s are actually completely different things (I also recall a question in the single variable context, but I can't seem to find it).
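To illustrate how such a contradiction can arise, here is a hypothetical example (it is not the one from the linked question; the functions $F$ and $G$ and the variable $s$ are chosen only for illustration). Take $F:\Bbb{R}^2\to\Bbb{R}$, $F(x,y) = x$, and introduce a new function $G$ by substituting $x = s - y$: \begin{align} G(s,y) &= F(s-y,\, y) = s - y. \end{align} Then \begin{align} \frac{\partial F}{\partial y}(x,y) &= 0, & \frac{\partial G}{\partial y}(s,y) &= -1. \end{align} If one writes $F$ for both functions ("$F$ as a function of $x,y$" versus "$F$ as a function of $s,y$"), one ends up with $\frac{\partial F}{\partial y} = 0$ and $\frac{\partial F}{\partial y} = -1$ at the same time, even though nothing is wrong once $F$ and $G$ are given different names.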

Regarding your other question

If I ask what is the derivative of $f(x)$ with respect to $\frac{x}{2}$, does this question make sense? Is it simply $f'(\frac{x}{2})$? Or do we have to express $x^2$ in terms of $\frac{x}{2}$? And how can we express this derivative using Lagrange's notation?

The answers in succession are "one could make sense of this question", "no", and "yes". Let me elaborate. So, here, we're assuming that $f:\Bbb{R}\to \Bbb{R}$ is given as $f(x) = x^2$. To make precise the notion of "differentiating with respect to $\frac{x}{2}$", one has to introduce a new function, $\phi:\Bbb{R}\to \Bbb{R}$, $\phi(t) = 2t$. Then, what you're really asking is what is the derivative of $f\circ \phi$? To see why this is the proper way of formalizing your question, note that \begin{align} f(x) &= x^2 = \left(2 \cdot \dfrac{x}{2}\right)^2 = 4 \left(\frac{x}{2}\right)^2 \end{align} and that $(f\circ \phi)(t) = f(2t) = (2t)^2 = 4t^2$. So this is indeed what we want.

And in this case, \begin{align} (f\circ \phi)'(t) &= f'(\phi(t)) \cdot \phi'(t) \\ &= [2 \cdot \phi(t)] \cdot [2] \\ &= [2\cdot 2t] \cdot 2 \\ &= 8t \end{align}

Notice how this is completely different from $f'\left(\frac{x}{2}\right) = 2 \cdot \frac{x}{2} = x$.
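The difference between these quantities can also be checked numerically. The following is only a sketch: the point $x_0 = 3$ and the helper name `central_difference` are assumptions made for illustration, and the derivatives are approximated rather than computed exactly.

```python
def central_difference(g, a, h=1e-6):
    """Approximate g'(a) with a symmetric difference quotient."""
    return (g(a + h) - g(a - h)) / (2 * h)

def f(x):
    return x ** 2

def phi(t):
    return 2 * t

def f_after_phi(t):
    return f(phi(t))  # equals 4 * t**2

x0 = 3.0
t0 = x0 / 2  # t plays the role of x/2

# "Derivative of f with respect to x/2": derivative of f∘phi at t0, close to 8 * t0.
d_composite = central_difference(f_after_phi, t0)

# f' evaluated at x0/2: close to 2 * (x0 / 2) = x0.
d_f_at_half = central_difference(f, x0 / 2)

# Ordinary derivative f'(x0): close to 2 * x0.
d_f_at_x0 = central_difference(f, x0)

print(d_composite, d_f_at_half, d_f_at_x0)  # close to 12, 3 and 6 respectively
```

All three numbers differ, which is why it matters to say which function is being differentiated and where the result is evaluated.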

In general, when you have "___ as a function of $\ddot{\smile}$ " and you instead want to "think of ___ as a function of @", what is going on is that you have to use an extra composition. So, you need to have three sets $X,Y,Z$, a given function $f:Y\to Z$ (i.e., we think of elements $z\in Z$ as "functions of" $y\in Y$), and if you now want to think of "$z$ as a function of $x$", then what it means is that you need to get a mapping $X\to Z$ which somehow involves $f$. In other words, we need a certain mapping $\phi:X \to Y$ and then consider the composition $f\circ \phi$ (see for example the remarks towards the end of this answer).

Things can be slightly confusing when all the sets are the same $X=Y=Z = \Bbb{R}$, but in this case you should think of the three $\Bbb{R}$'s as "different copies" of the real line, and that each function maps you from one copy of the real line to another copy of the real line.

Edit:

Here's a passage from Spivak's Calculus text (Chapter 10, Question 33), where I first learnt about the double usage of the same letter.

[Image: excerpt from Spivak's Calculus, Chapter 10, Question 33]