Challenge: Demonstrate a Contradiction in Leibniz' differential notation
Solution 1:
The discussion here has been quite interesting! I wrote about Leibniz's notation in my Bachelor's thesis in 2010, reading through major parts of Bos's 1974 PhD thesis on higher order differentials in the Leibnizian calculus. I believe Bos is wrong on one point: assuming that one of the variables is in what Bos calls arithmetic progression is never necessary, only convenient! I will return to that below.
Leibniz's differentials
Leibniz developed his differentials, at first, from a geometrical intuition, although he reconsidered the actuality of this idea time and again. In my words, this idea can be very briefly summarized as follows:
A curve can be thought of as a polygon with infinitely many infinitely small sides $ds$. Each $ds$ is an infinitesimally small straight line segment which is part of the curve and (paradoxically) tangent to it at the same time. Gathering the $ds$ into one straight line segment $s=\int ds$ gives the length of the curve. Expressing such a curve by a geometrical relation between coordinate line segments $x$ and $y$, one may consider each $ds$ as the hypotenuse of a right triangle with legs $dx$ and $dy$, so that $dx^2+dy^2=ds^2$.
This is only to say that $dx,dy$ and $ds$ were thought of as geometrical and mutually dependent entities, never just numbers as we allow them to be today.
Just to stress how geometrical this was: the function nowadays expressed by the formula $f(x)=x^2$ would have been written as something like $a\cdot y=x\cdot x$, where $a,y$ and $x$ were all considered line segments, so that in Leibniz's time both sides of the equation constituted an area.
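As a small modern illustration of the polygon picture (my own sketch, nothing from Leibniz; the parabola $y=x^2$ on $[0,1]$ is an arbitrary choice), one can sum the lengths $\sqrt{dx^2+dy^2}$ of many short sides and recover the arc length:

```python
import numpy as np

# Approximate the curve y = x^2 on [0, 1] by a polygon with many short sides
# ds = sqrt(dx^2 + dy^2) and sum them up (a numerical stand-in for s = ∫ ds).
x = np.linspace(0.0, 1.0, 100001)
y = x**2

dx, dy = np.diff(x), np.diff(y)
ds = np.sqrt(dx**2 + dy**2)    # lengths of the tiny "sides" of the polygon
print(ds.sum())                # ≈ 1.4789..., the arc length of this parabola segment
```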
The level curve example
In the fractions $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ the $\partial f$'s in the two fractions are unrelated because:
- We do not have $\partial f,\partial x$ and $\partial y$ as mutually dependent geometrical entities, for the reason you already gave: the first $\partial f$ is the change in $f$ when you move in the $x$-direction by the vector $(dx,0)$, whereas the second $\partial f$ corresponds to moving by the vector $(0,dy)$. So they are unequal, although both are infinitesimally small.
- Even if we had some $df$ mutually dependent on $dx$ and $dy$, it would naturally have to be the change in $f$ when you travel along the vector $(dx,dy)$, and thus different from both $\partial f$'s described before (a small numerical sketch follows below).
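To make the distinction concrete, here is a small numerical sketch of my own; the function $f(x,y)=x^2+y^2$ and the base point are arbitrary choices:

```python
# Compare the three "infinitesimal changes in f" discussed above for a sample f.
f = lambda x, y: x**2 + y**2          # arbitrary example function
x, y = 1.0, 2.0                       # arbitrary base point
dx, dy = 1e-6, 1e-6                   # small increments standing in for differentials

df_x  = f(x + dx, y) - f(x, y)        # change of f along (dx, 0): the first "∂f"
df_y  = f(x, y + dy) - f(x, y)        # change of f along (0, dy): the second "∂f"
df_xy = f(x + dx, y + dy) - f(x, y)   # change of f along (dx, dy): a genuine "df"

print(df_x, df_y, df_xy)              # three different small numbers
```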
The chain rule example
Since we consider higher order differentials the work of Bos is relevant here: Had there been such a thing as a derivative $z=\frac{dy}{dv}$ in Leibniz's time, the differential of that should read $$ dz=d\frac{dy}{dv}=\frac{dy+ddy}{dv+ddv}-\frac{dy}{dv}=\frac{dv\ ddy-dy\ ddv}{dv(dv+ddv)} $$ Now, since $ddv$ is infinitesimally small compared to $dv$ we may drop $ddv$ in the bracket and simply write $dv$ instead of $(dv+ddv)$. Therefore we have $$ \frac{dz}{dv}=\frac{dv\ ddy-dy\ ddv}{dv^3}=\frac{ddy}{dv^2}-\frac{dy\ ddv}{dv^3} $$ Note that $ddy$ can also be written as $d^2 y$. So the second order derivative of $y$ with respect to $v$ equals $\frac{d^2 y}{dv^2}$ minus the strange-looking fraction $\frac{dy\ d^2 v}{dv^3}$, which can only be disregarded if it is zero. This only happens if either $dy=0$ or $d^2 v=0$. Choosing $d^2 v$ identically zero does the trick and renders $dv$ constant.
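The algebraic step in the first display can be checked symbolically; here is a small sympy sketch of my own, treating $dy,dv,ddy,ddv$ as independent symbols:

```python
import sympy as sp

# Verify the identity (dy + ddy)/(dv + ddv) - dy/dv == (dv*ddy - dy*ddv)/(dv*(dv + ddv)).
dy, dv, ddy, ddv = sp.symbols('dy dv ddy ddv')

lhs = (dy + ddy) / (dv + ddv) - dy / dv
rhs = (dv * ddy - dy * ddv) / (dv * (dv + ddv))

print(sp.simplify(lhs - rhs))   # 0, so the identity holds
```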
Suppose now that $d^2 v\equiv 0$. Then for the example $y=u=v^2$ we see that $du=2v\ dv$ and furthermore $ddu=2v\ ddv+2\ dv^2=2\ dv^2$, where the last equality is due to our choice that $ddv$ is identically zero. Therefore we see that the derivative of $w=\frac{dy}{du}$ will be given as $$ \frac{dw}{du}=\frac{d^2 y}{du^2}-\frac{dy\ ddu}{du^3} $$ where the last fraction is far from being zero: noting that $y=u\implies dy=du$ and that $\frac{dv}{du}=\frac{1}{2v}$, it may be rewritten to obtain $$ \require{cancel} \frac{\cancel{dy}\ ddu}{\cancel{du}\cdot du^2}=\frac{2\ dv^2}{du^2}=\frac{1}{2v^2} $$ This shows that assuming $\frac{d^2 y}{dv^2}$ to be the second order derivative of $y=v^2$ with respect to $v$ in the modern sense makes $\frac{d^2 y}{du^2}$ differ by $\frac{1}{2v^2}$ from being the second order derivative of $y=u$ with respect to $u$. Now since we know that $y=u$ we have $w=\frac{dy}{du}=1$ and thus $\frac{dw}{du}=0$. Therefore we must have $$ \frac{d^2 y}{du^2}-\frac{1}{2v^2}=0 $$ in this case, showing that $\frac{d^2 y}{du^2}=\frac{1}{2v^2}$. So with the choice $y=u=v^2$ and $ddv\equiv 0$ the equation $$ \frac{d^2 y}{du^2}\cdot\left(\frac{du}{dv}\right)^2=\frac{d^2 y}{dv^2} $$ may be successfully checked by applying that $\frac{du}{dv}=2v$, since we then have $$ \frac{1}{2v^2}\cdot(2v)^2=2 $$ which is actually true. This is NOT a coincidence!
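These differential ratios can be imitated directly with finite differences; the following sketch (mine, with an arbitrary base point and step) takes $v$ in arithmetic progression, so that $ddv=0$, and reproduces both $\frac{d^2 y}{du^2}\approx\frac{1}{2v^2}$ and the displayed equation:

```python
import numpy as np

# Take three equally spaced values of v (so ddv = 0) and form Leibnizian ratios
# from finite differences for the example y = u = v**2.
v0, h = 1.5, 1e-4                      # arbitrary base point, constant step dv = h
v = np.array([v0, v0 + h, v0 + 2*h])
y = v**2
u = v**2                               # y = u = v^2

dy, du = np.diff(y), np.diff(u)
ddy = np.diff(dy)[0]                   # second difference of y (equals ddu here)

d2y_du2 = ddy / du[0]**2               # the Leibnizian ratio d^2y/du^2
print(d2y_du2, 1 / (2 * v0**2))        # ≈ 0.2222 in both cases, i.e. 1/(2 v^2)
print(d2y_du2 * (du[0] / h)**2)        # ≈ 2 = d^2y/dv^2, as claimed
```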
Conclusion
The above calculations show that Julian Rosen's very appealing example of failure in the method of the Leibnizian calculus seems to rest on a misunderstanding about what is meant by the notion of $d^2 y$ and by the hidden, but important, additional variables $ddv$ and $ddu$. This provides specific details regarding the comments given by user72694 below the answer from Julian.
However, proving that Leibniz's notation will never produce false conclusions when handled correctly is a whole different story. This is supposedly what Robinson managed to do, but I must admit that I have not read and understood that theory myself.
My Bachelor's thesis focused mainly on understanding how the method was applied by Leibniz and his contemporaries. I have often thought about the foundations, but mainly from a 17th century perspective.
Comment on Bos's work
On page 31 of his thesis, Bos argues that the limit $$ \lim_{h_1,h_2\rightarrow 0}\frac{[f(x+h_1+h_2)-f(x+h_1)]-[f(x+h_1)-f(x)]}{h_1 h_2} $$ only exists if $h_1=h_2$, which then makes this limit equal $f''(x)$. But that is in fact not entirely true. The $x$-differences $h_1$ and $h_2$ need not be equal. It suffices for them to converge to being equal, which is a subtle, but important, variation of the setup. We must demand that $h_1$ and $h_2$ converge to zero in a mutually dependent fashion so that $$ \lim_{h_1,h_2\rightarrow 0}\frac{h_2}{h_1}=1 $$ With this setup the limit of the large fraction from before may still exist, but it need not equal $f''(x)$. Since $h_1,h_2$ play the role of $dx$'s this is equivalent to allowing $dx_1\neq dx_2$, so that $ddx=dx_2-dx_1\neq 0$ although it is infinitely smaller than the $dx$'s.
This means that it is in fact possible to imitate the historical notion of $dx$ being constant (and thereby $x$ in arithmetic progression) directly by modern limits.
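To make the claim above concrete, here is a numerical sketch of my own (the choices $f=\exp$, $x=0$ and $c=3$ are arbitrary): taking $h_2=h_1+c\,h_1^2$ gives $h_2/h_1\to 1$, and a Taylor expansion shows that the second-difference quotient then converges to $f''(x)+c\,f'(x)$ rather than to $f''(x)$.

```python
import numpy as np

# Second-difference quotient with h2 = h1 + c*h1**2, so that h2/h1 -> 1.
# For f = exp at x = 0 we have f'(0) = f''(0) = 1, so the quotient tends to 1 + c.
f = np.exp
x, c = 0.0, 3.0
for h1 in [1e-2, 1e-3, 1e-4]:
    h2 = h1 + c * h1**2
    quotient = (f(x + h1 + h2) - 2 * f(x + h1) + f(x)) / (h1 * h2)
    print(h1, quotient)    # tends to 4 = f''(0) + c*f'(0), not to f''(0) = 1
```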
Extras regarding the OP's answer
You are quite right that the differentials can be successfully manipulated into the equation $$ \frac{d^2}{dv^2}\big(y(u(v))\big)=y''(u(v))\cdot u'(v)^2+y'(u(v))\cdot u''(v) $$ under the assumption that $ddv\equiv 0$.
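This is the familiar second-order chain rule; here is a quick symbolic spot-check of my own, with the arbitrary concrete choices $y(u)=\sin u$ and $u(v)=v^3+2v$:

```python
import sympy as sp

# Verify d^2/dv^2 [y(u(v))] == y''(u(v)) * u'(v)**2 + y'(u(v)) * u''(v)
# for the sample functions y(t) = sin(t) and u(v) = v**3 + 2*v.
v, t = sp.symbols('v t')
y_of_t = sp.sin(t)
u_of_v = v**3 + 2*v

lhs = sp.diff(y_of_t.subs(t, u_of_v), v, 2)
rhs = (sp.diff(y_of_t, t, 2).subs(t, u_of_v) * sp.diff(u_of_v, v)**2
       + sp.diff(y_of_t, t).subs(t, u_of_v) * sp.diff(u_of_v, v, 2))

print(sp.simplify(lhs - rhs))   # 0
```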
There is, however, a more obvious and even less restrictive choice: leave the progressions of all three variables $u,v$ and $y$ unspecified, and yet connect the notation in a meaningful way to modern standards:
Introduce a fourth variable $t$ in arithmetic progression (i.e. $ddt\equiv 0$). One could think of it as a time variable, so that $u(t),v(t)$ and $y(t)$ are coordinate functions of some vector valued function. Then Julian Rosen's equation can be directly transformed to $$ \frac{\left(\frac{d^2 y}{dt^2}\right)}{\left(\frac{du^2}{dt^2}\right)}\cdot\left(\frac{\left(\frac{du}{dt}\right)}{\left(\frac{dv}{dt}\right)}\right)^2=\frac{\left(\frac{d^2 y}{dt^2}\right)}{\left(\frac{dv^2}{dt^2}\right)} $$ and since $dt$ is in arithmetic progression $y''(t)=\frac{d^2 y}{dt^2}$, so that this may be written in modern notation as $$ \frac{y''(t)}{u'(t)^2}\cdot\left(\frac{u'(t)}{v'(t)}\right)^2=\frac{y''(t)}{v'(t)^2} $$ which is easily verified to be correct. This is probably the simplest account, but it only uses, and does not very clearly exemplify, the necessity of choosing the progression of the variables. I think my first account did that better.
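A finite-difference sketch of this account (mine; the parametrization $v=e^t$, $u=y=e^{2t}$ is an arbitrary concrete realization of $y=u=v^2$): take $t$ in arithmetic progression and form all differentials as differences along $t$.

```python
import numpy as np

# With t in arithmetic progression (ddt = 0), the Leibnizian ratios formed from
# differences along t reproduce the modern quantity y''(t)/v'(t)^2.
t0, h = 0.4, 1e-5
t = np.array([t0, t0 + h, t0 + 2*h])   # equally spaced, so ddt = 0
v = np.exp(t)
u = v**2
y = u                                   # y = u = v^2

dv, du, dy = np.diff(v), np.diff(u), np.diff(y)
ddy = np.diff(dy)[0]

lhs = (ddy / du[0]**2) * (du[0] / dv[0])**2      # (d^2y/du^2) * (du/dv)^2
rhs = ddy / dv[0]**2                             # d^2y/dv^2
modern = 4 * np.exp(2 * t0) / np.exp(t0)**2      # y''(t)/v'(t)^2 = 4 here
print(lhs, rhs, modern)                          # all approximately 4
```

Note that $\frac{d^2 y}{dv^2}$ here is not the modern second derivative of $y=v^2$ with respect to $v$ (which would be $2$), precisely because $v$ is not in arithmetic progression under this choice of $t$.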
Solution 2:
The gist of the OP's explanation of why the "cancellation" of $\partial f$'s should not be allowed (and does not work) is correct, but something more can be said.
The partial derivative $\partial f/\partial x$ is the rate at which $f$ changes with respect to change in $x$, but while holding $y$ constant. Similarly, the definition of $\partial f/\partial y$ entails a rate of change while holding $x$ constant.
Manipulation of the $dx$ and $dy$ symbols separately (rather than as an ordinary derivative $dy/dx$) produces sensible results:
$$ \frac{\partial f}{\partial y} \;dy + \frac{\partial f}{\partial x} \;dx = 0 $$
which accords with the underlying premise that $x,y$ are restricted to a level curve:
$$ f(x,y) = \text{constant} $$
This sensible computation, despite appearing as a superficial manipulation of symbols, is taught in freshman calculus as implicit differentiation, so it bears consideration why this should be allowed, while "cancelling" $\partial f$'s should not. There is the hidden premise that $x$ is being kept constant when taking the limit defining $\partial f/\partial y$, and similarly that $y$ is held constant when taking $\partial f/\partial x$. Combining a change in $x$ with one in $y$ is then properly done by implicit differentiation, subjecting their mutual changes to the constraint that $f$ is being kept "level".
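A small symbolic sketch of this (my own; the level curve $x^2+y^2=\text{constant}$ is an arbitrary example):

```python
import sympy as sp

# Implicit differentiation on the level curve f(x, y) = x**2 + y**2 = constant:
# differentiating f(x, y(x)) and solving for y'(x) gives -(∂f/∂x)/(∂f/∂y).
x = sp.symbols('x')
y = sp.Function('y')(x)

f = x**2 + y**2
dy_dx = sp.solve(sp.diff(f, x), sp.diff(y, x))[0]
print(dy_dx)    # -x/y(x), i.e. -(∂f/∂x)/(∂f/∂y) for this f
```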
Added: A good notation is useful at least as much for what it hides/suppresses from its definition as for what it suggestively expresses. If a notation is soundly defined, any contradiction that arises from proper use has to be blamed on the underlying theory, rather than the notation itself.
Of course, in hiding some parts of the definition, a notation lends itself to "abuse". As we see above, thinking of derivatives literally as "fractions" is suggested by the notation, and is sometimes "allowable", sometimes not.
A related pitfall has to do with commuting first partial derivatives. We all "know" that under mild smoothness assumptions:
$$ \partial (\partial f /\partial x)/\partial y = \partial (\partial f /\partial y)/\partial x $$
However this depends on the pair $x,y$ being independent variables (holding one fixed while varying the other). I once tried to commute first partials while mixing Cartesian and polar coordinates in teaching a class, and promptly got a contradiction!
Consider for example the polynomial $f = x^2 + y^2 = r^2$ in both Cartesian and polar coordinates. Now $\partial (\partial f/\partial \theta)/\partial x$ is identically zero, but $\partial (\partial f/\partial x)/\partial \theta$ is not!
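A quick symbolic check of my own, writing everything in polar coordinates $x=r\cos\theta$, $y=r\sin\theta$:

```python
import sympy as sp

# Mixed partials for f = x^2 + y^2 = r^2 when Cartesian and polar variables are mixed.
r, theta = sp.symbols('r theta', positive=True)
x, y = r * sp.cos(theta), r * sp.sin(theta)

f = x**2 + y**2
print(sp.simplify(sp.diff(f, theta)))   # ∂f/∂θ at fixed r is identically 0,
                                        # so ∂(∂f/∂θ)/∂x is identically 0 as well

df_dx = 2 * x                           # ∂f/∂x at fixed y, expressed in polar form
print(sp.diff(df_dx, theta))            # ∂(∂f/∂x)/∂θ = -2*r*sin(theta), not 0
```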
Fortunately I was able to learn from my mistake (not sure how much the students benefited, other than from the entertainment value), and later it helped me appreciate why shape function derivatives do not commute in general.
So even when notations suggestively lead us astray, there may be a good lesson to be found.
Solution 3:
As you suggest in your own question, there is in fact no contradiction in Leibniz's notation, contrary to persistent popular belief. Of course, one needs to distinguish carefully between partial derivatives and derivatives in the notation, as you did. On an even more basic level, the famous "inconsistency" of working your way from $y=x^2$ to $dy=2xdx$ is handled successfully by Leibniz, who is aware of the fact that he is working with a generalized notion of "equality up to" rather than equality "on the nose". These issues were studied in detail in this recent study.
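A tiny symbolic illustration of my own of this "equality up to": for $y=x^2$ the differential ratio is $2x+dx$, and discarding the remaining infinitesimal term (in modern terms, taking the standard part) yields $2x$.

```python
import sympy as sp

# The differential quotient for y = x**2 is 2*x + dx, equal to 2*x only "up to"
# the infinitesimal dx; dropping that term corresponds to taking the standard part.
x, dx = sp.symbols('x dx')
ratio = sp.cancel(((x + dx)**2 - x**2) / dx)
print(ratio, ratio.subs(dx, 0))   # dx + 2*x    2*x
```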
The formula $\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$ holds so long as we assign to the independent variable $du$ in the denominator of $\frac{dy}{du}$ the same value as that given by the dependent variable $du$ in the numerator of $\frac{du}{dx}$. On the other hand, if, as is usual, one uses constant differentials $du$ in computing $\frac{dy}{du}$, the formula will be incorrect. In each instance one has to be careful about the meaning one assigns to the variables, as elsewhere in mathematics. For details see Keisler.
The OP reformulated his question in subsequent comments as wishing to understand how Leibniz himself viewed his theory and why he believed it works. This seems like a tall task but it so happens that there is a satisfactory answer to it in the literature. Namely, while Leibniz was obviously unfamiliar with the ontological set-theoretic material we take for granted today, he had a rather clear vision of the procedural aspects of his calculus, and moreover clearly articulated them unbeknownst to many historians writing today. The particular paradox of the differential ratio $\frac{dy}{dx}$ being apparently not equal on the nose to what we expect, e.g., $2x$ (which in particular undermines the "tautological" proof of the chain rule in one variable) was explained by Leibniz in terms of his transcendental law of homogeneity. On Leibniz see article1 and article2.
The consistency of Leibniz's law is demonstrated in the context of modern set-theoretic assumptions in terms of the standard part principle.
Solution 4:
Leibniz notation for the second derivative suggests a version of the chain rule: $$ \frac{d^2y}{du^2}\left(\frac{du}{dv}\right)^2=\frac{d^2y}{dv^2}. $$ This does not hold in general: for example, with $y=u=v^2$ the left-hand side is $0\cdot(2v)^2=0$, while the right-hand side is $2$.
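A quick symbolic check of my own for that counterexample:

```python
import sympy as sp

# For y = u and u = v**2: d^2y/du^2 = 0, so the left-hand side vanishes,
# while d^2y/dv^2 = d^2(v^2)/dv^2 = 2 on the right-hand side.
v, u = sp.symbols('v u')
y_of_u, u_of_v = u, v**2

lhs = sp.diff(y_of_u, u, 2).subs(u, u_of_v) * sp.diff(u_of_v, v)**2
rhs = sp.diff(y_of_u.subs(u, u_of_v), v, 2)
print(lhs, rhs)   # 0 and 2
```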
Solution 5:
I think the question is ill-posed (or, what amounts to the same thing, makes some incorrect assumptions).
Notation does not lead to contradictions. Ever. In any discipline. That is because notation does not assert anything. Notation has no truth value. Notation consists of a set of symbols for recording things, and a set of rules for manipulating those symbols. The power of Leibniz's notations is precisely that when those rules are properly followed we end up with formulas that look like familiar fraction cancellation laws, etc., which makes them easier to remember and conceptually easier to understand. If Leibniz's notation is misused, then, yes, apparent contradictions can arise -- but that is not a flaw in the notation, but rather a flaw in those who misuse the notation.
Looking over all of the answers in this thread, including the example in the OP, you will find the same kind of dialectic over and over: "Some people write < formula >, which looks like a contradiction or inconsistency, but that is only because < formula > really means < other formula >." Yes, precisely. If notation is used incorrectly you get wrong answers; if you use it correctly, you get correct answers. If you get a result that you know is false, you can be sure that the notation has been misused.
Now what people generally mean, I think, when they critique Leibniz's notation as "leading to contradictions" is that certain misuses of the notation are very tempting, and people are prone to making them. This may be true, although I would counter that other notations (primes, dots, what have you) also have their "attractive nuisances". But that is a psychological problem, having to do with the human tendency to look for shortcuts and to perceive apparent patterns that are not really there; the fault lies not with our d's but with ourselves.