Failure of differential notation

Through the informal use of differentials, the product rule can be "proved" by writing $$d(fg) = (f + df)(g + dg) - fg = df\,g + f\,dg + df\,dg.$$ Neglecting the product of two differentials, we conclude that $$d(fg) = df\,g + f\,dg.$$ However, the accepted answer to this question mentions that manipulations like this are not always justified. In particular, its author points out that it is unclear why we should not neglect a single differential (itself an "infinitesimal" quantity), yet we should neglect the product of two of them (presumably because it is "infinitesimal-er").
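To make the claim about orders concrete, here is a minimal numeric sketch (the choices $f = \sin$ and $g = \exp$ are arbitrary) showing that the neglected term $df\,dg$ is exactly the gap between the true change $d(fg)$ and the estimate $df\,g + f\,dg$, and that it shrinks like $dx^2$:

import math

# Minimal numeric sketch (f = sin and g = exp are arbitrary choices):
# the neglected term df*dg is exactly the gap between the true change
# d(fg) and the estimate df*g + f*dg, and it shrinks like dx**2
# (halving dx roughly quarters it).
f, g = math.sin, math.exp
x = 1.0
dx = 0.1
for _ in range(8):
    df = f(x + dx) - f(x)
    dg = g(x + dx) - g(x)
    exact = f(x + dx)*g(x + dx) - f(x)*g(x)
    estimate = df*g(x) + f(x)*dg
    print(f"dx = {dx:.1e}   df*dg = {exact - estimate:.3e}")
    dx /= 2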

Can someone produce an example in which a line of reasoning similar to the above argument for the product rule leads to a false conclusion (preferably from single-variable calculus)? Another way to phrase the question is this: What failures of the informal use of differentials led to the development of non-standard analysis?


The answer to your question really depends on the formalism with which you develop a rigorous treatment of infinitesimal numbers. In Robinson's nonstandard analysis and related formalisms, the notion of the standard part fixes everything. For example, you have the following proof of the product rule in this setting:

\begin{eqnarray*} (fg)'(x) & \equiv & st((f(x+dx)g(x+dx)-f(x)g(x))/dx) \\ & = & st(((f(x)+f'(x)dx)(g(x)+g'(x)dx)-f(x)g(x))/dx) \\ & = & st((f'(x)g(x)dx+g'(x)f(x)dx+g'(x)f'(x)dx^2)/dx)\\ & = & st(f'(x)g(x)+g'(x)f(x)+g'(x)f'(x)dx) \\ & = & f'(x)g(x)+g'(x)f(x) \end{eqnarray*}

The intuition here is that the terms with only a single $dx$ are of the same order as the change in $x$, so they are relevant to the first-order behavior. The term with two factors of $dx$ is of higher order than the change in $x$, so it is not relevant to the first-order behavior (it is relevant to the second-order behavior).
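Here is a minimal numeric sketch of that order-counting (with $f = \sin$ and $g = \exp$ as arbitrary stand-ins): the difference quotient of $fg$ misses $f'(x)g(x)+g'(x)f(x)$ by a residual that is first order in $dx$, which is exactly the kind of term the standard part discards:

import math

# Minimal sketch (arbitrary choices): the difference quotient of f*g
# differs from the product-rule value by a residual that shrinks
# linearly with dx -- the term whose standard part is zero.
f, g = math.sin, math.exp
fp, gp = math.cos, math.exp          # derivatives of sin and exp
x = 1.0
product_rule = fp(x)*g(x) + gp(x)*f(x)
dx = 1e-3
for _ in range(4):
    quotient = (f(x + dx)*g(x + dx) - f(x)*g(x)) / dx
    print(f"dx = {dx:.0e}   residual = {quotient - product_rule:.3e}")
    dx /= 10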

On the other hand, in smooth infinitesimal analysis and related formalisms, there is a collection of numbers which are by definition nilsquare (along with a collection which is nilcube, etc.). In that setting there is no $st$ on the outside of everything; instead, the term $g'(x)f'(x)dx^2$ is exactly zero by definition, and everything else goes through the same.
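A convenient way to experiment with nilsquare behavior is dual numbers, where $\varepsilon^2 = 0$ is built into the arithmetic. This is a minimal sketch added for illustration (dual numbers are a standard algebraic device, not something specific to the formalism above):

import math

# Dual numbers: a + b*eps with eps**2 = 0 enforced by the arithmetic.
# A standard algebraic model of nilsquare infinitesimals (added here
# as an illustration, not taken from the answer above).
class Dual:
    def __init__(self, real, eps=0.0):
        self.real, self.eps = real, eps

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 = 0:
        # the b*d*eps**2 cross term vanishes identically.
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    def __repr__(self):
        return f"{self.real} + {self.eps}*eps"

x = 1.0
f = Dual(math.sin(x), math.cos(x))   # f(x) + f'(x)*eps
g = Dual(math.exp(x), math.exp(x))   # g(x) + g'(x)*eps
print(f * g)                         # eps part is f'(x)g(x) + f(x)g'(x)
print(math.cos(x)*math.exp(x) + math.sin(x)*math.exp(x))   # same number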


The differentials in the nonstandard version of single variable calculus are essentially the same as in the standard version: e.g. the definition of $df(x)$ is that it is equal to $f'(x) dx$.

The real benefit of nonstandard analysis is its treatment of infinitesimals, which behave (internally) exactly like all other numbers, but still give (external) meaning to the notion of an "infinitesimal displacement". That notion is not limited to being dual to the notion of a differential, and hence is not restricted to the study of the first-order behavior of differentiable functions.

(As an aside, there are ways in standard analysis to study the higher-order behavior of differentiable functions: I think that's what jet bundles are all about, although I don't really understand them.)


We should first compare the alleged proof given in the question with how the argument is meant to go. Start with Taylor's formula to first order: $$f(x + \varepsilon) = f(x) + \varepsilon f'(x),$$ which is true by definition for all secants of curves generated by smooth functions, and is also true for 'very small' values of $\varepsilon$. What can it tell us about the product rule? Well, the product of two smooth functions is a smooth function, so: $$(fg)(x + \varepsilon) = (fg)(x) + \varepsilon(fg)'(x)$$ $$(f + \varepsilon f')(g + \varepsilon g') = (fg)(x) + \varepsilon(fg)'(x)$$ $$fg + \varepsilon f g' + \varepsilon g f' + \varepsilon^2 f' g' = (fg)(x) + \varepsilon(fg)'(x)$$ $$\varepsilon f g' + \varepsilon g f' + \varepsilon^2 f' g' = \varepsilon(fg)'(x)$$

The question is what to do next. If we are interested in secants, we can divide throughout by $\varepsilon$ and leave it at that. If we are interested in the tangent and subscribe to non-standard analysis, we can divide by $\varepsilon$ and then neglect the $\varepsilon f' g'$ term. This is called 'taking the standard part', and it can be justified by supposing that $\varepsilon$ can in theory be made as small as we like, so the 'error' (in purely theoretical work) can be made less than any 'resolution'; to put it another way, $\varepsilon$ is arbitrarily small. If we are interested in the tangent and subscribe to smooth infinitesimal analysis, we can neglect the $\varepsilon^2 f' g'$ term first and then divide throughout by $\varepsilon$ to obtain the familiar product rule.

But what is the justification for (in effect) equating $\varepsilon^2$ to zero? Can it really be said to be 'infinitesimal-er' than $\varepsilon$ itself, or is it just a matter of convenience: we intend to neglect that term anyway, so (since it serves no purpose) why not do it sooner rather than later and save on the ink?

In one interesting respect $\varepsilon^2$ is 'infinitesimal-er' than $\varepsilon$ itself, even if the ultimate motivation is convenience. First note that we justify neglecting such terms because (again, in purely theoretical work) we can reduce their size below arbitrary 'resolutions'. What the assertion $\varepsilon^2 = 0$ means is that, regardless of what $\varepsilon^2$ is multiplied by, we can, by reducing the value of $\varepsilon$, get the $\varepsilon^2$ term below the value of the $\varepsilon$ term. This can be tested with the following program:

def inrdx(cyc, co2, co3, co4):
    """Halve a trial 'infinitesimal' until its first power exceeds the
    combined higher-power terms co2*e**2 + co3*e**3 + co4*e**4."""
    inval = 0.9                           # trial value of epsilon
    for i in range(cyc):
        # combined contribution of the higher powers of epsilon
        val2 = co2*inval**2 + co3*inval**3 + co4*inval**4
        print(inval, val2)
        if inval > val2:                  # epsilon now dominates; stop
            return
        inval /= 2                        # shrink epsilon and try again

inrdx(100, 15000, 10000, 5000)
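As a cross-check on the algebra above, the expansion can also be reproduced symbolically; this is a sketch using sympy, added for illustration rather than being part of the original argument:

import sympy as sp

# Symbolic cross-check (added illustration): expand (f + eps*f')(g + eps*g')
# and read off the coefficients of eps and eps**2.
x, eps = sp.symbols('x epsilon')
f, g = sp.Function('f')(x), sp.Function('g')(x)
expanded = sp.expand((f + eps*f.diff(x)) * (g + eps*g.diff(x)))
print(expanded.coeff(eps, 1))   # f(x)*g'(x) + g(x)*f'(x): the product rule
print(expanded.coeff(eps, 2))   # f'(x)*g'(x): the term a nilsquare eps kills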

The inrdx program seems to terminate regardless of how high you set the coefficients ('co2' etc.). Is there any logical justification for it? First set $\varepsilon = 1/k$ and note that $\varepsilon < 1$, so $k > 1$ ($\varepsilon$ wouldn't be an infinitesimal otherwise, in the mathematical sense of the word). Then: $$\varepsilon^2 = \frac{1}{k}\cdot\frac{1}{k}.$$ In calculus, full differentiation yields the full derivative, which when simplified (for a polynomial) can be represented thus: $$\frac{1}{k} + \frac{1}{k}\cdot\frac{h}{k},$$ where $h$ is a finite number for a given $x$ (the numerator of the first term is kept at one for the sake of simplicity).

Now suppose $h < k$; then the second term is less than the first and so can perhaps be neglected. However, if $h > k$ it cannot be, if we wish to retain any semblance of integrity. But if we allow ourselves the freedom to reduce $\varepsilon$ (that is, to increase $k$), we can obviously get the second term below the first, and that difference can be increased in the same way to get the $\varepsilon^2$ term below any 'resolution'. All of the more complex expressions of this type derived from the correct starting equation can be dealt with in the same way. Infinitesimals are nilpotent because, if they arise from Taylor's formula, their higher powers really are 'infinitesimal-er' than their first powers (see here for more detail). This explains the problem in the OP's set-up. It's important to disallow proofs like that (and here) because they were the original cause of the controversy about the foundations of calculus.

To conclude, it's worth noting that the product rule can be proven in a more geometrical fashion by 'circumventing the indeterminate form', as here: $$\frac{f(a)g(a) - f(b)g(b)}{a - b} = g(b)\cdot\frac{f(a) - f(b)}{a - b} + f(a)\cdot\frac{g(a) - g(b)}{a - b},$$ but some aspects of calculus do require infinitesimals, or 'indefinitely small variables' if you prefer, so it's good to have a consistent method for dealing with them.
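The secant identity used in that geometric proof can be confirmed symbolically as well (again a sympy sketch added for illustration):

import sympy as sp

# Symbolic confirmation (added illustration) that the secant identity
# in the geometric proof holds exactly, with no infinitesimals at all.
a, b = sp.symbols('a b')
f, g = sp.Function('f'), sp.Function('g')
lhs = (f(a)*g(a) - f(b)*g(b)) / (a - b)
rhs = g(b)*(f(a) - f(b))/(a - b) + f(a)*(g(a) - g(b))/(a - b)
print(sp.simplify(lhs - rhs))   # prints 0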

Edit: the technique of increasing the denominator has also been used on its own, without reference to nilpotent infinitesimals; see here, on page 5, for Euler's proof of the product formula for $\sin(x)$.