Justification of algebraic manipulation of infinitesimals

I will just throw a few buzzwords at you :-)

The mathematically precise concept of an "infinitesimal" is called a "differential form". If we fix the Euclidean plane $\mathbb{R}^2$ and think of it as a differentiable manifold, then every point in the plane has a tangent space that is isomorphic to (or another copy of) $\mathbb{R}^2$. If we further fix a Cartesian coordinate system with coordinates $x$ and $y$, then a differential form is a gadget that, at every point $p$ with coordinates $(x_p, y_p)$, assigns a real number to each tangent vector at that point.

If you imagine attaching the vector $(0, 1)$, pointing upwards with length one, at every point in the plane, then in our example $dx$ would spit out $0$ at every point and $dy$ would spit out $1$ at every point.
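More generally, $dx$ and $dy$ simply read off the components of whatever tangent vector you feed them: for a tangent vector $v = (v_1, v_2)$ at any point $p$, $$ dx(v) = v_1, \qquad dy(v) = v_2. $$ The upward unit vector above is just the special case $v = (0, 1)$.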

This is the starting point for modern abstract "coordinate free" differential geometry.

From my experience, these concepts are usually not easy to understand for beginners, so don't worry if you don't understand everything on a first reading.

First note that you don't need the concept of a "differential form" to understand your first example:

Take a rectangle with side lengths $a$ and $b$; then we have a function that gives the area, $$ f: \mathbb{R}^2 \to \mathbb{R} $$ $$ f: (a, b) \mapsto ab $$ If you increase both of the coordinates by $h$, then this is just a directional derivative, and since $f$ is differentiable we know that $$ f(a + h, b + h) - f(a, b) = df_{(a,b)}(h, h) + o(h) $$ holds. Here "$dx$" is just shorthand notation for both "$h$" and "we know that $f$ is differentiable, and therefore the remainder on the right hand side is $o(h)$; that is, for smaller and smaller $h$ the linear approximation gets better and better". The linear approximation is by definition given by applying the differential $df$ of $f$ to the vector $(h, h)$.
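For this particular $f$ you can write the expansion out by hand and see exactly what the differential keeps and what it throws away: $$ f(a+h, b+h) - f(a, b) = (a+h)(b+h) - ab = \underbrace{(a+b)h}_{df_{(a,b)}(h,h)} + \underbrace{h^2}_{\text{remainder}}. $$ So the familiar rule $d(ab) = a\,db + b\,da$ is just the linear part, and the little corner square of the rectangle is exactly the $h^2$ term one discards.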

The second example is a little bit more complicated; here we'll really need the concept of "differential forms". To be mathematically precise, we'd have to write $$ ds^2 := dx \otimes dx + dy \otimes dy $$ That is, the left hand side is defined by the right hand side, and the right hand side consists, for a fixed point $p$, of an element of the tensor product of the cotangent space $T^*_pM$ with itself.

This means this gadget eats two tangent vectors at any point and spits out a real number, and this operation is bilinear (linear in both input variables). So, if you fix a point on the plane, you get an element of the space $$ T^*_pM \otimes T^*_pM $$ which has an algebraic structure that can be used.
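For example, feeding the same tangent vector $v = (v_1, v_2)$ into both slots gives $$ ds^2(v, v) = dx(v)\,dx(v) + dy(v)\,dy(v) = v_1^2 + v_2^2, $$ which is exactly the squared Euclidean length of $v$. This is why $ds^2$ deserves the name "(squared) line element".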

If you would like to learn more about this, I'd recommend any textbook on differential geometry or differentiable manifolds.


You will hear a lot about differentials, exterior algebra and maybe even about nonstandard analysis. But the sad fact is: In over 300 years of calculus we have not come up with an easy answer to your question. All one can say is: If in a particular case "all of this manipulation" leads to a correct result, then there is also an "analytically rigorous justification" for it.

If done with professional expertise, dealing with "differentials" of all sorts in a light-handed way is definitely a successful heuristic technique, especially in an environment where one doesn't care so much about $\epsilon$'s and $\delta$'s. But some care is needed: When you argue about the area under a curve $\gamma\!: y=f(x)$ you don't have to bother about the increase of $f$ in an interval of length $dx$, but if you want to derive a formula for the length of $\gamma$, then this little increase of $y$ plays a decisive rôle.
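To see the difference, recall the usual heuristic for arc length, which treats a little piece of the curve as the hypotenuse of a right triangle with legs $dx$ and $dy$: $$ ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + f'(x)^2}\,dx, \qquad L(\gamma) = \int_a^b \sqrt{1 + f'(x)^2}\,dx. $$ Dropping the $dy$ here, as one harmlessly does when computing areas, would give the plainly wrong answer $\int_a^b dx = b - a$.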


There are various ways of answering your question. Tim van Beek has given one in terms of differential forms. Another nice way of looking at it is using nonstandard analysis (NSA). The nice thing about NSA, to my taste, is that it allows you to say things the way that Leibniz and Gauss and Euler found it so useful to say them (i.e., with $dx$'s and $dy$'s), and it also allows you to be certain that what you're doing is logically rigorous. NSA doesn't have to be hard and scary. Jerome Keisler wrote a very nice freshman calculus book using NSA, now available online for free: http://www.math.wisc.edu/~keisler/calc.html My own book, using a similar approach, is here: http://www.lightandmatter.com/calc/ This is also a very nice treatment: http://www.math.uiowa.edu/~stroyan/InfsmlCalculus/InfsmlCalc.htm

The basic idea of NSA is that just as we expanded the integers to the rationals, and the rationals to the reals, we go one more step and expand the reals to the hyperreals. The hyperreals are a number system that includes the reals, but that also includes infinitesimals. The way you know whether you're doing something logically correct is that if you write down all of the elementary axioms of the real number system ($x+y=y+x$, etc.), then all of those are true in the hyperreals. "Elementary" means axioms that only say things like "for every number...," not ones that say "for every set of numbers..."
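To see how this works in practice, here is the classic computation of $(x^2)'$ using a nonzero infinitesimal $\varepsilon$ and the standard part function $\operatorname{st}$, which rounds a finite hyperreal to the nearest real: $$ \frac{(x+\varepsilon)^2 - x^2}{\varepsilon} = \frac{2x\varepsilon + \varepsilon^2}{\varepsilon} = 2x + \varepsilon, \qquad \operatorname{st}(2x + \varepsilon) = 2x. $$ No limits are taken; the infinitesimal is simply discarded at the very end, which is essentially what Leibniz did informally.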


This stems from a simple theorem about the change of variables, also called substitution, used in integration: $\int f(g(t))\,g'(t)\,dt = \int f(y)\,dy$. If we put $g(t) = y$ in $\int f(g(t))\,g'(t)\,dt$, we get $\int f(y)\,\frac{dy}{dt}\,dt = \int f(y)\,dy$, which we know is true from the theorem. So, when we integrate, it's allowed for us to think that, when $y = g(t)$, then $dy = g'(t)\,dt$.
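A concrete instance: to evaluate $\int 2t\cos(t^2)\,dt$, set $y = t^2$, so that $dy = 2t\,dt$, and $$ \int \cos(t^2)\,2t\,dt = \int \cos y\,dy = \sin y + C = \sin(t^2) + C. $$ The notational manipulation $dy = 2t\,dt$ is just the substitution theorem read backwards.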

In your example, you have $y(x) = 4x^2$ and want to calculate $y'(x)$. We have $y'(x) = \frac{dy}{dx} = 8x$ (you forgot to multiply by $x$ in your example). Now, we use an abuse of notation to write $dy = 8x\,dx$.
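This abuse of notation is harmless because it records a true first-order statement: for a small increment $dx$, $$ y(x + dx) - y(x) = 4(x + dx)^2 - 4x^2 = 8x\,dx + 4\,dx^2, $$ and the $4\,dx^2$ term is negligible compared to $8x\,dx$ as $dx \to 0$, so the linear part of the change is exactly $dy = 8x\,dx$.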

Similar arguments can be made for the integration of multivariate functions. Such considerations give rise to differential forms.