There are several perfectly rigorous ways to formalize this kind of reasoning, none of which require any nonstandard analysis (which you should be quite suspicious of as it relies on a weak choice principle to even get off the ground).

One of them is, as Robert Israel says, interpreting statements about infinitesimals as statements about limiting behavior as some parameter tends to zero. For example, you can define what it means for a function $f(x)$ to be differentiable at a point $x$: it means there is some real number $f'(x)$ such that (in little-o notation)

$$f(x + \epsilon) = f(x) + f'(x) \epsilon + o(|\epsilon|)$$

as $\epsilon \to 0$. After you prove some basic lemmas about how little-o notation works, you get some very clean and intuitive proofs of basic facts in calculus this way. For example, here's the product rule:

$$\begin{eqnarray*} f(x + \epsilon) g(x + \epsilon) &=& \left( f(x) + f'(x) \epsilon + o(|\epsilon|) \right) \left( g(x) + g'(x) \epsilon + o(|\epsilon|) \right) \\ &=& f(x) g(x) + (f'(x) g(x) + f(x) g'(x)) \epsilon + o(|\epsilon|). \end{eqnarray*}$$
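(If you want to check an expansion like this mechanically, here is a small sketch using sympy, with $f = \sin$ and $g = \exp$ as arbitrary concrete stand-ins; the $O(\epsilon^2)$ tail plays the role of the $o(|\epsilon|)$ term.)

```python
import sympy as sp

x, eps = sp.symbols('x epsilon')
f, g = sp.sin(x), sp.exp(x)   # arbitrary concrete stand-ins for f and g

# Expand f(x + eps) * g(x + eps) to first order in eps; the O(eps**2)
# tail is the o(|eps|) error term from the text.
lhs = (f.subs(x, x + eps) * g.subs(x, x + eps)).series(eps, 0, 2)

# Product-rule prediction: f(x)g(x) + (f'(x)g(x) + f(x)g'(x)) * eps.
rhs = f*g + (sp.diff(f, x)*g + f*sp.diff(g, x))*eps

print(sp.simplify(lhs.removeO() - rhs))   # 0: the expansions agree
```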

After writing down a bunch of arguments like this, if you're familiar with elementary ring theory it becomes very tempting to think of expressions that are $o(|\epsilon|)$ (meaning they go to zero faster than $|\epsilon|$ as $\epsilon \to 0$) as an ideal that you can quotient out by, and this intuition can also be formalized.

More precisely, in the ring $R = C^{\infty}(\mathbb{R})$ of smooth functions on $\mathbb{R}$, for any $r \in \mathbb{R}$ there's an ideal $(x - r)$ generated by the function $x - r$, consisting of all functions vanishing at $r$. Working in the quotient ring $R/(x - r)$ amounts to only working with the value at $r$ of a function. Working in the quotient ring $R/(x - r)^2$, though, amounts to working with both the value at $r$ and the first derivative at $r$, with multiplication given by the product rule. Similarly, working in $R/(x - r)^{n+1}$ amounts to working with the value at $r$ and the first $n$ derivatives at $r$.
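(In fact $R/(x - r)^2$ is just the ring of "dual numbers" $a + bt$ with $t^2 = 0$, and you can implement it directly; here's a minimal sketch, with a `Dual` class of my own invention, in which multiplication reproduces the product rule. This is exactly forward-mode automatic differentiation.)

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """An element a + b*t of R/(x - r)^2, where t^2 = 0:
    a is the value at r, b is the first derivative at r."""
    val: float
    der: float

    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        # (a + b t)(c + d t) = ac + (ad + bc) t, since t^2 = 0:
        # exactly the product rule.
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

x = Dual(5.0, 1.0)     # the function x itself at r = 5 (derivative 1)
print(x * x)           # Dual(val=25.0, der=10.0):   (x^2)' = 2x
print(x * x * x)       # Dual(val=125.0, der=75.0):  (x^3)' = 3x^2
```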

Taking ideas like this seriously leads to things like formal power series, germs of functions, stalks of sheaves, jet bundles, etc. etc. It is all perfectly rigorous mathematics, and nonstandard analysis is a huge distraction from the real issues.


One way of thinking about this is using a parameter $\epsilon$ as $\epsilon \to 0$. If $dx = O(\epsilon)$ and $dy = O(\epsilon)$ while $x$ and $y$ do not depend on $\epsilon$, then $dx\; dy = O(\epsilon^2)$, so it's correct to say $$ (x + dx)(y + dy) = xy + x\; dy + y\; dx + O(\epsilon^2)$$
And this can be manipulated further, perfectly rigorously, using the standard rules of big-O notation.
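(As a sketch of what that bookkeeping looks like mechanically: sympy will do it for you if you write $dx = a\epsilon$ and $dy = b\epsilon$ for constants $a, b$; those names are just for illustration.)

```python
import sympy as sp

eps, x, y, a, b = sp.symbols('epsilon x y a b')
dx, dy = a*eps, b*eps          # dx = O(eps), dy = O(eps)

# Expanding and truncating at order eps^2 recovers the formula above.
expr = sp.expand((x + dx) * (y + dy))
print(expr.series(eps, 0, 2))  # x*y + epsilon*(a*y + b*x) + O(epsilon**2)
```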


In nonstandard analysis one can define derivatives without using limits: if $dx$ is an infinitesimal, that is, a number greater than zero but less than every positive real number, then $f'(x)$ can almost be computed as $[f(x+dx)-f(x)]/dx$. To get the same result as in standard analysis, one then takes the "standard part" of this, the closest real number, which amounts to throwing away the higher-order infinitesimals, just as your physics professor did.

Here are two explicit examples. Let's compute the derivative of $f(x)=x^2$. Let $dx$ be infinitesimal. Then $f(x+dx)-f(x)=x^2+2x\,dx+(dx)^2-x^2=2x\,dx+(dx)^2$. Dividing by $dx$ we get $2x+dx$. For $x$ a real number it's hopefully intuitive that the standard part of $2x+dx$ is $2x$, and so we get our familiar identity $f'(x)=2x$.
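(Here's that computation done symbolically, treating $dx$ as a formal symbol; setting $dx = 0$ at the end plays the role of taking the standard part.)

```python
import sympy as sp

x, dx = sp.symbols('x dx')
f = lambda t: t**2

# (f(x + dx) - f(x)) / dx, with the dx cancelled by polynomial division.
quotient = sp.cancel((f(x + dx) - f(x)) / dx)
print(quotient)              # 2*x + dx (term order may vary)
print(quotient.subs(dx, 0))  # 2*x  -- the "standard part"
```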

Now let's look at the product rule, which is the sort of situation in which your professor's argument might come up. We have $$(fg)'(x)\,dx\approx (fg)(x+dx)-(fg)(x)$$ $$=[f(x)+f'(x)\,dx+c_1(dx)^2][g(x)+g'(x)\,dx+c_2(dx)^2]-(fg)(x)=(f'g+g'f)(x)\,dx+c_3(dx)^2.$$ Here we're using Taylor's theorem to expand $f$ and $g$; in the familiar context we say the $c_i$ don't go to infinity as $dx\to 0$, which in the nonstandard context is just to say the $c_i$ are not infinite for infinitesimal $dx$.

So here the $(dx)^2$ term will disappear, as your professor suggested, when we take the standard part of the derivative. But this only makes sense after we've subtracted $(fg)(x)$! Then we're justified in cutting off at the standard, or real, part of our expression; saying $(x+dx)(y+dy)=xy+y\,dx+x\,dy$, in comparison, is rather arbitrary.
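(You can replay this computation symbolically too; below, `f0`, `f1`, `c1` and so on are just stand-in symbols for $f(x)$, $f'(x)$, $c_1$, etc.)

```python
import sympy as sp

dx = sp.symbols('dx')
f0, f1, c1 = sp.symbols('f0 f1 c1')   # f(x), f'(x), bounded coefficient
g0, g1, c2 = sp.symbols('g0 g1 c2')   # g(x), g'(x), bounded coefficient

# [f(x) + f'(x)dx + c1 dx^2][g(x) + g'(x)dx + c2 dx^2] - f(x)g(x),
# then divide by dx -- the subtraction happens *before* dividing.
numerator = (f0 + f1*dx + c1*dx**2)*(g0 + g1*dx + c2*dx**2) - f0*g0
quotient = sp.expand(numerator / dx)

print(quotient.subs(dx, 0))   # f0*g1 + f1*g0: the standard part is f'g + fg'
```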

Anyway, this discussion requires justifying the existence of infinitesimals, and our ability to compute with them as we do with reals, even applying Taylor's theorem to them. The full justification of this theory involves understanding a couple of logical topics: first-order predicate logic and ultraproducts. These aren't overwhelmingly technical, but have little to do with how the theory is used. For that, it's enough to know the

Transfer Principle: Everything that can be stated without saying "for every subset of $\mathbb{R}$..." (or something equivalent) is true of the extended reals with infinitesimals exactly when it is true of the standard reals.

(With apologies for the lack of precision in this statement; I hope it gets the point across.) Being careful with the transfer principle is probably where nonstandard analysis wins out over informal physical reasoning: it lets us decide exactly when this sort of argument is reasonable. For specific examples, the nonstandard reals and differentiable functions on them do satisfy the intermediate value theorem and Taylor's theorem, but they do not satisfy the least upper bound property.


If $x$ and $y$ are both functions of $t$, the difference quotient for the product $xy$ is $$ \frac{\Big(xy + x\,\Delta y + y\,\Delta x + \Delta y\,\Delta x\Big) - xy }{\Delta t} = \underbrace{x\frac{\Delta y}{\Delta t} + y \frac{\Delta x}{\Delta t}}_A + \underbrace{\frac{\Delta y\,\Delta x}{\Delta t}}_B $$ $$ \overbrace{\frac{\Delta y\,\Delta x}{\Delta t} = \frac{\Delta y}{\Delta t}\Delta x = \frac{\Delta x}{\Delta t}\Delta y}^B $$ The expression labeled $B$ approaches $0$ since $\dfrac{\Delta y}{\Delta t}$ approaches a finite number and it is then multiplied by $\Delta x$, which approaches $0$. And similarly for the last term above.
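(A quick sympy check of this, with $x = \sin t$ and $y = e^t$ as arbitrary concrete choices: the term labeled $B$ really does vanish in the limit, and the term labeled $A$ gives the product rule.)

```python
import sympy as sp

t, dt = sp.symbols('t Delta_t')
x, y = sp.sin(t), sp.exp(t)    # arbitrary differentiable choices

dx = x.subs(t, t + dt) - x
dy = y.subs(t, t + dt) - y

# The term labeled B: (Delta y * Delta x) / Delta t -> 0 as Delta t -> 0.
print(sp.limit(dy * dx / dt, dt, 0))        # 0

# The term labeled A survives and gives the product rule.
A = x * sp.limit(dy / dt, dt, 0) + y * sp.limit(dx / dt, dt, 0)
print(sp.simplify(A - sp.diff(x * y, t)))   # 0
```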