What does $dx$ mean without $dy$?

I understand that $dy/dx$ represents how $y$ changes as $x$ changes. But what does $dx$ mean in isolation? I have been told it means an infinitely small change in $x$ without $dx$ being zero. I would like a more rigorous definition.


In the purview of so-called "standard analysis", $dx$ is just notation: $dy/dx$ is notation for the derivative of a function $y = y(x)$, and $\int f(x) \, dx$ is notation for an antiderivative of $f$. It can often be intuitively useful to think of $dx$ as an "infinitesimal change in $x$", but this is only an informal intuition. (It most often leads to the correct answer, but it can also lead you into trouble: see e.g. this and this and the links therein.)

There is a field called nonstandard analysis which seeks to make the concept of an infinitesimal length mathematically precise and then uses it to define calculus rigorously. Doing nonstandard analysis rigorously is quite nontrivial, however. (It was only developed in the 1960s.)

There is also a formal definition of differentials like $dx$ in the theory of differential forms, which is in some sense the "correct" way of doing multivariable calculus. It's hard to give an elementary characterization of $df$ in this approach without some background, but the following summary should be a good introduction to the subject.


You can think of an equation of the form

$$ df(x) = f'(x) dx $$

as saying that no matter what you put in the box:

$$ \frac{df(x)}{d \square} = f'(x) \frac{dx}{d \square} $$

you get a true equation.
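A quick numerical sanity check of the "box" reading, as a minimal sketch: we fill the box with a parameter $t$, assuming purely for illustration that $x = g(t) = t^3$ and $f = \sin$ (both choices, and the helper names, are hypothetical), and compare central-difference estimates of the two sides.

```python
import math

def central_diff(func, t, h=1e-6):
    """Approximate func'(t) with a symmetric difference quotient."""
    return (func(t + h) - func(t - h)) / (2 * h)

# Hypothetical choices for illustration: f = sin, and the box filled by t via x = g(t) = t**3.
f = math.sin
f_prime = math.cos
g = lambda t: t ** 3

def box_equation_sides(t):
    """Return estimates of (d f(x)/dt, f'(x) * dx/dt) at x = g(t)."""
    lhs = central_diff(lambda s: f(g(s)), t)
    rhs = f_prime(g(t)) * central_diff(g, t)
    return lhs, rhs

for t in (0.3, 1.0, 1.7):
    lhs, rhs = box_equation_sides(t)
    print(f"t={t}: d f(x)/dt ~ {lhs:.6f}, f'(x) dx/dt ~ {rhs:.6f}")
```

Any other differentiable choice of $g$ works the same way; that is exactly the "no matter what you put in the box" claim, here just the chain rule.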

The technical term for this is that $df$ is a "smooth section of the cotangent bundle." Let's break that down:

First, the tangent bundle of $\mathbf{R}$ assigns to every point of $\mathbf{R}$ a set of directions (tangent vectors) emanating from that point. Since $\mathbf{R}$ is one-dimensional, each tangent vector is described by a single real number: its length together with a sign of $\pm 1$. At each point $p \in \mathbf{R}$, the tangent vectors take the form

$$ (p, v) $$

where $v$ is a real number (treated as a vector). For example, $(2, -1/2)$ is the tangent vector which starts at the point $2$ and points in the negative direction with length $1/2$.

Being a section of the cotangent bundle means that $df$ is an operation we apply to tangent vectors. Specifically, on a tangent vector $(p, v)$ it acts by

$$ (df)(p,v) = v\left.\frac{df}{dx}\right|_p. $$

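This is the derivative of $f$ with respect to $x$ at the point $p$, multiplied by $v$. In other words, applying $df$ to $(p, v)$ is the same as applying the operator $v\,\frac{d}{dx}$ to $f$ at $p$, which is why a common notation for the tangent vector $(p,v)$ is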

$$ v\left.\frac{d}{dx} \right|_p. $$
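One way to see $(df)(p,v) = v\,f'(p)$ concretely is a minimal sketch using dual numbers (forward-mode automatic differentiation). This technique is not part of the answer above; it is swapped in because evaluating $f$ at $p + v\varepsilon$ with the formal rule $\varepsilon^2 = 0$ returns exactly the pairing of $df$ with the tangent vector $(p, v)$ as the $\varepsilon$-coefficient. The $\varepsilon$ here is a purely algebraic device, not a hyperreal infinitesimal. The names `Dual`, `d`, `square` and `cube_plus_x` are hypothetical, and the sketch only supports polynomials built from `+` and `*`.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """A number p + v*eps with eps**2 = 0; the pair (p, v) plays the role of a tangent vector."""
    p: float  # base point
    v: float  # tangent component

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.p + other.p, self.v + other.v)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, because eps**2 = 0
        return Dual(self.p * other.p, self.p * other.v + self.v * other.p)

    __rmul__ = __mul__

def d(f):
    """Hypothetical helper: return df as a function of a tangent vector (p, v)."""
    def df(p, v):
        return f(Dual(p, v)).v  # the eps-coefficient of f(p + v eps) is v * f'(p)
    return df

square = lambda x: x * x                 # f(x) = x^2, f'(p) = 2p
cube_plus_x = lambda x: x * x * x + 2 * x  # f(x) = x^3 + 2x, f'(p) = 3p^2 + 2

print(d(square)(2.0, -0.5))      # -0.5 * (2 * 2) = -2.0
print(d(cube_plus_x)(1.0, 3.0))  # 3.0 * (3 + 2) = 15.0
```

The design choice is that a tangent vector is literally the pair `(p, v)`, and `d(f)` is a function of that pair, mirroring the definition of $(df)(p,v)$.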

This starts to make more sense when you have more than one variable. For example, suppose you have two variables $x$ and $y$. Then each tangent vector at $p$ looks like

$$ u \left.\frac{\partial}{\partial x} \right|_p + v \left.\frac{\partial}{\partial y} \right|_p, $$

which is the vector pointing with length $u$ in the $x$ direction and length $v$ in the $y$ direction, and applying $df$ to it gives $u\,\frac{\partial f}{\partial x}(p) + v\,\frac{\partial f}{\partial y}(p)$. This is how we make the "filling in the box" analogy rigorous: the equation $df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy$ says that we can fill in the box with $x$ or $y$ and obtain a true equation:

$$ \frac{\partial f(x,y)}{\partial \square} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial \square} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial \square} $$
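The two-variable pairing can be sketched the same way, assuming dual numbers (forward-mode automatic differentiation, a technique swapped in for illustration, not taken from the answer). Feeding $x = p_x + u\varepsilon$ and $y = p_y + v\varepsilon$ into $f$ yields $u\,\partial f/\partial x + v\,\partial f/\partial y$ at $p$; choosing $(u, v) = (1, 0)$ or $(0, 1)$ is "filling the box" with $x$ or $y$. The names `Dual`, `df_2d` and the example `f` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """p + v*eps with eps**2 = 0 (p: coordinate value, v: tangent component)."""
    p: float
    v: float

    def _lift(self, other):
        # Treat plain numbers as constants, i.e. with zero tangent component.
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.p + other.p, self.v + other.v)

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.p * other.p, self.p * other.v + self.v * other.p)

    __radd__ = __add__
    __rmul__ = __mul__

def df_2d(f, point, tangent):
    """Hypothetical helper: (df)(p, u d/dx + v d/dy) via one dual-number evaluation."""
    (px, py), (u, v) = point, tangent
    return f(Dual(px, u), Dual(py, v)).v

f = lambda x, y: x * x * y + 3 * y  # hypothetical example: f_x = 2xy, f_y = x^2 + 3

p = (2.0, 1.0)
print(df_2d(f, p, (1.0, 0.0)))   # box filled with x: f_x(p) = 4.0
print(df_2d(f, p, (0.0, 1.0)))   # box filled with y: f_y(p) = 7.0
print(df_2d(f, p, (2.0, -1.0)))  # 2 f_x(p) - f_y(p) = 1.0
```

The last line illustrates that $df$ at $p$ is linear in the tangent vector, so it is determined by its values on $\partial/\partial x$ and $\partial/\partial y$.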

The words "smooth section" mean that we are considering what happens as $p$ varies. For example, with $f(x) = x^2$ we have

$$ \big(d(x^2)\big)(p,v) = 2pv, $$

and this formula makes sense not just for one value of $p$ but for all $p$. Moreover, $2pv$ is a smooth function of $p$. This is what the word "smooth" refers to.
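The point that a single formula covers every base point can be checked numerically. This is a minimal sketch, assuming a central-difference estimate of the derivative (for $x^2$ it is exact up to floating-point rounding); the helper name `d_square` is hypothetical.

```python
def d_square(p, v, h=1e-5):
    """Hypothetical helper: finite-difference estimate of (d(x^2))(p, v) = v * (x^2)'(p)."""
    return v * ((p + h) ** 2 - (p - h) ** 2) / (2 * h)

v = 0.5
for p in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # The estimate should match the closed form 2pv at every base point p.
    print(p, d_square(p, v), 2 * p * v)
```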

It is important to point out that $(df)(p,v)$ depends only on what $f$ is doing "near" $p$. That is to say, the derivative of $f$ at $p$ does not change if we change $f$ far away from $p$ but leave it unchanged on some interval around $p$. This is what gives $df$ its "infinitesimal" nature: the interval can be as small as we like. We can determine the value of $(df)(p,v)$ knowing only the values of $f$ on the interval $p - 1/10 < x < p + 1/10$, or on the interval $p - 1/1000 < x < p + 1/1000$, or on even smaller intervals.
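Locality can be illustrated with a minimal sketch, assuming a symmetric difference quotient with step smaller than the interval; the functions `f` and `g` are hypothetical examples that agree on $p - 1/1000 < x < p + 1/1000$ but differ wildly elsewhere. Keep in mind the derivative itself is a limit of such quotients, so this only illustrates the idea rather than proving it.

```python
def central_diff(func, p, h=1e-4):
    """Symmetric difference quotient; it only samples func at p - h and p + h."""
    return (func(p + h) - func(p - h)) / (2 * h)

p = 1.0
v = 2.0

def f(x):
    return x ** 2

def g(x):
    # Hypothetical modification: equal to f on (p - 1/1000, p + 1/1000), very different elsewhere.
    return x ** 2 if abs(x - p) < 1e-3 else x ** 2 + 100.0 * x ** 3

# Both estimates of (df)(p, v) use only values inside the small interval, so they coincide.
print(v * central_diff(f, p), v * central_diff(g, p))
```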