Of course, defining $$ \mathrm{d}x= \lim_{\Delta x \to 0}\Delta x $$ is the same as defining $$ dx=0, $$ which makes no sense. The correct approach is to define the differential as a kind of linear function: the differential $df(x)$ (sometimes denoted by $df_x$) is the linear function defined by $$ df(x):\mathbb R\to\mathbb R\qquad t\mapsto f'(x)\cdot t $$ In particular $$ dx:\mathbb R\to\mathbb R\qquad t\mapsto t $$ Therefore, one can also write $df(x)=f'(x)dx$ (the composition with the identity map).

This perhaps sounds trivial for scalar functions $f$. The concept is more interesting for vector functions of vector variables: in that case $df(x)$ is a matrix (the Jacobian). The differential $df(x_0)$ has to be interpreted as the best linear function which approximates the incremental function $h(x):=f(x)-f(x_0)$ near $x=x_0$. In this sense, the concept is connected to the idea you have expressed through the approximate 'equation' $\Delta f(x)\approx f'(x)\Delta x$.
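To make the vector case concrete, here is a minimal numerical sketch in Python; the map $f$, the point $x_0$, and the increment direction are arbitrary illustrative choices, not anything prescribed above:

```python
import numpy as np

def f(x):
    # an arbitrary map R^2 -> R^2, chosen only for illustration
    return np.array([x[0] * x[1], np.sin(x[0]) + x[1] ** 2])

def df(x):
    # the differential df(x) as a matrix (the Jacobian of f above)
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 2.0 * x[1]]])

x0 = np.array([1.0, 2.0])
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    t = eps * np.array([1.0, -1.0])    # a small increment t
    h = f(x0 + t) - f(x0)              # the incremental function at x0
    lin = df(x0) @ t                   # the linear map df(x0) applied to t
    # "best linear approximation" means the error is o(|t|),
    # so this ratio tends to 0 as t -> 0
    print(eps, np.linalg.norm(h - lin) / np.linalg.norm(t))
```

The printed ratio $\|h - df(x_0)t\|/\|t\|$ shrinks with $\|t\|$, which is exactly what "best linear approximation" means.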


There are two ways of defining the differential of $y=f(x)$:

(1) as differential forms. Here $dx$ is a linear function on the tangent space (in this case, the tangent line) at a point, and the formula $dy=f'(x)dx$ is a relation between 1-forms.

(2) as an infinitesimal number. Such a number is an element of the hyperreal number system, as detailed in the excellent textbook by H. J. Keisler entitled *Elementary Calculus* that we are currently using to teach calculus to 150 freshmen.

Here the independent variable $\Delta x$ is an infinitesimal; one defines $f'(x)=\textbf{st}(\frac{\Delta y}{\Delta x})$, where "$\textbf{st}$" is the standard part function (or shadow) and $\Delta y$ is the dependent variable (also infinitesimal when the derivative exists). One defines a new dependent variable $dy$ by setting $dy=f'(x)dx$, where $dx=\Delta x$. Note that it is only for the independent variable $x$ that we set $dx=\Delta x$ (therefore there is no circularity).
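For instance, with $f(x)=x^2$ and infinitesimal $\Delta x\neq 0$ one computes
$$\frac{\Delta y}{\Delta x}=\frac{(x+\Delta x)^2-x^2}{\Delta x}=2x+\Delta x,\qquad f'(x)=\textbf{st}(2x+\Delta x)=2x,$$
so the standard part simply discards the leftover infinitesimal term.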

The advantage of this is that one can calculate the derivative $\frac{dy}{dx}$ exactly from the ratio of infinitesimals $\frac{\Delta y}{\Delta x}$ (by taking its standard part), rather than from a mere approximation; the proof of the chain rule becomes more intuitive; etc.
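As a sketch of the chain rule in this setting: if $z$ depends on $y$ and $y$ on $x$, then for infinitesimal $\Delta x\neq 0$ with $\Delta y\neq 0$ the increments literally cancel:
$$\frac{dz}{dx}=\textbf{st}\left(\frac{\Delta z}{\Delta x}\right)=\textbf{st}\left(\frac{\Delta z}{\Delta y}\cdot\frac{\Delta y}{\Delta x}\right)=\frac{dz}{dy}\cdot\frac{dy}{dx}$$
(the degenerate case $\Delta y=0$ requires a separate, short argument).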

More generally, if $z=f(x,y)$ then the formula $dz=\frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y}dy$ has two interpretations: as a relation among differential 1-forms, or as a relation among infinitesimal differentials. Classical authors like Riemann interpreted such formulas in the latter way, as relations among infinitesimal differentials.
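A quick example: for $z=xy$ the increment is
$$\Delta z=(x+\Delta x)(y+\Delta y)-xy=y\,\Delta x+x\,\Delta y+\Delta x\,\Delta y,$$
and under the infinitesimal reading the term $\Delta x\,\Delta y$ is negligible compared to $\Delta x$ and $\Delta y$, leaving exactly $dz=y\,dx+x\,dy$.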

It is not possible to define $dx$ by a limit as in $\mathrm{d}x= \lim_{\Delta x \to 0}\Delta x$ (as you wrote), because that limit is simply zero. However, a generalisation of the limit called the ultralimit, as popularized by Terry Tao, works just fine and produces an infinitesimal value for $dx$.

More specifically, concerning your hope of somehow "defining differentials with the help of limits", the following can be said. The notion of limit can be refined to the notion of an ultralimit by refining the equivalence relation involved in defining the limit. The ordinary limit works in such a way that if a sequence $(u_n)$ tends to zero, then its limit is zero on the nose; this leaves no room for infinitesimals. However, the refined notion, the ultralimit of a sequence $(u_n)$ tending to zero, is typically a nonzero infinitesimal, say $dx$. We can then use this as the starting point for all the definitions in the calculus, including continuity and derivative. The formula $dy= f'(x) dx$ then literally makes sense for nonzero differentials $dx$ and $dy$ (unless of course $f'(x)=0$, in which case $dy=0$).
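Concretely, in the ultrapower construction the hyperreals are $\mathbb{R}^{\mathbb{N}}/\mathcal{U}$ for a nonprincipal ultrafilter $\mathcal{U}$, and the ultralimit of a sequence is its equivalence class. Thus the ultralimit of $u_n=\frac1n$ is
$$dx=\left[\left(1,\tfrac12,\tfrac13,\ldots\right)\right]_{\mathcal{U}},$$
which is positive (every term is positive) yet smaller than each standard $\varepsilon>0$ (since $\frac1n<\varepsilon$ for all but finitely many $n$): a nonzero infinitesimal.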

The definition is not circular because the infinitesimal $\Delta y$ is defined as the $y$-increment $f(x+\Delta x)-f(x)$. This was essentially Leibniz's approach (differentials are just infinitesimals) and he rarely did things that were circular.


We consider a real-valued function $y=f(x)$ differentiable at $x=x_0$.

The following reasoning can be found in section 3.7 of *Höhere Mathematik, Differentialrechnung und Integralrechnung* by Hans J. Dirschmid.

Definition: We call the change of the linear part of $f$ at $x=x_0$, considered as a function of the argument increment $\Delta x$, the differential of the function $f$ at $x_0$, symbolically \begin{align*} dy=f^\prime(x_0)\Delta x\tag{1} \end{align*} The linear part of $f$ at $x_0$ is the expression \begin{align*} f(x_0)+f^\prime(x_0)\Delta x \end{align*}
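For example, with $f(x)=x^2$ the linear part at $x_0$ is $x_0^2+2x_0\Delta x$; its change as a function of $\Delta x$ is $$dy=f^\prime(x_0)\Delta x=2x_0\,\Delta x.$$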

Note that we introduce the term $dy$ in (1) without using $dx$ and so avoid any circular reasoning.

Here is a small figure for illustration:

*(image not available)*

When talking about the differential $dy$ we use it both as a function symbol and as the value of the function $dy$ evaluated at $\Delta x$: \begin{align*} dy=dy(\Delta x)=f^\prime(x_0)\Delta x\tag{2} \end{align*}


Connection with $dx$:

We consider the identity function $y=x$. Since $y^\prime=1$ we obtain by (2) \begin{align*} dy=1\cdot \Delta x=\Delta x \end{align*} Since $y=x$ and $dy=\Delta x$ we use this relationship to define \begin{align*} dx:=\Delta x \end{align*} and call it the differential of $x$.

With this two-step approach we can write $dy=f^\prime(x_0)\Delta x$ as \begin{align*} dy=f^\prime (x_0) dx\tag{3} \end{align*} and resolve the seemingly circular definition.

[Add-on 2016-11-15]:

From (3) we see that the differentials $dy$ and $dx$ are proportional as functions of $\Delta x$. Since we are allowed to divide real-valued functions, we can also consider the quotient \begin{align*} \frac{dy}{dx}=f^\prime(x_0)\tag{4} \end{align*} This justifies the term *differential quotient*.

Observe that the left-hand side of (4) is the quotient of two functions of the argument increment $\Delta x$, which does not occur on the right-hand side. This means the quotient does not depend on the argument $\Delta x$ of the numerator $dy$ and the denominator $dx$.
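Continuing the example $f(x)=x^2$: both $dy=2x_0\,\Delta x$ and $dx=\Delta x$ depend on $\Delta x$, but for $\Delta x\neq 0$ their quotient $$\frac{dy}{dx}=\frac{2x_0\,\Delta x}{\Delta x}=2x_0=f^\prime(x_0)$$ does not.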


Approximation of $f$ at $x=x_0$:

The linear part $$f(x_0)+f^\prime(x_0)\Delta x$$ approximates the function $f$ at $x=x_0$ with an error that vanishes to higher than first order in $\Delta x$. This implies that the change of the linear part, the differential $dy$, approximates the change of the function, namely the difference $\Delta y=f(x_0+\Delta x)-f(x_0)$, with the same order of error: \begin{align*} \Delta y=dy+\Delta x\, \varepsilon(\Delta x),\qquad \lim_{\Delta x\rightarrow 0}\varepsilon(\Delta x)=0. \end{align*}
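A minimal numerical check of this error behaviour (the choices $f=\sin$ and $x_0=1$ are purely illustrative):

```python
import math

f, fprime, x0 = math.sin, math.cos, 1.0

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    delta_y = f(x0 + dx) - f(x0)   # actual change of the function
    dy = fprime(x0) * dx           # differential dy = f'(x0) * dx
    eps = (delta_y - dy) / dx      # the eps(dx) from the formula above
    print(dx, eps)                 # eps(dx) -> 0 as dx -> 0
```

The printed values of $\varepsilon(\Delta x)$ shrink roughly linearly with $\Delta x$ (for twice-differentiable $f$ one has $\varepsilon(\Delta x)\approx\tfrac12 f''(x_0)\Delta x$), confirming that the error $\Delta y-dy$ is of higher than first order.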