Defining the Derivative using Internal Set Theory (Non Standard Analysis)

As an engineer, I have found NSA (Non Standard Analysis) to be much closer to our intuition than traditional calculus. Since there are basically two approaches to NSA, one using Hyperreals and another using Internal Set Theory (IST), I decided to take a look at both. With Hyperreals, we can say that a function $f: \mathbb{R} \rightarrow \mathbb{R}$ has derivative $m$ at a real point $a$ if for every nonzero hyperreal infinitesimal $\Delta x$

$\mathrm{st}\left(\frac{f(a + \Delta x) - f(a)}{\Delta x}\right) = m$.

I'm currently reading Alain Robert's book, which employs the IST approach, and it doesn't actually define the derivative of $f$ by means of NSA alone. Instead, it keeps the classical definition of differentiability (i.e., using limits) and defines the new concept of S-differentiability by saying that $f$ is S-differentiable at a point $a$ if there is a standard $m$ with

$\frac{f(x) - f(a)}{x - a} \approx m$ for all $x \approx a$ with $x \neq a$.

The problem with this definition is that it's only equivalent to differentiability when $f$ and $a$ are both standard. Although differentiability could be implicitly characterized by S-differentiability by defining that $f$ is differentiable at $a$ if $(f,a)$ belongs to $\{(f,a): f\text{ S-differentiable at $a$}\}^{S}$, as the book suggests, this is much more artificial and away from our intuition than the definition using hyperreals or even the classical definition itself. Furthermore, it only defines the relation of differentiability, and not the operation of taking a derivative. My question is then the following: How can we define the derivative (or differentiability) in IST in a way that preserves intuition and doesn't fall back to classical definitions?


On the contrary, there is no actual difference between the definition of differentiability using hyperreal numbers, and the IST definition of S-differentiability.

Before I explain why, let me restate the NSA definition of differentiability in a more precise manner: the version you state employs some abuses of notation which make the correspondence harder to understand. I will call it H-differentiability, to distinguish it from the classical (limit-based) definition of differentiability.

Definition. Consider a real function $f: \mathbb{R} \rightarrow \mathbb{R}$, and a real number $a \in \mathbb{R}$. We say that $f$ is H-differentiable at the point $a$ if we can find a real number $m \in \mathbb{R}$ so that for the unique hyperreal-valued $\star$-extension $~^\star\! f: ~^\star\!\mathbb{R} \rightarrow ~^\star\!\mathbb{R}$ and for any nonzero hyperreal infinitesimal $\Delta x \in ~^\star\!\mathbb{R}$,

  • the standard part of the hyperreal $\frac{~^\star\!f(a + \Delta x) - ~^\star\!f(a)}{\Delta x}$ exists, and is equal to $m$.

Defined.

Notice that the existence condition is required, otherwise the $\mathrm{st}(-)$ notation is not even defined. Moreover, you actually have to talk about the extension $~^\star\!f$, since $f$ is a function of a real variable, so the notation $f(a + \Delta x)$ is undefined as well.
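
As a concrete check of the definition (a worked example chosen here for illustration, not taken from the question), take $f(x) = x^2$ and a real number $a$. By Transfer, $~^\star\!f(x) = x^2$ on the hyperreals as well, so for any nonzero infinitesimal $\Delta x$,

$\frac{~^\star\!f(a + \Delta x) - ~^\star\!f(a)}{\Delta x} = \frac{2a\,\Delta x + (\Delta x)^2}{\Delta x} = 2a + \Delta x$.

Since $2a$ is real and $\Delta x$ is infinitesimal, the standard part of this quotient exists and equals $2a$. Hence $f$ is H-differentiable at $a$ with $m = 2a$.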

Moreover, in NSA, the standard part of a hyperreal $a \in ~^\star\! \mathbb{R}$ is simply the unique real number $a' \in \mathbb{R}$ so that $a - a'$ is infinitesimal, provided that such an $a'$ exists. In IST, we don't deal with hyperreals; however, we define the standard part of a real number $a \in \mathbb{R}$ as the unique standard real number $a' \in \mathbb{R}$ such that $a - a'$ is infinitesimal, provided that such an $a'$ exists. In both settings, we have to prove that as long as our numbers are not "too large", they have standard parts.

Theorem (NSA). Consider a hyperreal $a \in ~^\star\! \mathbb{R}$ so that $|a| \leq b$ for some real number $b \in \mathbb{R}$. Then $a$ has a standard part.

Theorem (IST). Consider a real number $a \in \mathbb{R}$ so that $|a| \leq b$ for some standard real number $b \in \mathbb{R}$. Then $a$ has a standard part, i.e. a unique standard real $a'$ so that $a - a'$ is infinitesimal.

This allows us to phrase S-differentiability in an identical way to H-differentiability:

Definition. Consider a standard real function $f: \mathbb{R} \rightarrow \mathbb{R}$, and a standard real number $a \in \mathbb{R}$. We say that $f$ is S-differentiable at the point $a$ if we can find a standard real number $m \in \mathbb{R}$ so that for any nonzero real infinitesimal $\Delta x \in \mathbb{R}$,

  • the standard part of the real number $\frac{f(a + \Delta x) - f(a)}{\Delta x}$ exists, and is equal to $m$.

Defined.

Notice how the hyperreals of H-differentiability become ordinary reals in S-differentiability, while the reals and (non-hyper)real-valued functions of H-differentiability become standard reals and standard real-valued functions. The $\star$-extensions, which we suppressed in countless places in the NSA definitions (e.g. strictly speaking where I write $-$, I should be writing the $\star$-extension of real-valued subtraction to the hyperreals), disappear altogether.

In fact, from the outside view, a "standard function $f: \mathbb{R} \rightarrow \mathbb{R}$" in IST is the exact same thing as an "arbitrary function $f: \mathbb{R} \rightarrow \mathbb{R}$" in NSA. The behavior of an "arbitrary function $f: \mathbb{R} \rightarrow \mathbb{R}$" in IST is similar to (although not identical to) that of a "hyperreal function $f: ~^\star\!\mathbb{R} \rightarrow ~^\star\!\mathbb{R}$" in NSA. With this in mind, the relationships between H-differentiability, S-differentiability and classical differentiability should become clearer:

  1. The IST definition of S-differentiability is equivalent to the classical definition of differentiability only if $f$ is a standard real-valued function and $a$ is a standard real number.

  2. The NSA definition of H-differentiability is equivalent to the classical definition of differentiability only if $f$ is a real-valued function, and $a$ is a real number.
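
To see why the standardness hypotheses in item 1 cannot be dropped, here are two illustrative examples in IST (the functions are chosen here for illustration and are not taken from the book). Fix an infinitesimal $\varepsilon > 0$ and take $a = 0$.

First, let $f(x) = \varepsilon |x|$. For any $x \neq 0$,

$\frac{f(x) - f(0)}{x - 0} = \pm\varepsilon \approx 0$,

so $f$ satisfies the S-differentiability condition at $0$ with $m = 0$, although $f$ is not classically differentiable at $0$: its one-sided derivatives are $-\varepsilon$ and $\varepsilon$, which differ.

Second, let $g(x) = x^2 / \varepsilon$. Then

$\frac{g(x) - g(0)}{x - 0} = \frac{x}{\varepsilon}$,

which equals $1$ at $x = \varepsilon$ and equals $\varepsilon \approx 0$ at $x = \varepsilon^2$. No single standard $m$ is infinitely close to both values, so $g$ fails the S-differentiability condition at $0$, although $g$ is classically differentiable at $0$ with $g'(0) = 0$.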

Moreover, neither of the definitions has more "operational" content than the other: NSA just seemed more operational due to cleverly chosen notation.

In IST, you can characterize classical differentiability for arbitrary real-valued functions, by using S-differentiability and the Standardization operation. In NSA, to characterize classical differentiability for real-valued functions, you would use H-differentiability: but that gets you only as far as the standard real-valued functions of IST, and it wouldn't help you talk about differentiability for hyperreal-valued functions at all.

Now, I can answer your question.

How can we define the derivative (or differentiability) in IST in a way that preserves intuition and doesn't fall back to classical definitions?

In the exact way that Alain Robert's book suggests. We say that $f: \mathbb{R} \rightarrow \mathbb{R}$ is differentiable at $a$ if $(f,a)$ belongs to the set

  • $\{(f,a) \in \mathbb{R}^\mathbb{R} \times \mathbb{R} \:|\: \exists^{st} m \in \mathbb{R}. \forall \delta \in \mathbb{R}. \delta \approx 0 \rightarrow \frac{f(a + \delta) - f(a)}{\delta} \approx m \}^{S}$.

Is this definition more "artificial" or "away from our intuition" than the definition using hyperreals?

It makes no sense to criticize this definition as any more "artificial" or "away from our intuition" than the definition using hyperreals. In fact, the hyperreal definition looks nigh-identical, and says that $f: \mathbb{R} \rightarrow \mathbb{R}$ is differentiable at $a$ if $(f,a)$ belongs to the set

  • $\{(f,a) \in \mathbb{R}^\mathbb{R} \times \mathbb{R} \:|\: \exists m \in \mathbb{R}. \forall \delta \in ~^\star\!\mathbb{R}. \delta \approx 0 \rightarrow \frac{~^\star\!f(a + \delta) - ~^\star\!f(a)}{\delta} \approx m \}$.

Keep in mind that if you write $\delta \approx 0 \rightarrow \frac{f(a + \delta) - f(a)}{\delta} \approx m$ on a blackboard, most mathematicians who have never learned any form of nonstandard analysis will still know that you're talking about differentiability of $f$ at $a$. Hardly something one could call away from our intuition!

Won't I need to fall back to the classical definitions to prove stuff?

You never need to make use of (fall back to) or even state the classical definitions. Once you define S-continuity, you can simply define continuity using Standardization, and never give any equivalent classical $\varepsilon$-$\delta$ characterization. The same goes for differentiability: you can explain what S-differentiability means, and then define differentiability as the standardized notion.

Say you want to prove something about the relationship between these two. E.g. you want to show that every differentiable function is continuous. You can argue as follows.

By Transfer, it suffices to prove that every standard differentiable function is continuous. But a standard function $f$ is continuous precisely if it satisfies S-continuity: remember, you defined the set of continuous functions by Standardization, as the unique standard set of functions whose standard elements are S-continuous. The same goes for differentiability: a standard $f$ is differentiable precisely if $f$ satisfies S-differentiability. So it suffices to prove that if a standard $f$ is S-differentiable, then it is S-continuous.
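
That last step can be sketched as follows, assuming the usual IST formulation of S-continuity at $a$ (namely $f(x) \approx f(a)$ for all $x \approx a$), which is not spelled out above. Let $f$ and $a$ be standard, let $m$ be the standard number provided by S-differentiability, and take any $x \approx a$ with $x \neq a$. Then

$f(x) - f(a) = (x - a) \cdot \frac{f(x) - f(a)}{x - a}$.

The first factor is infinitesimal. The second factor is infinitely close to the standard number $m$, hence limited. An infinitesimal times a limited number is infinitesimal, so $f(x) \approx f(a)$. The case $x = a$ is trivial, so $f$ is S-continuous at $a$.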

Following this template, you will only ever use the S-definitions, and never have to make use of classical characterizations.

Isn't the set formed by Standardization "away from our intuition"?

You may regard Standardization as somehow more mysterious than the other two principles of IST. If so, keep in mind that the $\star$-extension of NSA is equally mysterious. Both of them require new intuitions. There are infinitely many ways to extend a function $f: \mathbb{R} \rightarrow \mathbb{R}$ to the hyperreals. Can you explain, succinctly, what makes the $\star$-extension special among these? In fact, Standardization and $\star$-extension are very closely related metamathematically, and it's hard to capture their meaning without explaining a construction of some particular model of IST/NSA. Personally, I think that IST Standardization fares better than its NSA alternative in this regard: most people who study IST eventually develop a good intuition for it, without having to ever go through a model construction.