Formalizing Those Readings of Leibniz Notation that Don't Appeal to Infinitesimals/Differentials

[disclaimer: I've studied a lot of logic but never been good at analysis, so that's the angle I'm coming from below]

In my attempt to find a precise version of the 'definitions' usually given when Leibniz notation is first introduced in single- or multivariable calculus or analysis, without any appeal to differentials or infinitesimals, I've discovered that I'm confused about a few interrelated low-level issues around formalization and notation. I don't know which questions are the more basic ones here, so I'll just ask them as I go along, explaining what I think I do understand about the proposed formal definitions in question.

I'm concerned with both the dy/dx notation and the 'del y/del x' notation for partial derivatives, but just the real-valued case.

Since I suspect that my confusions stem from use-mention confusions, and from conflating functions, variables, and their values, I will use, and assume familiarity with, lambda notation, metavariables, and quasi-quotation throughout. Where not stated otherwise, ⌜λx.φ⌝ refers to the function defined on the largest real domain on which φ is a real number. Assume that only definitions for real variables are sought below.

Anyway, on with the definitions:

These short articles by the author Thurston (the first five results) all give roughly the same formal definition of Leibniz notation:

http://scholar.google.ca/scholar?hl=en&as_sdt=0,5&q=thurston+leibniz

whereas the top results for this search are by the author Harrison, and each gives one of a few slight variants on a different definition:

http://www.google.ca/search?q=%22leibniz+notation%22+%22lambda+term%22&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a

Here is my understanding of their definitions:

HARRISON:

⌜dφ/dψ⌝ is a shorthand for ⌜D(λψ.φ)(ψ)⌝ i.e. "dy/dx" stands for "D(λx.y)(x)"

This means that:

ψ must be a variable of the underlying logic (and it has a free and a bound occurrence here), and

φ must be a string such that ⌜∀ψ, φ = f(ψ)⌝ is true for some function f:S→R where S⊆R.
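To check that I'm reading this correctly, here is a worked expansion of my own (so it may well be off the mark): taking φ = ⌜x³⌝ and ψ = ⌜x⌝, the shorthand ⌜d(x³)/dx⌝ expands to ⌜D(λx.x³)(x)⌝, which evaluates to 3x². And φ = ⌜x³⌝ satisfies the restriction, since ⌜∀x, x³ = f(x)⌝ is true for f = λx.x³ : R→R.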

Q1. Does anyone have a simpler way to state the restriction on φ? Q1.1 What is the proper name for the sort of string φ must be?

THURSTON: ⌜dφ/dψ⌝ is a shorthand for ⌜φ'/ψ'⌝, i.e.

"dy/dx" is short for "y'/x'"

This means that φ and ψ must be names of functions from the reals to the reals.
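Again to check my reading against a concrete case (my own example, so I may be misapplying it): if x names the function λt.t³ and y names the function λt.t⁶, then dy/dx = y'/x' = (λt.6t⁵)/(λt.3t²), the pointwise quotient, which is λt.2t³ wherever t ≠ 0; and since y = x² pointwise, this agrees with the expected 2x.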

Q1.2 Are there other formal definitions in the literature I should compare to these? I haven't found any yet...

Harrison's definition does not return a value, because a free variable is uninstantiated, in the same sense in which wffs that are not sentences do not have a truth value. Thurston's version, however, returns a function.

For instance,

df/dx = f'(x) for Harrison, df/dx = f' for Thurston

More concretely

d(x²+x)/dx = 2x+1 for Harrison, λx.2x+1 for Thurston
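If it helps to make the expression-versus-function contrast vivid, here is how I would render it in a computer algebra system such as SymPy (purely my own analogy, not something either author uses):

    from sympy import symbols, diff, Lambda

    x = symbols('x')

    expr = diff(x**2 + x, x)   # a symbolic expression with the free symbol x: 2*x + 1
    fn = Lambda(x, expr)       # a function in its own right: Lambda(x, 2*x + 1)

    print(expr)    # 2*x + 1   (Harrison-style: an open term)
    print(fn(3))   # 7         (Thurston-style: a function, which can be applied)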

Q2 Is the value '2x+1', which contains a free variable, being returned? Or are we implicitly quantifying over x (call it 'ξ' for the purposes of quasi-quotation) and saying: ∀ξ, ⌜2ξ+1⌝ refers to (not 'is') the value returned?

However, consider the following 'typical' calculus problem:

" y = f(x) x = g(u) g(x) = x³ - 7

find df/dx "

Thurston gets us df/dx = λx.1/(3x²). This is undefined on Harrison's approach, since x is not a variable of the logic, but the name of a function.

Q3 Should I be considering the possibility that the logic allows for variables ranging over functions?

And if we pretended that it did, we'd still get df/dx = D(λx.f)(x), but this is malformed, since λx needs something like 'f(x)' rather than 'f' as its body.

And if we added a case to Harrison's definition to append '(x)' or the like when it's missing, we'd still get df/dx = 1, which is not equal to the result we got with Thurston's definition. More strikingly, the function we evaluated to get 1 was, unlike the f' vs f'(x) case above, not even the same function as the one returned by Thurston's definition.

Q4 What should I conclude from the fact that these definitions diverge in this manner?

Now consider:

" y = f(x) f(x) = x⁹

find dy/dx "

On Harrison's account, we could view y as a metavariable, so that f(x) is placed substitutionally into the defining string, but I have a feeling that is not the right way to understand it. However, if y is merely a variable of the logic, it is a free variable in the result, and we end up with one free variable too many.

On Thurston's account, y must be the name of a function, but "y = f(x)" sets y equal to an expression with a free variable, not equal to the name of a function.

Q5 Should I view statements like "y = f(x)" as involving a suppressed "(x)" and "∀", so that we get "∀x, y(x) = f(x)"? Or should I see y as a metavariable? Or should I imagine the logic extended to allow some new syntactic category of 'dependent' variables, while thinking of the usual variables of the logic as 'independent' variables, i.e. those whose value does not depend on others? I think I am very confused about what happens when one variable depends on another.
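For concreteness, here is what the first reading would give on the problem just above (assuming I'm applying it correctly): reading "y = f(x)" as "∀x, y(x) = f(x)" makes y the name of the function λx.x⁹, so Thurston's definition gives dy/dx = y'/x' = (λx.9x⁸)/(λx.1) = λx.9x⁸, while substituting ⌜x⁹⌝ for y in Harrison's definition gives D(λx.x⁹)(x) = 9x⁸.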

Q6 On a related note, I saw a passage recently that spoke in terms like "x(u) is the inverse function of u(x)"--how should this be understood more precisely? I've come to discover that I don't understand expressions of this sort at all!

Q7 Does either of these definitions clearly capture 'practice' better than the other?

Q8 How should similar attempts be made for the 'del' notation for partial derivatives?

Q9 Can someone give me an example where d/dx and ∂/∂x return different values on the same input? If I'm not mistaken, in some formalizations this never happens, and in others it does; I think Harrison's would not allow for this, since it just returns an 'expression' rather than one of the various functions that can be formed from an expression by applying a lambda operator to it.
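To make Q9 concrete, here is the sort of situation I have in mind (assuming the question is even coherent): let u = x·y with y = x². Then ∂u/∂x = y = x² (holding y fixed), whereas du/dx = d(x·x²)/dx = 3x², so the two operators disagree on the 'same' u once y is allowed to depend on x.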

I also started trying to read this article on revising partial derivative notation:

[I've hit my link limit as I'm new here, but google "revised notation for partial derivatives" (with quotes). It's by WC Hassenpflug]

but I got stuck on the sentence:

"If we have a function u = f(x,y) and the transformation y=g(x∩) is made, then it is not clear whether del u / dx means del u / dx |y or del y / dx | n"

Can someone explain that one to me?
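Here is my best guess at a concrete instance of the ambiguity (I may be misreading the sentence): take u = f(x,y) = x + y and y = g(x,η) = x·η. Holding y fixed, (∂u/∂x)|_y = 1; but rewriting u as a function of (x,η), i.e. u = x + xη, gives (∂u/∂x)|_η = 1 + η. So '∂u/∂x' is ambiguous until one says which of y or η is being held fixed.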

Q10 This all bears some superficial similarity to the relationship between so-called 'random variables', which are actually functions, and what are called 'variables' in the underlying logic. This has also confused me, and I see many operations done in texts on random variables where the operators have only been defined for values in the random variable's range, and not for functions. Can anyone comment on this? It would be nice if I could dismiss two long-standing confusions with one stone :p


That's a lot of questions! I agree with some of your points about notation, but unfortunately that's just how it's (ab)used in common practice. However, I also think that you're thinking too much about what everything means. I suspect most authors at this level do not worry about using good notation and distinguishing between free variables and bound variables and instantiated variables and so on.

Let's start by discussing Lagrange notation, as this is closest in form to pure-functional notation. If you have a function $f: U \to \mathbb{R}$, $U$ an open subset of $\mathbb{R}$, we say it is differentiable on $U$ if there is a function $f': U \to \mathbb{R}$ such that $$\forall x. \forall \epsilon. \exists \delta. [\, x \in U \land \epsilon > 0 \implies \delta > 0 \land \forall h. (\, |h| < \delta \land x + h \in U \implies |f(x + h) - f(x) - h f'(x)| \le \epsilon |h| \,) \,]$$ Note that there's no dependence on how functions are defined and that it depends purely on the extensional properties of the function.
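As a sanity check of my own: take $U = \mathbb{R}$, $f(x) = x^2$ and candidate derivative $f'(x) = 2x$. Then $$|f(x+h) - f(x) - h f'(x)| = |x^2 + 2xh + h^2 - x^2 - 2xh| = h^2 \le \epsilon |h|$$ whenever $|h| < \delta := \epsilon$, so the condition above is satisfied.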

Now, the problem with understanding Leibniz's notation in this framework is that often one wants to write $\dfrac{df}{dx}$ for $f'(x)$, but then this leads to absurdities such as $\dfrac{df}{d2}$ for $f'(2)$. So what some authors do is write $\dfrac{df}{dx}(2)$ for $f'(2)$... which is also troublesome, as it seems to say that $f$ inherently depends on $x$. I think Leibniz notation is best understood in a framework where one has expressions depending on some fixed set of variables, rather than in a framework where one is dealing with functions and numbers directly. (I personally think Leibniz notation should be abolished. A friend of mine — very competent in calculus — once misinterpreted $f'(2x)$ as meaning the same thing as $\dfrac{d}{dx}[f(2x)]$ when in fact the latter expression is double the former.)
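(To spell out that last example with a concrete $f$ of my choosing: if $f(x) = x^2$ then $f'(2x) = 2 \cdot (2x) = 4x$, whereas $\dfrac{d}{dx}[f(2x)] = \dfrac{d}{dx}[4x^2] = 8x$, which is indeed double $f'(2x)$.)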

Similar criticisms can be made about the notation for partial derivatives, but here there is a problem because there are no good alternatives. Mathematica generalises Lagrange notation for this purpose: for example, $f^{(0, 1, 0)}$ means the first partial derivative of $f$ with respect to its second variable. In this notation, the Laplacian of a function $f$ of three variables may be written as $f^{(2, 0, 0)} + f^{(0, 2, 0)} + f^{(0, 0, 2)}$, where addition is pointwise. The problem with this notation is that it assumes that partial differentiation with respect to different variables commutes, which does not hold when the function isn't smooth enough. Another alternative is Einstein's index notation, in which $f_{i,jk}$ means the $i$-th component of the vector-valued function $f$ differentiated with respect to the $j$-th variable, and then again with respect to the $k$-th variable. In Penrose's abstract index notation this instead means the second total derivative of $f$. There's also a somewhat non-standard reading of $\dfrac{\partial}{\partial x}$ where the whole symbol is regarded as a named differential operator. This is usually in the context of differential geometry, and fits in with the identification of tangent vectors as differential operators.
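(For the non-commuting case, the standard counterexample, if I remember it correctly, is $f(x, y) = \dfrac{xy(x^2 - y^2)}{x^2 + y^2}$ with $f(0, 0) = 0$: both mixed second partials exist at the origin, but one is $1$ and the other is $-1$.)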

Finally, regarding random variables: I think this is actually relatively easy to sort out compared to the Leibniz notation mess. In the Kolmogorov formalism, a random variable is a function on the underlying sample space. For simplicity of exposition I'll assume we work with real-valued random variables. We can define an algebra of real-valued random variables by pointwise operations: if I have two random variables $X, Y : \Omega \to \mathbb{R}$, there are random variables $X + Y, XY: \Omega \to \mathbb{R}$, where $(X + Y)(\omega) = X(\omega) + Y(\omega)$ and $(XY)(\omega) = X(\omega) Y(\omega)$, and for each $\lambda \in \mathbb{R}$ there is a random variable which takes the constant value $\lambda$ at all sample points. But better than that, we can apply arbitrary measurable functions $\mathbb{R} \to \mathbb{R}$ to random variables to obtain new ones: so if $f: \mathbb{R} \to \mathbb{R}$ is measurable and $X : \Omega \to \mathbb{R}$ is a random variable, then $f(X) : \Omega \to \mathbb{R}$ is just the composite $f \circ X$. There are other formalisms where the sample space is not only suppressed but entirely absent; Terence Tao has a blog post about this point of view.
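A concrete toy example: take $\Omega = \{1, 2, 3, 4, 5, 6\}$ (one throw of a die) and $X(\omega) = \omega$. Then $X^2$ is the random variable $\omega \mapsto \omega^2$, $X + 3$ is $\omega \mapsto \omega + 3$, and for the measurable function $f(t) = e^t$ the random variable $f(X) = e^X$ is $\omega \mapsto e^\omega$. None of these operations needs to be separately 'defined on functions' in advance; they are all induced pointwise from the corresponding operations on $\mathbb{R}$.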


I do not propose this as a proper answer, but it is too long for a comment.

The same problem was troubling me some time ago. IMHO Leibniz notation can be put on a formal footing, but the context in which it works is quite different. Something like this: calculus is done in a “diagram”. A diagram is a set of variables together with assignments

variable → set

(list of variables) × variable → function

Functions go between variables. The diagram commutes in the category-theory sense. So, given a variable v, we can get a function to v from the diagram. This allows one to “differentiate a variable”. (We can also define a diagram as a product-preserving functor from a finite poset category with products to Set.) Every expression involving operations like sum and product creates a new variable v and a new function … → v. Likewise for the differential operator. This allows one to speak formally about partial derivatives as well, to prove the chain rule, etc. However, I did not pursue my idea further and never came across anything like it in the literature.
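A toy instance of what I have in mind (only a sketch, not worked out): take variables u, x, y, each assigned the set ℝ, and assigned functions g from u to x and f from x to y, so that x “depends on” u and y “depends on” x. Commutativity then determines a unique function from u into each of the other variables, “dy/dx” can be read as the derivative of the assigned function f sitting between x and y, and the chain rule for dy/du is just the derivative of the composite f∘g.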