Did the directional derivative get developed before the gradient?

In learning multivariable calculus, I've often seen the gradient introduced before the directional derivative. To me this is backwards. Once we have partial derivatives, we treat them as rates of change in the directions of the axes. We should then naturally ask, "what if the direction were not aligned with the axes?" From there, the question "in which direction is the rate of change maximal?" seems to arise naturally. In the development of calculus, which came first?
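For concreteness (using today's notation merely to fix ideas), what I have in mind is the modern statement that the directional derivative of $f$ in the direction of a unit vector $\mathbf{u}$ is $$ D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = |\nabla f| \cos\theta, $$ where $\theta$ is the angle between $\mathbf{u}$ and $\nabla f$, so the rate of change is largest when $\mathbf{u}$ points along the gradient.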

I find that when the gradient is introduced before directional derivatives, it seems like some arbitrary vector pulled out of nowhere. It only starts to make sense once we understand directional derivatives.


Solution 1:

The following is a partial answer, based on my own reading of Kline's Mathematical Thought from Ancient to Modern Times and Jeff Miller's website on the earliest uses of the symbols and words involved.

The development of vectors and their associated calculus has a rather fractious history in the nineteenth century. It seems best to leave the divergence theorem and its friends out of this, since that history is complicated enough as it is.

Vectors and vectorial thinking are a very late development in the century. To have an idea of how people thought before, a reasonable place to look is that throwback, Green's theorem, which is stated as $$ \int P \, dx + Q \, dy = \iint \left( \frac{\partial Q}{\partial x}-\frac{\partial P}{\partial y} \right) \, dx \, dy. $$ Here $P$ and $Q$ are just functions of $x$ and $y$, with no relationship between them. This formula is quite annoying, since it's pretty hard to remember where the minus sign goes. But this is how people worked: you can look in the paper on the divergence theorem I linked above to see this in more detail.
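For contrast, in the vector notation that only arrives towards the end of the century, one would nowadays package the same statement (taking $\mathbf{F} = (P, Q, 0)$) as $$ \oint_{\partial D} \mathbf{F} \cdot d\mathbf{r} = \iint_D (\nabla \times \mathbf{F}) \cdot \mathbf{k} \, dA, $$ where the awkward minus sign is absorbed into the curl; but nothing like this was available when the theorem was stated.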

How about derivatives, then? Lagrange is credited with the form of Taylor's theorem in two variables. Are there directional derivatives in this? Not really: everything is phrased in terms of a function of two separate variables, not of a vector.
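In modern notation, the sort of statement meant is the expansion $$ f(x+h,\, y+k) = f(x,y) + h \frac{\partial f}{\partial x} + k \frac{\partial f}{\partial y} + \frac{1}{2!}\left( h^2 \frac{\partial^2 f}{\partial x^2} + 2hk \frac{\partial^2 f}{\partial x\, \partial y} + k^2 \frac{\partial^2 f}{\partial y^2} \right) + \cdots, $$ in which the increments $h$ and $k$ are simply two more scalar variables; nothing bundles the pair $(h,k)$ into a single directed object.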

The first "vectory" thing that is produced is the quaternions, by Hamilton in the 1840s (that they were written down by Gauss and not published years earlier is both typical and not really relevant, since Gauss's unpublished work rarely leaks out into common mathematical parlance in his lifetime, at least). This is the beginning of a shift towards what becomes abstract algebra: this, Grassmann's work, and shortly afterward Cayley and others, push away from things that have to be grounded in reality to things that rely on the rules of symbol-pushing. Abstract vector spaces come late enough that they are not worth talking about here. Concrete vectors in the Gibbs–Heaviside model also come along much later, initially as an offshoot of quaternions that takes the physically useful parts and strips off the rest. There's an essay that describes the debate between the emerging vectorialists and quaternionists here.

The quaternions have two bizarre properties: firstly, they are not commutative, and secondly, they need four real numbers to describe them, which was rather puzzling when previously everything lived in three dimensions. But Hamilton also writes down what is undeniably a vector operator, $$ \nabla = \mathbf{i} \frac{\partial}{\partial x} + \mathbf{j} \frac{\partial}{\partial y} + \mathbf{k} \frac{\partial}{\partial z} $$ (actually, Hamilton's $\nabla$ is rotated to point left, but that is too fiddly to reproduce here). Then $\nabla \mathbf{q}$, where $\mathbf{q}$ is a quaternion, gives as the "scalar part" a quantity that happens to be the negative of the divergence (Maxwell would later call it the convergence) and as the "vector part" the curl (as seen on this page of Hamilton's Lectures on Quaternions), so this $\nabla$ is in a sense the only operator you need. On the next page he discusses the action of $\nabla$ on a scalar function (a temperature, or a gravitational potential) as describing the corresponding vector (the heat flow, or the gravitational force): for a generic function, a vector normal to its level surfaces. So this is certainly a reasonable candidate for the first discussion of a gradient vector (it seems no one else even had the means to write this operator down as an object in its own right before Hamilton), although the term gradient itself is first used much later.
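To see the claim in modern terms (a paraphrase, not Hamilton's own notation), apply $\nabla$ to a pure quaternion $\sigma = \mathbf{i}\,u + \mathbf{j}\,v + \mathbf{k}\,w$ (what we would now call a vector field), using $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = -1$ and $\mathbf{i}\mathbf{j} = \mathbf{k} = -\mathbf{j}\mathbf{i}$ (and cyclic permutations): $$ \nabla \sigma = -\left( \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} \right) + \mathbf{i}\left( \frac{\partial w}{\partial y} - \frac{\partial v}{\partial z} \right) + \mathbf{j}\left( \frac{\partial u}{\partial z} - \frac{\partial w}{\partial x} \right) + \mathbf{k}\left( \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \right), $$ that is, scalar part $-\operatorname{div}\sigma$ and vector part $\operatorname{curl}\sigma$; applied to a scalar function $f$, it simply gives the vector we now call $\nabla f$.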

Now, the directional derivative. Hamilton does not connect the directional derivative to $\nabla$ (it should be obvious why: it is missing the scalar part of the quaternion), instead describing everything for Taylor series in terms of differentials (see this page and following). The anticommutativity prevents the production of a gradient vector, since the $dq$s appear interleaved in the expressions; this is one disadvantage of the quaternionic approach. If you are willing to call those differentials a directional derivative, then that settles the question. The actual phrase appears in A Short Table of Integrals by B. O. Peirce, accompanied by a modern geometric interpretation, but it is clear that the concept is rather older in one form or another. But I think the idea of a directional derivative using an actual vector therefore has to be younger than $\nabla$, since Hamilton is the first to publish anything treating a multi-dimensional object as one thing rather than as a pile of components. It's a matter of opinion rather than absolute fact, though.
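In modern terms (again a paraphrase rather than anything Hamilton wrote), one way to see the obstruction is that for two pure quaternions the product is $$ \mathbf{p}\,\mathbf{q} = -\,\mathbf{p}\cdot\mathbf{q} + \mathbf{p}\times\mathbf{q}, $$ so if one forms the product of a displacement $d\mathbf{q}$ with $\nabla f$, the directional derivative $d\mathbf{q} \cdot \nabla f$ appears only as (minus) the scalar part, entangled with a vector part $d\mathbf{q} \times \nabla f$ that has no business being there; the clean statement $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$ only becomes natural once the dot product is split off as an operation in its own right.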

Solution 2:

It looks like you think of the gradient as an "arrow" pointing in a direction (the one of "steepest ascent"). This can certainly be done, but I think the gradient was born simply as the "list" of the partial derivatives (I have no references for this). Thought of in this way, it is even more basic than a generic directional derivative.
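In symbols, I just mean the bare list $$ \nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right), $$ from which a generic directional derivative $D_{\mathbf{u}} f = \sum_i u_i \, \partial f / \partial x_i$ is then assembled, rather than the other way round.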