Why isn't the directional derivative generally scaled down to the unit vector?

I'm starting to learn how to intuitively interpret the directional derivative, and I can't understand why you wouldn't scale down your direction vector $\vec{v}$ to be a unit vector.

Currently, my intuition is the idea of slicing the 3D graph of the function along its direction vector and then computing the slope of the curve created by the intersection of the slicing plane with the graph.

But I can't really understand how the directional derivative would be a directional derivative if it were not scaled down to be a change in unit length in the direction of $\vec{v}$. Is there an intuitive understanding I can grasp onto? I'm just starting out so maybe I haven't gotten there yet.

Note: I think there may be a nice analogy to linearization, like if you take "twice as big of a step" in the direction of $\vec{v}$, then the resulting change in the function is twice as big. Is this an okay way to think about it?


The intuition I think of for a directional derivative in the direction of $\overrightarrow{v}$ is that it is how fast the function changes if the input changes with a velocity of $\overrightarrow{v}$. So if you move the input across the domain twice as fast, the function changes twice as fast.

More precisely, this corresponds to the following process that relates calculus in multiple variables to calculus in a single variable. In particular, we can define a line based at a point $\overrightarrow{p}$ with velocity $\overrightarrow{v}$ parametrically as a curve: $$\gamma(t)=\overrightarrow{p}+t\overrightarrow{v}.$$ This is a map from $\mathbb R$ to $\mathbb R^n$. However, if $f:\mathbb R^n\rightarrow \mathbb R$ is another map, we can define the composite $$(f\circ \gamma)(t)=f(\gamma(t))$$ and observe that this is a map $\mathbb R\rightarrow\mathbb R$ so we can study its derivative! In particular, we define the directional derivative of $f$ at $\overrightarrow{p}$ in the direction of $\overrightarrow{v}$ to be the derivative of $f\circ\gamma$ at $0$.

However, when we do this, we only see a "slice" of the domain of $f$ - in particular, we only see the line passing through $\overrightarrow{p}$ in the direction of $\overrightarrow{v}$. This corresponds to the notion of slicing you bring up in your question. In particular, we do not see any values of $f$ outside of the image of $\gamma$, so we are only studying $f$ on some restricted set.
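
To make the scaling point concrete, here is a minimal numerical sketch (assuming NumPy and a made-up example function $f(x,y)=x^2+3y$; none of these choices is part of the definition above). It approximates the derivative of $f\circ\gamma$ at $0$ with a finite difference and shows that doubling the velocity $\overrightarrow{v}$ doubles the answer:

```python
import numpy as np

# Example function f : R^2 -> R, chosen only for illustration
def f(x):
    return x[0]**2 + 3.0 * x[1]

def directional_derivative(f, p, v, h=1e-6):
    """Approximate (f o gamma)'(0), where gamma(t) = p + t*v,
    using a symmetric finite difference in t."""
    return (f(p + h * v) - f(p - h * v)) / (2.0 * h)

p = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

d1 = directional_derivative(f, p, v)        # derivative along v
d2 = directional_derivative(f, p, 2.0 * v)  # derivative along 2v

print(d1)   # ~ 2*1*3 + 3*4 = 18
print(d2)   # ~ 36: moving the input twice as fast doubles the rate of change
```

Normalizing $\overrightarrow{v}$ first would collapse these two computations to the same number, which is exactly the "how fast" information the directional derivative is meant to carry.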


Unit vectors are vastly overrated — the notion of vector is far more computationally convenient when treated as a whole rather than decomposed into separate notions of direction and magnitude.

I claim it leads to better understanding as well.

Thus, one should not introduce unit vectors by habit — such a manipulation should be reserved for those circumstances when it does something useful.

Similarly, a good definition or computational tool shouldn't force unit vectors upon the user, unless there is a very good reason for doing so.


Algebraically, the directional derivative is not the main idea — the main idea is the differential of a function: in usual terms, for a function of three variables, $\nabla f$ is the row vector given by

$$ \nabla f(\vec{x}) = \begin{pmatrix} f_1(\vec{x}) & f_2(\vec{x}) & f_3(\vec{x}) \end{pmatrix} $$

where by $f_k$, I mean the derivative of the function $f$ in its $k$-th place. The directional derivative is simply the product of the differential with the given (column) vector:

$$ \nabla_\vec{v} f = (\nabla f) \vec{v} $$

As such, restricting to unit vectors is an unnatural thing to do. Rescaling the input vector to be a unit vector is extremely unnatural.

Note that some people use $\nabla f$ to refer to a column vector, or even treat row and column vectors as the same thing. This is unfortunate, because it is computationally awkward when you change variables, and gets in the way of understanding the difference between vectors and covectors, and the close relationship between the inner product and the transpose operation.
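
As a concrete illustration (a sketch only; the example function $f(x) = x_1 x_2 + \sin x_3$ and the NumPy array shapes are my own choices, not part of the answer), one can keep $\nabla f$ as a literal $1 \times 3$ row and $\vec{v}$ as a $3 \times 1$ column, so the directional derivative is just a matrix product:

```python
import numpy as np

# Example f : R^3 -> R, f(x) = x1*x2 + sin(x3), chosen only for illustration
def grad_f_row(x):
    """The differential of f at x, written as a 1 x 3 row vector
    of partial derivatives (f_1, f_2, f_3)."""
    return np.array([[x[1], x[0], np.cos(x[2])]])

x = np.array([1.0, 2.0, 0.0])
v = np.array([[3.0], [1.0], [2.0]])   # a column vector, not unit length

# Directional derivative as a plain matrix product: (1x3) @ (3x1) -> (1x1)
print(grad_f_row(x) @ v)              # [[ 2*3 + 1*1 + 1*2 ]] = [[9.]]

# Rescaling v to a unit vector changes the answer by a factor of 1/||v||,
# which is exactly the information we would be throwing away.
v_unit = v / np.linalg.norm(v)
print(grad_f_row(x) @ v_unit)
```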


Finally, it's worth noting that derivatives — even directional derivatives — make sense in contexts where there is no notion of length, and thus there is no notion of a "unit" vector that can be applied.


Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and let (if the limit exists) $$D_v f(x) = \lim_{h \to 0} \frac{f(x+hv)-f(x)}{h}$$ be the directional derivative in the direction $v$. This way, if the function is differentiable, $$ D_{au+bv} f(x) = a\, D_{u} f(x)+b\, D_{v} f(x) \qquad (a,b) \in \mathbb{R}^2,$$ i.e. the directional derivative is linear in the direction. Indeed, $$D_{v} f(x) = J_x v,$$ where $J_x$ is the Jacobian matrix.

You'll have trouble stating and understanding that if you restrict to $\|v\|=1$, or worse, if you normalize $D_v f(x)$.
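
Here is a quick numerical check of that linearity (a sketch assuming NumPy; the particular $f:\mathbb{R}^2\to\mathbb{R}^2$ and the finite-difference step $h$ are illustrative choices, not part of the answer):

```python
import numpy as np

# Example f : R^2 -> R^2, chosen only to illustrate linearity in the direction
def f(x):
    return np.array([x[0] * x[1], x[0]**2 + x[1]])

def D(f, x, v, h=1e-6):
    """Finite-difference approximation of D_v f(x) = lim (f(x+hv)-f(x))/h."""
    return (f(x + h * v) - f(x - h * v)) / (2.0 * h)

x = np.array([1.0, 2.0])
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
a, b = 2.0, -3.0

lhs = D(f, x, a * u + b * v)          # directional derivative along au + bv
rhs = a * D(f, x, u) + b * D(f, x, v)  # linear combination of the two derivatives
print(lhs, rhs)   # agree up to rounding: the derivative is linear in the direction

# Note that au + bv is generally not a unit vector even when u and v are,
# so restricting to unit directions would not even let us state this identity.
```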