Gradient of a function as the direction of steepest ascent/descent

Solution 1:

The question is how you would measure the steepness of ascent. For one-dimensional functions, steepness is defined in terms of the derivative:

$$f^\prime(x) \equiv \lim_{h \rightarrow 0}\frac{f(x+h)-f(x)}{h}$$

By this limit definition, steepness is measured by computing the slope of the secant line between the points $\langle x, f(x)\rangle$ and $\langle x + h, f(x+h)\rangle$ and letting the horizontal distance $h$ shrink toward zero.
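To see the limit in action, here is a small Python sketch that evaluates the difference quotient for shrinking $h$. The function $x^2$, the point $x = 3$ and the list of step sizes are illustrative choices, not part of the original answer.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    # Illustrative function f(x) = x^2, whose exact derivative at 3 is 6
    return x * x


# For f(x) = x^2 the quotient simplifies to 6 + h at x = 3.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, difference_quotient(square, 3.0, h))
```

Since the quotient equals $6 + h$ here, the printed slopes settle toward $f'(3) = 6$ as $h$ shrinks (up to floating-point rounding).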


Now the question is how we extend this idea of steepness to functions of more than one variable.

Trick #1: Directional steepness requires only ordinary derivatives

Suppose we have a two-variable function $f(x,y)$. (Conceptually, the graph of $f$ is a surface hovering above the $xy$ plane.) Because we are presumably just learning multivariable calculus, we don't have a mathematical definition for the "steepness" at a point $\langle x,y\rangle$. However, there is a trick:

Suppose you pick a point $\langle x_0, y_0\rangle$. And you also pick a direction, in the form of a line like $2y = 3x$. You can see how the height of the function $f$ varies as you start at the point $\langle x_0, y_0 \rangle$ and take small steps in the direction of the line. You can compute this directional steepness using only the ordinary (one-dimensional) derivative.

For the direction of the line $2y = 3x$, which runs along the vector $\langle 2, 3\rangle$, the equation is:

$$D_{2y=3x} f = \lim_{h\rightarrow 0}\frac{f(x_0 + 2h, y_0 + 3h) - f(x_0, y_0)}{h}$$

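(Advanced side note: this definition really is just a one-dimensional derivative. If I parameterize the direction of the line $2y=3x$ using a function like $u(t) = \langle 2t, 3t\rangle$, I can define the directional derivative at $\langle x_0, y_0\rangle$ as just $$D_u f \equiv D(f\circ p)(0), \qquad p(t) = \langle x_0, y_0\rangle + u(t).$$ To put it in more standard notation, $D_u f \equiv [\frac{d}{dt}f(\langle x_0, y_0\rangle + u(t)) ]_{t=0}$. Because $u(1) = \langle 2, 3\rangle$ is not a unit vector, this is the rate of change per unit of $t$, not per unit of distance traveled.)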
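The side note says this is just a one-dimensional derivative of $f$ restricted to a line. A minimal Python sketch of that idea, assuming the illustrative function $f(x, y) = x^2 y$ and the starting point $\langle 1, 2\rangle$ (neither comes from the original), estimates the limit with a symmetric (central) difference instead of the one-sided quotient above; for smooth $f$ both converge to the same value.

```python
def f(x, y):
    # Illustrative surface, not from the original answer: f(x, y) = x^2 * y
    return x * x * y


def directional_steepness(f, x0, y0, a, b, h=1e-6):
    """Central-difference estimate of d/dt f(x0 + a*t, y0 + b*t) at t = 0."""
    def g(t):
        # f restricted to the line through (x0, y0) with direction <a, b>
        return f(x0 + a * t, y0 + b * t)

    return (g(h) - g(-h)) / (2 * h)


# Direction of the line 2y = 3x, i.e. u(t) = <2t, 3t>, starting at (1, 2).
# By hand: g(t) = (1 + 2t)^2 (2 + 3t) = 2 + 11t + 20t^2 + 12t^3, so g'(0) = 11.
print(directional_steepness(f, 1.0, 2.0, 2.0, 3.0))
```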

Trick #2: The gradient is a list of the steepness in each axis direction

In the previous section, we defined how to compute the directional steepness of a function, that is, its steepness in the direction of a line.

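The lines along the coordinate axes are especially important. If we have a multivariable function $f(x_1, x_2, x_3, \ldots, x_n)$, let $\ell_1, \ell_2, \ldots, \ell_n$ be lines, where $\ell_i$ is the line lying along the $x_i$ axis, parameterized at unit speed as $\ell_i(t) = t\,\mathbf e_i$ (with $\mathbf e_i$ the $i$th standard basis vector). The steepness $D_{\ell_i} f$ is then exactly the partial derivative $\partial f/\partial x_i$.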

We'll define the gradient to be the list of directional steepnesses in each of the coordinate directions:

$$\nabla f = \langle D_{\ell_1}f, D_{\ell_2}f, \ldots, D_{\ell_n}f\rangle.$$

Let's think carefully about this structure. The function $f$ takes in a list of numbers $x_1,\ldots, x_n$ and produces a single number. The function $\nabla f$ takes in a list of $n$ numbers and produces a list of $n$ steepnesses (which are also numbers).

Visually, you can imagine that $\nabla f$ takes in a point $\langle x_1, \ldots, x_n\rangle$ and produces a steepness vector at that point. The components of that vector are made up of the directional steepnesses of the function $f$ in the direction of the coordinate axes.
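Applying the one-dimensional recipe once per coordinate axis produces the whole gradient. This Python sketch builds $\nabla f$ as a list of central-difference slopes, one along each axis; the three-variable function and the step size `h` are illustrative assumptions.

```python
def gradient(f, point, h=1e-6):
    """Estimate grad f at `point` as the list of steepnesses along each axis.

    Component i is the central-difference slope of f along the x_i axis,
    i.e. along the line point + t * e_i.
    """
    components = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        components.append((f(forward) - f(backward)) / (2 * h))
    return components


def f(p):
    # Illustrative function: f(x1, x2, x3) = x1^2 * x2 + x3
    x1, x2, x3 = p
    return x1 * x1 * x2 + x3


# Exact gradient is <2 x1 x2, x1^2, 1>, which is <4, 1, 1> at (1, 2, 3).
print(gradient(f, [1.0, 2.0, 3.0]))
```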

Trick #3: Dot products measure directional overlap

When $\vec{u}$ and $\vec{v}$ are vectors, the dot product between them can be defined by

$$\vec{u}\cdot \vec{v} = ||\vec{u}|| \cdot ||\vec{v} || \cdot \cos{\theta},$$

where $\theta$ is the angle between the two vectors.

Now suppose $\vec{v}$ is held fixed. If we also keep the length of $\vec{u}$ fixed but let it revolve in a circle, only the angle $\theta$ changes, so we can see directly how $\theta$ affects the dot product.

Evidently, the dot product is maximized when the two vectors are pointing in the same direction, because then $\cos{\theta}=\cos{0} = 1$ is maximal.
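A quick numerical experiment mirrors the revolving-vector picture: hold $\vec v$ fixed, sweep a vector $\vec u$ of fixed length around the circle, and record where the dot product peaks. The specific vectors and the 0.1-degree sampling are illustrative choices.

```python
import math


def dot(u, v):
    """Dot product of two vectors given as tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


v = (3.0, 4.0)   # the fixed vector (illustrative choice); its length is 5
radius = 2.0     # u keeps this length while it revolves

best_angle, best_value = 0.0, -math.inf
for k in range(3600):  # sweep u around the circle in 0.1-degree steps
    angle = 2 * math.pi * k / 3600
    u = (radius * math.cos(angle), radius * math.sin(angle))
    value = dot(u, v)
    if value > best_value:
        best_angle, best_value = angle, value

print(math.degrees(best_angle), math.degrees(math.atan2(v[1], v[0])))
print(best_value, radius * math.hypot(*v))
```

The best sampled angle lands next to the angle of $\vec v$ itself (about 53.13 degrees), and the largest dot product is close to $\|\vec u\|\,\|\vec v\| = 2 \cdot 5 = 10$.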

Trick #4: You can compute directional steepness using the dot product

Recall that $D_u f$ is the steepness of $f$ in the direction of some line $u$, and that $\nabla f$ is the gradient of $f$: a list of the directional steepnesses in each of the coordinate directions.

It turns out that the following fact is true, provided $f$ is differentiable at the point in question:

If $u(t) = \langle at, bt\rangle$ is the parametrization of a line, and if $u(t)$ has length 1 when $t=1$ (that is, $a^2 + b^2 = 1$), then $$D_u f = \nabla f \cdot u(1).$$ In other words, we can compute the directional steepness as the dot product of the gradient with the unit vector $u(1) = \langle a, b\rangle$ pointing along the line.
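The fact can be spot-checked numerically. This Python sketch assumes the illustrative function $f(x,y) = x^2 y$, whose gradient $\langle 2xy, x^2\rangle$ is computed by hand, and normalizes the direction of the line $2y = 3x$ to the unit vector $\langle 2, 3\rangle/\sqrt{13}$; it compares a limit-definition estimate with the dot product.

```python
import math


def f(x, y):
    # Illustrative function f(x, y) = x^2 * y
    return x * x * y


def grad_f(x, y):
    # Gradient of f, computed by hand: <2xy, x^2>
    return (2 * x * y, x * x)


x0, y0 = 1.0, 2.0
a, b = 2 / math.sqrt(13), 3 / math.sqrt(13)  # u(1) = <a, b>, a unit vector along 2y = 3x

# Left side: the limit definition, estimated with a small symmetric step h.
h = 1e-6
limit_estimate = (f(x0 + a * h, y0 + b * h) - f(x0 - a * h, y0 - b * h)) / (2 * h)

# Right side: dot product of the gradient with u(1).
gx, gy = grad_f(x0, y0)
dot_product = gx * a + gy * b

print(limit_estimate, dot_product)
```

Both numbers come out near $11/\sqrt{13} \approx 3.05$. This is a check for one choice of $f$, not a proof.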

Conclusion: The gradient is the direction of steepest ascent

Because we can compute directional steepness as a dot product with the gradient, the answer to the question "In which direction is this function steepest?" is the same as the answer to the question "Which unit vector has the greatest dot product with the gradient?", which we know from Trick #3 is "The unit vector pointing in the same direction as the gradient!"
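The conclusion can also be tested by brute force: estimate the steepness in many unit directions using only the limit definition, with no reference to the dot product, and see which direction wins. The function, the point and the 0.1-degree sampling are illustrative assumptions.

```python
import math


def f(x, y):
    # Illustrative function f(x, y) = x^2 * y; its gradient is <2xy, x^2>
    return x * x * y


def steepness(x, y, angle, h=1e-6):
    """Limit-definition estimate of the steepness along the unit vector at `angle`."""
    ux, uy = math.cos(angle), math.sin(angle)
    return (f(x + ux * h, y + uy * h) - f(x - ux * h, y - uy * h)) / (2 * h)


x0, y0 = 1.0, 2.0
# Try 3600 unit directions (0.1-degree spacing) and keep the steepest one.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
steepest = max(angles, key=lambda angle: steepness(x0, y0, angle))

gx, gy = 2 * x0 * y0, x0 * x0  # gradient at (1, 2): <4, 1>
print(math.degrees(steepest), math.degrees(math.atan2(gy, gx)))
print(steepness(x0, y0, steepest), math.hypot(gx, gy))
```

The winning direction sits within the sampling resolution of the gradient's direction (about 14.04 degrees), and the largest steepness is close to $\|\nabla f\| = \sqrt{17}$.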

Solution 2:

Let’s try coming at it from a different direction, so to speak.

Consider the plane in $\mathbb R^3$ given by $ax+by=z$. The vector $\mathbf n=\langle a,b,-1\rangle$ is normal to this plane. A bit of thought should convince you that the projection of $\mathbf n$ onto the $xy$ plane, $\langle a,b\rangle$, points in the direction in which this plane is steepest. It’s fairly straightforward to prove this analytically, but you can also see this by visualizing cutting a cylinder centered on the $z$-axis with this plane and imagining what happens to the high point of the cut as you tilt the plane in various directions. Displacing the plane from the origin doesn’t change its inclination, so $\langle a,b\rangle$ also gives the steepest direction for any other plane with the same normal, i.e., for $ax+by-z=c$.
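The analytic argument mentioned above takes only a couple of lines. Assuming $\langle a, b\rangle \neq \mathbf 0$, move a horizontal distance $s$ in the unit direction $\langle\cos\varphi, \sin\varphi\rangle$, and let $\varphi_0$ be the angle of $\langle a, b\rangle$:

$$\begin{aligned}
\Delta z &= a\,(s\cos\varphi) + b\,(s\sin\varphi),\\
\frac{\Delta z}{s} &= \langle a, b\rangle\cdot\langle\cos\varphi, \sin\varphi\rangle = \sqrt{a^2+b^2}\,\cos(\varphi-\varphi_0).
\end{aligned}$$

The slope is largest, equal to $\sqrt{a^2+b^2}$, exactly when $\varphi = \varphi_0$, i.e., when the direction of travel is parallel to $\langle a, b\rangle$.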

Moving now to a curved surface, by analogy with functions of one dimension, we define instantaneous rates of change in terms of tangents to the surface. We’re assuming that the function which defines our surface is suitably well-behaved, so all of these tangents lie in a well-defined tangent plane to the surface. Looking at it another way, this tangent plane captures the rates of change of the function in all directions. As above, then, the projection onto the $xy$ plane of a “downward” normal to this tangent plane will give us the direction of fastest increase. All we need to do now is find such a normal vector.

Let a surface in $\mathbb R^3$ be given by $F(x,y,z)=c$. Consider a curve $\gamma: t\mapsto(x(t),y(t),z(t))$ on this surface that passes through the point $P_0 = \gamma(0)$, so that we have $(F\circ\gamma)(t)=c$. (Again, we’re assuming that these functions are suitably well-behaved so that this parametrization exists.) Differentiating both sides with respect to $t$ and applying the chain rule gives $$F_x(P_0)x'(0)+F_y(P_0)y'(0)+F_z(P_0)z'(0)=\nabla F(P_0)\cdot\gamma'(0)=0.$$ Now, $\gamma'(0)$ is tangent to $\gamma$ at $P_0$ and thus lies in the tangent plane. Since $\gamma$ was arbitrary, we can conclude that $\nabla F$ is orthogonal to every tangent vector to the surface at $P_0$, i.e., that it is normal to the tangent plane.
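Here is a numerical spot check of $\nabla F(P_0)\cdot\gamma'(0) = 0$, assuming the illustrative choices $F(x,y,z) = x^2+y^2+z^2$ with $c = 1$ (the unit sphere) and a hand-picked curve $\gamma$ that stays on the sphere; $\gamma'(0)$ is estimated with a central difference.

```python
import math


def grad_F(x, y, z):
    # F(x, y, z) = x^2 + y^2 + z^2, so the level set F = 1 is the unit sphere
    return (2 * x, 2 * y, 2 * z)


def gamma(t):
    # Illustrative curve on the unit sphere: polar angle t + 0.3, azimuth 2t
    polar, azimuth = t + 0.3, 2 * t
    return (math.sin(polar) * math.cos(azimuth),
            math.sin(polar) * math.sin(azimuth),
            math.cos(polar))


h = 1e-6
p0 = gamma(0.0)
# Central-difference estimate of the tangent vector gamma'(0)
velocity = tuple((p - m) / (2 * h) for p, m in zip(gamma(h), gamma(-h)))

print(sum(g * v for g, v in zip(grad_F(*p0), velocity)))
```

By hand, $P_0 = \langle\sin 0.3, 0, \cos 0.3\rangle$ and $\gamma'(0) = \langle\cos 0.3,\ 2\sin 0.3,\ -\sin 0.3\rangle$, so the dot product is exactly $2(\sin 0.3\cos 0.3 - \cos 0.3\sin 0.3) = 0$; the printed value differs from zero only by rounding error.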

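For a surface given by $z=f(x,y)$, take $F(x,y,z) = f(x,y) - z$, whose level set $F = 0$ is the surface. Then this normal vector is $\nabla F = \langle f_x,f_y,-1\rangle$, and its projection onto the $xy$ plane, $\langle f_x, f_y\rangle = \nabla f$, thus points in the direction of steepest ascent along the surface, i.e., the direction in which $f$ increases fastest.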
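A worked example, assuming the illustrative function $f(x,y) = x^2 + y^2$ and the point $(1, 2, 5)$ on its graph, with the surface written as the level set $F(x,y,z) = f(x,y) - z = 0$:

$$\begin{aligned}
\nabla F(x,y,z) &= \langle 2x,\ 2y,\ -1\rangle, & \nabla F(1,2,5) &= \langle 2, 4, -1\rangle,\\
\nabla f(x,y) &= \langle 2x,\ 2y\rangle, & \nabla f(1,2) &= \langle 2, 4\rangle.
\end{aligned}$$

Dropping the $z$-component of the downward normal leaves exactly $\nabla f(1,2)$, the direction of steepest ascent at that point.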

Afterthought: Going back to the original plane example at the top, we can see why this result is plausible. A plane in $\mathbb R^3$ is completely specified by its $x$-slope/rate-of-change $a$, its $y$-slope $b$ and a point on the plane. For the tangent plane to the surface $z=f(x,y)$, these rates of change in the directions of the coordinate axes are given by the partial derivatives of $f$, which are encoded in its gradient.

Solution 3:

I first learned it this way: if $f(x,y,z) = k$ defines a surface, then $\nabla f$ is a vector perpendicular to that surface.

That is, the plane tangent to the surface at $\mathbf x = (x_1,y_1,z_1)$ is $$\frac {\partial f}{\partial x}(\mathbf x) (x-x_1) + \frac {\partial f}{\partial y}(\mathbf x) (y-y_1) + \frac {\partial f}{\partial z}(\mathbf x)(z - z_1) = 0$$

And $(\frac{\partial f}{\partial x}(\mathbf x), \frac{\partial f}{\partial y}(\mathbf x),\frac {\partial f}{\partial z}(\mathbf x))$ is normal to the plane.

$\nabla f$ is a vector perpendicular to the surface when $k$ is fixed. Now we allow $k$ some freedom, and we want to move in the direction of greatest change. Whatever direction we go has a component perpendicular to the surface and a component parallel to it. Moving parallel to the surface does not change $k$ (to first order), so the direction of maximal change is 100% perpendicular to the surface.

If that intuition isn't working for you, then we are back to the answer you found less than satisfying.

$\frac {\partial f}{\partial x}$ is the rate of change of $f$ per unit change in $x$.

For any unit vector $u$, $\nabla f \cdot u$ is the rate of change of $f$ as you move in the direction $u$.

So we want to find the unit vector $u$ that maximizes $\nabla f \cdot u = \|\nabla f\| \cos\theta$, where $\theta$ is the angle between $\nabla f$ and $u$.

This is maximal when $\theta = 0$, that is, when $u$ points in the same direction as $\nabla f$, and the maximal rate of change is then $\|\nabla f\|$.

Does $\nabla f$ tell us the direction of steepest descent, too? It certainly does: straight in the opposite direction, $-\nabla f$, where $\theta = \pi$ and $\cos\theta = -1$.
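A tiny Python check of both claims, assuming the illustrative function $f(x,y) = x^2 y$ at the point $(1, 2)$ and an arbitrary small step: stepping along $+\nabla f$ raises $f$, and stepping along $-\nabla f$ lowers it.

```python
def f(x, y):
    # Illustrative function (not from the original answer): f(x, y) = x^2 * y
    return x * x * y


def grad_f(x, y):
    # Gradient of f, computed by hand: <2xy, x^2>
    return (2 * x * y, x * x)


x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)  # <4, 1> at (1, 2)
step = 1e-3              # small step size, chosen arbitrarily

uphill = f(x0 + step * gx, y0 + step * gy)    # move along +grad f
downhill = f(x0 - step * gx, y0 - step * gy)  # move along -grad f
print(downhill, f(x0, y0), uphill)
print(downhill < f(x0, y0) < uphill)  # True
```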

$\nabla f$ does not necessarily point directly toward a local maximum or minimum. It points in the direction of greatest change. Imagine yourself climbing a hill: straight up the slope is not necessarily the direction of the peak. You may climb the steep part first and then have to turn.
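For a concrete instance, take the illustrative elongated hill $f(x,y) = -(x^2 + 10y^2)$, whose summit is the origin. At $(1,1)$ the steepest-ascent direction $\nabla f = \langle -2, -20\rangle$ and the direction straight at the summit, $\langle -1, -1\rangle$, differ noticeably:

```python
import math


def grad_f(x, y):
    # Illustrative hill f(x, y) = -(x^2 + 10 y^2) with its peak at the origin
    return (-2 * x, -20 * y)


def cosine(u, v):
    """Cosine of the angle between two 2D vectors."""
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))


x0, y0 = 1.0, 1.0
uphill = grad_f(x0, y0)   # steepest-ascent direction: (-2, -20)
toward_peak = (-x0, -y0)  # direction straight at the summit: (-1, -1)

angle = math.degrees(math.acos(cosine(uphill, toward_peak)))
print(angle)
```

The angle comes out near 39 degrees: following the gradient first takes you mostly along the steep $y$ direction, and the path bends toward the summit later.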