Why is the gradient in the direction of ascent and not descent?

I understand that the derivative of a function ($\mathbb{R} \rightarrow \mathbb{R}$) at a point is the rate of change of the output for a slight nudge in the input, and that this rate of change can be negative or positive. For a single-variable function there is no obvious concept of direction.

Now, my doubt is about the multivariate case ($\mathbb{R}^n \rightarrow \mathbb{R}$), where the derivative is the gradient. This gradient, collecting the partial derivatives with respect to each basis direction, becomes a direction. Why is this direction a direction of ascent and not of descent? My question is not at all related to steepest ascent, about which one can find many answers on this forum and read about elaborately at this link. An intuitive explanation would be preferable to a mathematical one.


Solution 1:

The comments persuaded me to reformulate my answer. For the original (still correct, but sub-optimal) version, see below.

The gradient is defined in a completely natural way; there is no purely mathematical reason why it can be said to point towards the steepest ascent. It has more to do with several more or less arbitrary choices made in various definitions, which break this symmetry.

Observation. The claim "the gradient points in the direction of ascent" also works for single-variable functions $\Bbb R\to\Bbb R$. There is indeed a concept of direction in $\Bbb R$: right and left. A positive derivative is a vector (the gradient) pointing to the right (in the direction of ascent), and a negative derivative is a vector pointing to the left (in this case, also the direction of ascent, because the function is decreasing to the right). Since the same observation already applies in 1D, we should start looking for an explanation there.
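This 1D observation is easy to check numerically; here is a minimal Python sketch (the test function $x^2$ and the step size $h$ are illustrative choices, not part of the answer):

```python
def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2  # decreasing left of 0, increasing right of 0

# At x = 1 the derivative is positive: the 1D "gradient" points right,
# and the function does increase to the right.
d_right = numerical_derivative(f, 1.0)

# At x = -1 the derivative is negative: it points left,
# and the function does increase to the left.
d_left = numerical_derivative(f, -1.0)

print(d_right)  # ≈ 2.0
print(d_left)   # ≈ -2.0
```

In both cases the sign of the derivative, read as a 1D vector, points towards ascent.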

Note: I am going to use the terms "right" and "left" for the directions "positive" and "negative" on the number line, because this is the standard orientation of the number line. This is also a symmetry break, but only a notational one. It does not affect the mathematics in any way if we flip these directions.

Thanks to the comments, a few definitions could be identified as the root cause of the broken symmetry. If you are standing on a mountainside, there is no meaning in asking whether it is uphill or downhill. This question only makes sense once you fix a direction with respect to which the slope is judged. The same goes for single-variable functions. It is standard to call a function increasing if its graph goes uphill to the right. This involves two arbitrary choices:

  1. The $y$-axis points upwards, hence increasing function values are seen as going up. This is the most obvious arbitrary choice. Many applications do it the other way around, e.g. line numbers in text increase from top to bottom, and pixels on a screen are usually addressed with a downwards-increasing $y$-axis.
  2. The slope is judged w.r.t. the "arbitrary" direction "right". Why not left? It seems natural, but nothing forces it.

There might be another arbitrary choice: a positive derivative indicates that the function is increasing. We could have defined it the other way around. Anyway, flipping any single one of these definitions changes the gradient from pointing upwards to pointing downwards.

Note. Yes, I know "increasing" is formally defined as $x\le y\implies f(x)\le f(y)$, but this definition too is motivated by the visualization of a function graph that rises to the right. No one would use it if the $y$-axis were pointing downwards.

Conclusion: The reason the gradient points towards the steepest ascent lies in our somewhat biased definitions. This is especially evident in the 1D case. The derivative is defined in such a way that it has a positive value (the gradient points to the right) if the function increases. A function is called increasing if its graph goes uphill to the right. You see how these arbitrary definitions combine into "the gradient points uphill".


ORIGINAL

Because we have a somewhat biased definition of differentiation. Let me explain.

As noted in a comment, this "gradient pointing in the direction of ascent" also works for single-variable functions $\Bbb R\to\Bbb R$. There is indeed a concept of direction in $\Bbb R$: left and right. A positive derivative is a vector (the gradient) pointing to the right (in the direction of ascent), and a negative derivative is a vector pointing to the left (also in the direction of ascent, because the function is decreasing to the right). Since the same observation happens in $1$D, we should probably start looking for an answer there.

It all happens because the definition of the derivative is biased in some sense. Someone once decided that a function is considered increasing if its value gets bigger to the right. You see the broken symmetry? Why to the right, why not to the left? It was then decided that the derivative is positive $-$ the gradient points to the right $-$ when the function grows to the right. There we have it: whoever defined it directly coupled the terms "direction of gradient" and "direction of ascent".

Had they decided to define a function as increasing if its value grows to the left (unnatural, considering our left-to-right reading direction), the gradient would point towards the steepest descent.

Note: In this answer I assumed that the number line is oriented with the positive numbers on the right. This is standard, but it is another symmetry break (though only a notational one, without impact on the mathematics). You can substitute all left/right above by negative/positive if you want to be independent of this broken symmetry.

Solution 2:

"There is no concept of direction for the single-variable function as obvious."

False.

When the univariate derivative is positive, the function increases in the direction of increasing input; when negative, in the direction of decreasing input. There are only two possible directions, but whichever it is, the derivative does point in the direction in which the function increases.

Solution 3:

The gradient with regard to some input variable indicates how much the value of the output variable goes up (i.e. ascends) when that input variable goes up. As such, if you move in the direction of the gradient and the gradient is positive, then the value of the output variable will go up. If the gradient is negative, however, then increasing the input variable will decrease the output variable. But yes, a positive gradient means that you will ascend if you 'follow' the gradient.

This is our definition of gradients or slopes, and it works just the same in $\mathbb{R}$ as in $\mathbb{R}^n$. That is, the derivative $\frac{dy}{dx}$ indicates to what extent $y$ increases as $x$ increases, which, by the way, is the same as the extent to which $y$ decreases as $x$ decreases.
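As a quick numerical illustration of this symmetry (the cubic test function and the step size are hypothetical choices):

```python
# The same derivative measures both the increase of y to the right
# and the decrease of y to the left.
f = lambda x: x**3 + x   # f'(x) = 3x^2 + 1, so f'(1) = 4
x, h = 1.0, 1e-6

rise_right = (f(x + h) - f(x)) / h  # increase of y as x increases
fall_left = (f(x) - f(x - h)) / h   # decrease of y as x decreases

print(rise_right, fall_left)  # both ≈ 4
```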

Now, some answers suggest that when we defined the gradient in this manner there was a bit of 'arbitrariness' involved, and that we could have defined the gradient differently so that the direction would reverse. It is even suggested that this has something to do with 'right' being arbitrarily chosen as the 'up' direction and 'left' as 'down'.

However, I strongly disagree with those answers, because the alternative would have been to define the gradient as the extent to which $y$ increases as $x$ decreases, which would be a very confusing and unnatural thing to do; it would bring in an extraneous negation or reversal.

Anyway, when you want the output value to descend, you should go in the direction opposite to the gradient, i.e. subtract a value proportional to that gradient. Of course, that does mean that if the gradient is negative, you end up increasing the input variable in order to decrease the output variable.
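A minimal sketch of this descent rule (the objective $f(x) = (x-3)^2$, the step size, and the iteration count are hypothetical choices, not from the answer):

```python
def grad_descent_1d(df, x0, lr=0.1, steps=50):
    """Follow -df(x) to decrease the output."""
    x = x0
    for _ in range(steps):
        x = x - lr * df(x)  # subtract a value proportional to the gradient
    return x

# f(x) = (x - 3)^2 has derivative 2(x - 3) and its minimum at x = 3.
# Starting left of the minimum, the derivative is negative, so
# subtracting it *increases* x, exactly as described above.
x_min = grad_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # ≈ 3.0
```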

Solution 4:

A more general definition of "derivative" than the one you are used to is: the linear function $L$ that best approximates the change $f(x+a) - f(x)$ for small $a$; we write $L = f'(x)$.

For a function $\mathbb{R}\rightarrow\mathbb{R}$ this linear function is also from $\mathbb{R}\rightarrow\mathbb{R}$.

We can identify the linear functions $\mathbb{R}\rightarrow\mathbb{R}$ with a point in $\mathbb{R}$ by noting $L(a) = ka$ for some constant $k$.

We could, if we chose, define $L(a) = -ka$, or $L(a) = \pi k a$, or $L(a) = ae^k$ just as easily and get a different $k$. The simple choice we actually make keeps the rest of the math easy: addition of derivatives corresponds to addition of the scalars we associate with them, and even multiplication works out.

When we take this definition and extend it to $\mathbb{R}^n \rightarrow \mathbb{R}$ or $\mathbb{R} \rightarrow \mathbb{R}^m$ or even $\mathbb{R}^n \rightarrow \mathbb{R}^m$, the derivatives are still linear functions, but now between different spaces.

These linear functions can be represented by matrices, and in some cases by vectors. This representation depends on your choice of basis.

Working with vectors and matrices is often easier than working with an abstract derivative. So we pick a basis that makes the rest of the math work better.

Given $f: \mathbb{R}^n \rightarrow \mathbb{R}$, we get $L = f'(x)$ as the best linear approximation to the change of $f$ near $x$, aka

$$f(x+a) \approx f(x) + L(a)$$

If $L$ is represented as a vector $G$ such that $L(a) = G \cdot a$, then by the Cauchy-Schwarz inequality $L(a) \leq |G||a|$, with equality when $a = \lambda G$ for $\lambda \geq 0$, in which case $L(a) = \lambda |G|^2$.

In short, among directions of a given length, the derivative is maximal along the direction of $G$, the gradient vector, which is our chosen representative for the linear function that is the derivative of $f$ at $x$.
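This maximality claim is easy to verify numerically; here is a small Python sketch (the quadratic function, the evaluation point, and the sampling grid are illustrative choices, not from the answer):

```python
import numpy as np

# A hypothetical smooth function R^2 -> R and its gradient at a point.
f = lambda p: p[0]**2 + 3 * p[1]**2
x = np.array([1.0, 1.0])
G = np.array([2 * x[0], 6 * x[1]])  # analytic gradient: (2, 6)

def directional_derivative(f, x, u, h=1e-6):
    """Directional derivative of f at x along unit vector u."""
    return (f(x + h * u) - f(x - h * u)) / (2 * h)

# Sample many unit directions; the largest directional derivative
# should occur along G, and its value should be |G|.
angles = np.linspace(0, 2 * np.pi, 3600, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
vals = np.array([directional_derivative(f, x, u) for u in dirs])

best = dirs[np.argmax(vals)]
print(best, G / np.linalg.norm(G))  # best ≈ G normalized
print(vals.max(), np.linalg.norm(G))  # max ≈ |G|
```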