Why must the gradient vector always be directed in an increasing direction?

Intuitively, if $f$ is differentiable at $x$, then $f(x + \Delta x) \approx f(x) + \langle \nabla f(x), \Delta x \rangle$. (I'm using the convention that $\nabla f(x)$ is a column vector.) So if $\Delta x = \epsilon \nabla f(x)$ (here $\epsilon > 0$ is tiny), then \begin{align*} f(x + \Delta x) & \approx f(x) + \epsilon \langle \nabla f(x), \nabla f(x) \rangle \\ &= f(x) + \epsilon \| \nabla f(x) \|^2 \\ &\geq f(x), \end{align*} and the inequality is strict whenever $\nabla f(x) \neq 0$.

So when we move a bit in the direction of $\nabla f(x)$, the value of $f$ increases.
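The first-order estimate above can be checked numerically. The following is only a minimal sketch: the test function `f`, the point `(0.5, -1.0)`, and the step size `eps` are illustrative assumptions, not part of the original argument.

```python
import math

def f(x, y):
    # Hypothetical smooth test function (an assumption for illustration).
    return math.sin(x) + x * y**2

def grad_f(x, y):
    # Analytic gradient of the test function above.
    return (math.cos(x) + y**2, 2 * x * y)

def step_along_gradient(x, y, eps):
    # Move by eps * grad f and compare the actual change in f
    # with the first-order prediction eps * ||grad f||^2.
    gx, gy = grad_f(x, y)
    actual = f(x + eps * gx, y + eps * gy) - f(x, y)
    predicted = eps * (gx**2 + gy**2)
    return actual, predicted

actual, predicted = step_along_gradient(0.5, -1.0, 1e-4)
print(actual, predicted)
```

For a small enough `eps` the two printed numbers should nearly agree and both be positive; the gap between them is the second-order term that the approximation drops.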

It's not obvious:

Consider the function $$f(x,y):=\cases{0&$\bigl((x,y)=(0,0)\bigr)$,\cr x+y-{4|xy|^{4/3}\over x^2+y^2}&(else) .\cr}$$ Then $f$ is continuous at $(0,0)$ and $\nabla f(0,0)=(1,1)$, but $$f(t,t)-f(0,0)=2t-{4|t|^{8/3}\over 2t^2}=2|t|^{2/3}\bigl(|t|^{1/3}{\rm sgn(t)}-1\bigr)<0\qquad(0<|t|<1)\ .$$ This shows that $f$ is actually decreasing in the direction of the gradient.
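A quick numerical look at this counterexample may help. This is a sketch only; the helper names `partial_x` and `partial_y` and the sample values of `t` are assumptions made for illustration.

```python
def f(x, y):
    # The counterexample from the text: f(0, 0) = 0, and otherwise
    # f(x, y) = x + y - 4 |xy|^(4/3) / (x^2 + y^2).
    if x == 0 and y == 0:
        return 0.0
    return x + y - 4 * abs(x * y) ** (4 / 3) / (x**2 + y**2)

def partial_x(h):
    # Difference quotient for the partial derivative in x at the origin.
    return (f(h, 0.0) - f(0.0, 0.0)) / h

def partial_y(h):
    # Difference quotient for the partial derivative in y at the origin.
    return (f(0.0, h) - f(0.0, 0.0)) / h

# Both difference quotients equal 1, matching grad f(0,0) = (1, 1).
print(partial_x(1e-6), partial_y(1e-6))

# Yet along the diagonal (t, t), i.e. along the gradient, f drops below f(0,0) = 0.
diagonal_values = [f(t, t) for t in (1e-1, 1e-2, 1e-3)]
print(diagonal_values)
```

On the axes the correction term vanishes, which is why the partial derivatives see only $x+y$; on the diagonal the term $-2|t|^{2/3}$ dominates $2t$ for small $t$.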

Now the considered $f$ is not differentiable at $(0,0)$, and the gradient defined via partial derivatives exists only by coincidence. For any $f$ which is actually differentiable at $(0,0)$ one has $$f(x,y)-f(0,0)=\nabla f(0,0)\cdot (x,y)+o(r)\qquad(r:=\sqrt{x^2+y^2}\to0)\ .$$ Now, if $\nabla f(0,0)=(a,b)\ne(0,0)$ and you choose $(x,y):=(ta,tb)$ with $t>0$ then $$f(ta,tb)-f(0,0)=t(a^2+b^2)+o(t)=t (a^2+b^2)(1+o(1))\qquad(t\to0)\ ;$$ and therefore $f(ta,tb)-f(0,0)$ is $>0$ for sufficiently small $t>0$.

I prefer to explain this in a slightly different way:

Actually, we define the gradient so that it always points in the direction of maximum increase. Take a look at the following:

Consider a function $f(x,y)$; its total differential is:

$df(x,y)=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\cdot\left(dx,dy\right)=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\cdot\vec{dr}=\left\Vert \left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\right\Vert \left\Vert \vec{dr}\right\Vert \cos\alpha$

where $\alpha$ is the angle between the vector of partial derivatives and the displacement $\vec{dr}$. If, for simplicity, we take $\left\Vert \vec{dr}\right\Vert =1$, we finally get:

$df(x,y)=\left\Vert \left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\right\Vert \cos\alpha$

Because the cosine is never greater than one, the factor $\left\Vert \left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\right\Vert$ is the largest possible increase of our function, attained at $\alpha=0$, that is, when $\vec{dr}$ is parallel to the vector of partial derivatives. So if we name that vector the gradient, it points in the direction of maximum possible increase of $f(x,y)$, and its length is the rate of that increase.
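The claim that $\alpha=0$ gives the largest increase can be illustrated by brute force: scan unit directions and see which one maximizes the directional derivative. This is a rough sketch; the test function $f(x,y)=x^2+3y$, the point `(1.0, 2.0)`, and the grid size `n` are assumptions chosen for illustration.

```python
import math

def grad_f(x, y):
    # Gradient of the hypothetical test function f(x, y) = x**2 + 3*y.
    return (2 * x, 3.0)

def best_direction(x, y, n=3600):
    # Scan n unit vectors (cos a, sin a) and return the angle whose
    # directional derivative grad f . (cos a, sin a) is largest.
    gx, gy = grad_f(x, y)
    best_angle, best_rate = 0.0, -math.inf
    for k in range(n):
        a = 2 * math.pi * k / n
        rate = gx * math.cos(a) + gy * math.sin(a)
        if rate > best_rate:
            best_angle, best_rate = a, rate
    return best_angle, best_rate

angle, rate = best_direction(1.0, 2.0)
gx, gy = grad_f(1.0, 2.0)
grad_angle = math.atan2(gy, gx) % (2 * math.pi)
grad_norm = math.hypot(gx, gy)
print(angle, grad_angle)  # best scanned angle vs. angle of the gradient
print(rate, grad_norm)    # best scanned rate vs. length of the gradient
```

Up to the resolution of the angular grid, the best direction found by the scan should coincide with the direction of the gradient, and the best rate with its length.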

Intuitively:

If the function is decreasing in one variable, then the partial derivative is negative, so the gradient's component for that variable points in the negative direction, which is the direction in which the function value increases.
If the function is increasing in one variable, then the partial derivative is positive, so the gradient's component for that variable points in the positive direction, which again is the direction in which the function value increases.

=> No matter what the function's profile looks like, the gradient, by definition, points in an increasing direction.
