Gradient and Swiftest Ascent
Maybe the following helps to understand the intuition behind the object $\langle \nabla f,v\rangle$ occurring in the standard proof: $\nabla f(x)$ is the vector composed of the directional derivatives of $f$ in the directions of the $n$ standard basis vectors $e_1,\ldots,e_n$. Now consider a unit vector $v$ in the 1-norm, i.e. $\sum_i |v_i|=1$. For simplicity let's think of the case $v_i\geq 0$.
Then $\langle \nabla f(x),v\rangle = \sum_i \frac{\partial f}{\partial x_i}(x) v_i$ is a convex combination of directional derivatives, which is the directional derivative in the direction given by the same convex combination of the directions $e_i$. (Remember that derivatives are, intuitively, linear approximations to the function.) This is the equation $D_v f(x) = \langle \nabla f(x),v\rangle$. Thus: if we want to find the $v$ with maximal value of $D_v f(x)$, then we have to maximize $\langle \nabla f(x),v\rangle$.
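The identity $D_v f(x) = \langle \nabla f(x),v\rangle$ can be checked numerically. The following is only a minimal sketch: the function `f`, the point and the direction `v` are made up for illustration, and the directional derivative is approximated by a central finite difference.

```python
def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + 3.0 * x * y

def grad_f(x, y):
    # Gradient of f computed by hand: (2x + 3y, 3x).
    return (2.0 * x + 3.0 * y, 3.0 * x)

def directional_derivative(x, y, v, h=1e-6):
    # Central finite-difference approximation of D_v f at (x, y).
    forward = f(x + h * v[0], y + h * v[1])
    backward = f(x - h * v[0], y - h * v[1])
    return (forward - backward) / (2.0 * h)

point = (1.0, 2.0)
v = (0.25, 0.75)  # nonnegative entries summing to 1: a unit vector in the 1-norm

g = grad_f(*point)
inner = g[0] * v[0] + g[1] * v[1]  # <grad f(x), v>
numeric = directional_derivative(point[0], point[1], v)

print(inner, numeric)
```

Here the gradient at the chosen point is $(8, 3)$, so the inner product is the convex combination $0.25\cdot 8 + 0.75\cdot 3 = 4.25$ of the two partial derivatives, and the finite-difference value should agree with it up to rounding.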
Now the intuition behind $\langle u,v\rangle$ comes from thinking in terms of orthogonal projections: for a vector $v$ of Euclidean length 1, the scalar product equals the (signed) length of the projection of $u$ onto the line given by the direction $v$. This length can only be maximal if nothing is lost during the projection, i.e. if $u$ has no component orthogonal to $v$. Therefore $u$ must be a multiple of $v$, and a positive multiple because we want a maximum.
Putting everything together: among directions $v$ of Euclidean length 1, $D_vf(x)$ is maximal iff $v$ points in the direction of $\nabla f(x)$.
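As a rough sanity check of this conclusion, one can sample unit vectors $v$ (in the Euclidean norm) on a circle and see which one maximizes $\langle g,v\rangle$. This is only a sketch under assumptions: the vector `g` stands in for a gradient $\nabla f(x)$ and is made up for illustration, and the grid of 3600 angles is an arbitrary choice.

```python
import math

g = (8.0, 3.0)  # hypothetical gradient vector, chosen only for illustration
norm_g = math.hypot(g[0], g[1])

best_value, best_v = -math.inf, None
n = 3600  # number of sampled directions (arbitrary resolution)
for k in range(n):
    t = 2.0 * math.pi * k / n
    v = (math.cos(t), math.sin(t))      # Euclidean unit vector
    value = g[0] * v[0] + g[1] * v[1]   # <g, v>
    if value > best_value:
        best_value, best_v = value, v

g_dir = (g[0] / norm_g, g[1] / norm_g)  # normalized gradient direction

print(best_v, g_dir)
print(best_value, norm_g)
```

Up to the resolution of the grid, the best sampled direction should coincide with the normalized vector `g_dir`, and the maximal value of the scalar product should be close to the Euclidean length of `g`, which is the case where nothing is lost in the projection.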