Geometric intuition for directional derivatives
Let’s back up a bit. As Hans Ludmark points out in his comment above, the basic definition of the directional derivative in the direction specified by the unit vector $\mathbf u=(u_1,u_2)$ at a point $P=(a,b)$ is via a limit similar to the one from elementary calculus: $${\partial f\over\partial\mathbf u}(a,b)=\lim_{h\to0}{f(a+hu_1,b+hu_2)-f(a,b)\over h}.$$ As you’ve observed, this amounts to taking a vertical slice through the surface and then computing the ordinary derivative of that slice, as illustrated below.
This derivative is, of course, the slope of the tangent line (blue) to the slice at that point. Observe that this line is also the intersection of the tangent plane at that point (grayish blue) with the cutting plane (violet), so we can interpret the directional derivative as the steepness of the tangent plane in a given direction. As you rotate the cutting plane around $P$, the slope of this line changes, reaching a maximum when the two planes are perpendicular, as we’ll see below. (You can also see that this is the case by visualizing cutting a cylinder parallel to the $z$-axis by a plane and imagining what happens to the high point as you move that plane around.)
Let’s say that the tangent plane is given by the equation $\lambda x+\mu y-z=d$ with normal $\mathbf n_t=(\lambda,\mu,-1)$. A normal to the cutting plane is $\mathbf n_c=(-u_2,u_1,0)$, which is just $\mathbf u$ rotated ninety degrees. In $\mathbb R^3$ we can find the direction of the line of intersection via a cross product: $$\mathbf n_t\times\mathbf n_c=(u_1,u_2,\lambda u_1+\mu u_2)$$ and the slope of this line is thus $${\lambda u_1+\mu u_2\over\sqrt{u_1^2+u_2^2}}=\lambda u_1+\mu u_2=(\lambda,\mu)\cdot\mathbf u=\|(\lambda,\mu)\|\cos\phi,$$ where $\phi$ is the angle between the projection of $\mathbf n_t$ onto the $x$-$y$ plane and $\mathbf u$. The slope is therefore maximal when $\phi=0$, i.e., when $\mathbf u$ and the projection of $\mathbf n_t$ point in the same direction, but this happens when the two planes are perpendicular. The maximum value of this slope is $\|(\lambda,\mu)\|$.
This is where the gradient of $f$ comes in. If we write the equation of the surface as $F(x,y,z)=f(x,y)-z=0$, then $\nabla F=(f_x,f_y,-1)$ is normal to the surface, so an equation of the tangent plane at $(a,b,f(a,b))$ is $$xf_x(a,b)+yf_y(a,b)-z=af_x(a,b)+bf_y(a,b)-f(a,b).$$ This is exactly in the form analyzed above, with $\lambda=f_x(a,b)$ and $\mu=f_y(a,b)$, so $${\partial f\over\partial\mathbf u}(a,b)=\nabla f(a,b)\cdot\mathbf u$$ with the maximal rate of change given by $\|\nabla f(a,b)\|$.
This seems awfully coincidental, but it’s not. Going back to the plane equation $\lambda x+\mu y-z=d$ above, the coefficients $\lambda$ and $\mu$ are respectively the “$x$-slope” and “$y$-slope,” i.e., the slopes of the intersections with planes parallel to the $x$- and $y$-axes. These slopes are encoded in the normal $(\lambda,\mu,-1)$. For the tangent plane, these slopes are the directional derivatives in the directions of the coordinate axes, also known as the partial derivatives of $f$.