Geometric interpretation of mixed partial derivatives?

I'm looking for a geometric interpretation of this theorem:

*(The image showed the equality-of-mixed-partials theorem: if $f_{xy}$ and $f_{yx}$ are both continuous, then $f_{xy}=f_{yx}$.)*

My book doesn't give any kind of explanation of it. Again, I'm not looking for a proof - I'm looking for a geometric interpretation.

Thanks.


Inspired by Ted Shifrin's comment, here's an attempt at an intuitive viewpoint. I'm not sure how much this counts as a "geometric interpretation".

Consider a tiny square $ABCD$ of side length $h$, with $AB$ along the $x$-axis and $AD$ along the $y$-axis.

D---C
|   | h
A---B
  h

Then $f_x(A)$ is approximately $\frac1h\big(f(B)-f(A)\big)$, and $f_x(D)$ is approximately $\frac1h\big(f(C)-f(D)\big)$. So, assuming by $f_{xy}$ we mean $\frac\partial{\partial y}\frac\partial{\partial x}f$, we have $$f_{xy}\approx\frac1h\big(f_x(D)-f_x(A)\big)\approx\frac1{h^2}\Big(\big(f(C)-f(D)\big)-\big(f(B)-f(A)\big)\Big).$$ Similarly, $$f_{yx}\approx\frac1{h^2}\Big(\big(f(C)-f(B)\big)-\big(f(D)-f(A)\big)\Big).$$ But those two things are the same: they both correspond to the "stencil" $$\frac1{h^2}\begin{bmatrix} -1 & +1\\ +1 & -1 \end{bmatrix}.$$
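The two difference quotients above are easy to check numerically. Here is a minimal sketch; the sample function $f(x,y)=x^2\sin y$ and the step size are my own choices for illustration:

```python
import math

# Sample function (chosen for illustration): f(x, y) = x^2 sin(y),
# whose mixed partial is f_xy = f_yx = 2x cos(y).
def f(x, y):
    return x**2 * math.sin(y)

x0, y0, h = 0.7, 0.3, 1e-4   # corner A of the square, and its side length

# "Differentiate in x, then in y": the f_xy stencil from the text.
fxy = ((f(x0 + h, y0 + h) - f(x0, y0 + h))     # f(C) - f(D)
       - (f(x0 + h, y0) - f(x0, y0))) / h**2   # minus (f(B) - f(A))

# "Differentiate in y, then in x": the f_yx stencil.
fyx = ((f(x0 + h, y0 + h) - f(x0 + h, y0))     # f(C) - f(B)
       - (f(x0, y0 + h) - f(x0, y0))) / h**2   # minus (f(D) - f(A))

exact = 2 * x0 * math.cos(y0)
print(fxy, fyx, exact)   # the two stencils agree, and both approximate f_xy
```

Both expressions use the same four corner samples with the same signs, so they agree even before any limit is taken; that is the stencil argument in miniature.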


One thing to think about is that the derivative in one dimension describes tangent lines. That is, $f'$ is the function such that the following line, parametrized in $x$, is tangent to the graph of $f$ at $x_0$: $$y(x)=f(x_0)+f'(x_0)(x-x_0).$$ We could, in fact, go further, and define the derivative via the linear function which most closely approximates $f$ near $x_0$. If we want to be really formal about it, we could say a function $y_0$ is a better approximation to $f$ near $x$ than $y_1$ if there exists some open interval around $x$ such that $|y_0(a)-f(a)|\leq |y_1(a)-f(a)|$ for every $a$ in the interval. The limit definition of the derivative ensures that the closest function under this definition is the $y(x)$ given above, and it can be shown that my definition is equivalent to the limit definition.
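A quick numeric sanity check of this "best approximation" claim, under assumptions of my own (the function $f=e^x$ and an arbitrary competing slope): the tangent line's error shrinks quadratically in the step, while any other line through $(x_0, f(x_0))$ has error shrinking only linearly.

```python
import math

f, fprime = math.exp, math.exp      # f(x) = e^x, so f'(x) = e^x
x0 = 0.5

def tangent(x):                     # the line y(x) from the text
    return f(x0) + fprime(x0) * (x - x0)

def other(x):                       # same point, deliberately wrong slope
    return f(x0) + (fprime(x0) + 0.1) * (x - x0)

errs_tangent, errs_other = [], []
for h in (1e-1, 1e-2, 1e-3):
    errs_tangent.append(abs(f(x0 + h) - tangent(x0 + h)))
    errs_other.append(abs(f(x0 + h) - other(x0 + h)))
    print(h, errs_tangent[-1], errs_other[-1])
# tangent error drops ~100x per row; the competing line's only ~10x
```

So on a small enough interval the tangent line beats every line with a different slope, which is exactly the ordering used in the definition above.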

I only go through that formality so that we can define the second derivative to be the value such that $$y(x)=f(x_0)+f'(x_0)(x-x_0)+\frac{1}2f''(x_0)(x-x_0)^2$$ is the parabola which best approximates $f$ near $x_0$. This can be checked rigorously, if one desires.
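One way to see this "best parabola" property concretely is to least-squares-fit a parabola to $f$ over a shrinking window around $x_0$ and watch its leading coefficient converge to $\frac12 f''(x_0)$. A sketch, with $f=\sin$ and $x_0$ chosen arbitrarily for illustration:

```python
import numpy as np

f = np.sin
x0 = 0.9
exact_half_f2 = -np.sin(x0) / 2      # f''(x) = -sin(x) for f = sin

for w in (1e-1, 1e-2, 1e-3):
    xs = np.linspace(x0 - w, x0 + w, 101)
    c2, c1, c0 = np.polyfit(xs - x0, f(xs), 2)   # least-squares parabola
    print(w, c2, exact_half_f2)
# c2 approaches f''(x0)/2 as the window shrinks
```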

However, though the idea of the first derivative has an obvious extension to higher dimensions (namely, what plane best approximates $f$ near $x_0$), it is not as obvious what a second derivative is supposed to represent. Clearly, it should somehow represent a quadratic function, but in two dimensions. The most sensible way I can think to define a quadratic function in higher dimensions is to say that $z(x,y)$ is "quadratic" precisely when, for any $\alpha$, $\beta$, $x_0$ and $y_0$, the function of one variable $$t\mapsto z(\alpha t+x_0,\beta t+y_0)$$ is quadratic; that is, if we traverse $z$ along any line, it looks like a quadratic. The nice thing about this approach is that it can be done in a coordinate-free way. Essentially, we are taking the best paraboloid or hyperbolic paraboloid approximation to $f$ to be the second derivative. It is simple enough to show that any such function must be a linear combination of $1$, $x$, $y$, $x^2$, $y^2$, and importantly, $xy$. We need the $xy$ term in order to ensure that functions like $$z(x,y)=(x+y)^2=x^2+y^2+2xy$$ can be represented; such functions should clearly be included in our new definition of quadratic, but can't be written just as a sum of $x^2$ and $y^2$ and lower-order terms.
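The line criterion can be tested directly: a function of one variable is quadratic exactly when its third finite difference vanishes identically. A small sketch restricting the example $z(x,y)=(x+y)^2$ to a few random lines (the step size and number of trials are arbitrary):

```python
import random

def z(x, y):
    return (x + y)**2

random.seed(0)
d3_values = []
for _ in range(5):
    # a random line t -> (a*t + x0, b*t + y0)
    a, b, x0, y0 = (random.uniform(-2, 2) for _ in range(4))
    g = lambda t: z(a*t + x0, b*t + y0)       # z restricted to the line
    h = 0.5
    # third finite difference of g at t = 0; identically zero for quadratics
    d3 = g(3*h) - 3*g(2*h) + 3*g(h) - g(0)
    d3_values.append(d3)
    print(d3)   # ~0 up to floating-point rounding
```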

However, we don't typically define the derivative to be a function, and here we have done just that. This isn't a problem in one dimension, because there's only the coefficient of $x^2$ to worry about, but in two dimensions we have coefficients of three things: $x^2$, $xy$, and $y^2$. Happily, though, we have the values $f_{xx}$, $f_{xy}$ and $f_{yy}$ to account for these three terms. So we can define a whole ton of derivatives at once by saying that the best approximating quadratic function must be the map $$z(x,y)=f(x_0,y_0)+f_x(x_0,y_0)(x-x_0)+f_y(x_0,y_0)(y-y_0)+\frac{1}2f_{xx}(x_0,y_0)(x-x_0)^2+f_{xy}(x_0,y_0)(x-x_0)(y-y_0)+\frac{1}2f_{yy}(x_0,y_0)(y-y_0)^2.$$ There are two things to note here:

Firstly, this is a well-defined notion regardless of how we name the coefficients or arguments: the set of quadratic functions of two variables is well defined, independent of any particular way of writing them. Intuitively, this means that, given just the graph of $f$, we can draw the best approximating quadratic surface from local geometric properties of the graph alone. The existence of this surface is guaranteed by the requirement that $f_{xy}$ and $f_{yx}$ be continuous.

Secondly, there are multiple ways to express the same function: we would expect that it does not matter whether we use the term $f_{xy}(x_0,y_0)(x-x_0)(y-y_0)$ or $f_{yx}(x_0,y_0)(y-y_0)(x-x_0)$, because both describe the same feature of the surface. Abstractly, each gives the coefficient of $(x-x_0)(y-y_0)$ for the given surface, and since the surface is well defined without reference to derivatives, there is a definitive answer to what that coefficient is; if both $f_{xy}$ and $f_{yx}$ are to answer it, they'd better be equal. (In particular, notice that $z(x,y)=xy$ is a hyperbolic paraboloid, which is zero on the $x$- and $y$-axes; the coefficient of $(x-x_0)(y-y_0)$ can be thought of, roughly, as a measure of how much the function "twists" about those axes, representing a change that affects neither axis but does affect other points.)
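This viewpoint can also be checked numerically, tying it back to the stencil answer: least-squares-fit the quadratic surface above to $f$ near $(x_0,y_0)$ and compare the recovered cross coefficient with the analytic $f_{xy}$. A sketch, again using the arbitrary sample function $f(x,y)=x^2\sin y$, with $f_{xy}=2x\cos y$:

```python
import numpy as np

def f(x, y):
    return x**2 * np.sin(y)

x0, y0, w = 0.7, 0.3, 1e-2
u = np.linspace(-w, w, 21)
X, Y = np.meshgrid(u, u)             # grid of offsets (x - x0, y - y0)
x, y = X.ravel(), Y.ravel()

# design matrix columns: 1, x, y, x^2, xy, y^2
A = np.column_stack([np.ones_like(x), x, y, x**2, x*y, y**2])
coef, *_ = np.linalg.lstsq(A, f(x0 + x, y0 + y), rcond=None)

# coef[4] is the coefficient of (x - x0)(y - y0) in the fitted surface
print(coef[4], 2 * x0 * np.cos(y0))
```

The fitted surface has exactly one $(x-x_0)(y-y_0)$ coefficient, and it matches $f_{xy}(x_0,y_0)$, which is the point: the surface doesn't know whether we differentiated in $x$ first or in $y$ first.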