Intuition for why $f_{xy} = f_{yx}$
If we have a function $f(x,y)$, why is it that $f_{xy} = f_{yx}$? I'm looking for an intuitive, qualitative reason rather than a rigorous proof.
$f_{xy}$ represents the rate of change of the slope in the $x$ direction as you move along the $y$ axis. Similarly, $f_{yx}$ represents the rate of change of the slope in the $y$ direction as you move along the $x$ axis. At least, this is how I understand it. However, I can't see any reason why the two should be the same.
Solution 1:
Intuition alone can't tell you why they're equal; it's too vague for that. But we can see that they measure the same thing.
Let's look at the origin specifically, just to make it easier. Also, let's say the function value and first derivatives at the origin are all $0$.
First, let's see what $f_{xy}$ (the derivative first with respect to $x$, then with respect to $y$) measures. Each plane perpendicular to the $y$-axis cuts the graph in a curve, and at $x=0$ that curve has a tangent line lying entirely in the plane. As we move along the $y$-axis, $f_x$ measures the slope of this line, and $f_{xy}$ measures how fast that slope changes, i.e. roughly how fast the line rotates. At the origin, our assumptions say that this line is the $x$-axis.
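If you think about this long enough, you will see that an archetypal function with positive $f_{xy}(0,0)$ (something like $f(x,y)=xy$; more precisely, something with $f=f_x=f_y=0$ at the origin) will, close to the origin, be positive in the first and third quadrants and negative in the second and fourth. For $f(x,y)=xy$ this is easy to check: $f_x(0,y)=y$, so the tangent line tilts upward in the $+x$ direction when $y>0$ and downward when $y<0$.
Now notice that, with the roles of $x$ and $y$ exchanged, this same sign pattern is exactly what makes $f_{yx}(0,0)$ positive as well: the tangent lines in planes perpendicular to the $x$-axis tilt upward in the $+y$ direction for $x>0$ and downward for $x<0$.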
It's up to you whether to venture away from the land of $f=f_x=f_y=0$ at the origin and see what happens. The difference basically amounts to adding a function $g(x,y)=ax^2+by^2+cx+dy+e$ to $f$. Since $g$ has no $xy$ term, $g_{xy}=g_{yx}=0$, so adding it changes neither $f_{xy}$ nor $f_{yx}$.
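A minimal numerical sketch of the "same thing" argument, assuming central finite differences with an arbitrary step $h$ (the helper names `d_dx`, `d_dy`, `mixed_at_origin` and the coefficients of `saddle_plus_g` are hypothetical, chosen only for illustration). Differencing in $x$ then $y$, or in $y$ then $x$, both expand to the same four-corner combination $\frac{f(h,h)-f(-h,h)-f(h,-h)+f(-h,-h)}{4h^2}$, which is positive exactly when $f$ is larger in quadrants 1 and 3 than in quadrants 2 and 4:

```python
def d_dx(g, h):
    """Return a function approximating g_x by a central difference with step h."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, k):
    """Return a function approximating g_y by a central difference with step k."""
    return lambda x, y: (g(x, y + k) - g(x, y - k)) / (2 * k)

def mixed_at_origin(f, h=1e-3):
    """Estimate f_xy(0,0) and f_yx(0,0) by differencing in the two orders.

    Both orders expand to the same four corner values:
    [f(h,h) - f(-h,h) - f(h,-h) + f(-h,-h)] / (4 h^2),
    so they can only differ by floating-point rounding.
    """
    f_xy = d_dy(d_dx(f, h), h)(0.0, 0.0)  # x first, then y
    f_yx = d_dx(d_dy(f, h), h)(0.0, 0.0)  # y first, then x
    return f_xy, f_yx

def saddle(x, y):
    # The archetypal f(x, y) = xy: positive in quadrants 1 and 3, negative in 2 and 4.
    return x * y

def saddle_plus_g(x, y):
    # Same saddle plus a hypothetical g(x,y) = a x^2 + b y^2 + c x + d y + e
    # with a=3, b=-2, c=5, d=-7, e=1; g has no xy term.
    return x * y + 3 * x**2 - 2 * y**2 + 5 * x - 7 * y + 1

for f in (saddle, saddle_plus_g):
    print(f.__name__, mixed_at_origin(f))
```

Both functions give values equal to $1$ up to rounding in both orders, consistent with adding $g$ leaving the mixed partials unchanged.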
Solution 2:
One way to think about this is that for nice functions (in this case, twice differentiable ones) you only need to consider $f$ up to second order around the point: terms of higher order have no impact on the second derivatives there.
So you only need to check this for general quadratic functions $f(x, y) = a x^2 + b y^2 + c x y + d x + e y + g$. In this case you almost immediately see $f_{xy} = f_{yx} = c$.
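Spelling out both orders for this quadratic (same coefficients $a, \dots, g$), the only term involving both variables is $cxy$, and it contributes $c$ whichever variable you differentiate first:

$$
\begin{aligned}
f_x &= 2ax + cy + d, & f_{xy} &= \frac{\partial}{\partial y}\left(2ax + cy + d\right) = c,\\
f_y &= 2by + cx + e, & f_{yx} &= \frac{\partial}{\partial x}\left(2by + cx + e\right) = c.
\end{aligned}
$$

A higher-order term such as $x^2y$ gives $f_{xy} = f_{yx} = 2x$, which vanishes at the origin; this is the sense in which terms beyond second order don't affect the second derivatives there.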
Solution 3:
As pointed out in the comments, this is not always true, and the first counterexample came as something of a shock to the mathematical world. You should look up Schwarz's theorem, which guarantees $f_{xy} = f_{yx}$ at a point whenever the second partial derivatives are continuous near that point. Intuitively, I think the way to see it is this: if a function has enough derivatives, it has some kind of regularity around a point. The growth of $f_x$ and $f_y$ is controlled, so whether you approach the point along the $x$ direction or the $y$ direction, they change smoothly, and the order of differentiation doesn't matter.
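To see what can go wrong without that regularity, here is a minimal numerical sketch using a standard counterexample (our choice of illustration, not one named above): $f(x,y) = \frac{xy(x^2-y^2)}{x^2+y^2}$ with $f(0,0)=0$. Its first partials are continuous, but its mixed partials are not continuous at the origin, so Schwarz's theorem does not apply. By hand, $f_x(0,y) = -y$ and $f_y(x,0) = x$, hence $f_{xy}(0,0) = -1$ while $f_{yx}(0,0) = 1$. The helper names and step sizes below are assumptions; the inner step is much smaller than the outer one so the inner limit is, approximately, taken first.

```python
def f(x, y):
    # Standard counterexample: C^1, but mixed partials are discontinuous at (0, 0).
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

def f_x(x, y, h=1e-9):
    # Inner derivative in x (central difference, tiny step).
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def f_y(x, y, k=1e-9):
    # Inner derivative in y (central difference, tiny step).
    return (f(x, y + k) - f(x, y - k)) / (2 * k)

s = 1e-4  # outer step, much larger than the inner step
f_xy = (f_x(0.0, s) - f_x(0.0, -s)) / (2 * s)  # x first, then y
f_yx = (f_y(s, 0.0) - f_y(-s, 0.0)) / (2 * s)  # y first, then x
print(f_xy, f_yx)  # approximately -1 and 1
```

The two orders disagree because the four-corner quotient $\frac{f(h,k)}{hk} = \frac{h^2-k^2}{h^2+k^2}$ depends on how $h$ and $k$ shrink relative to each other, which is exactly the missing regularity.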