Intuition for the multivariable chain rule

I was learning/reviewing the chain rule for multivariable calculus and was wondering why the multivariable chain rule is a sum of products of derivatives rather than just a product of derivatives, like its single-variable counterpart.

In particular I want to fix how I think of the chain rule and generalize my thoughts/intuition to multiple variables.

The way I used to remember the chain rule was by the usual "trick" of "canceling out" the middle dummy variable, i.e. consider $y = f(x(t))$; then:

$$ \frac{dy}{dt} = \frac{dy}{dx} \frac{dx}{dt} $$

and since the $dx$'s cancel out, the chain rule works! Woohoo, super intuitive, easy to remember, and even though it's not mathematically rigorous, at least it sort of makes sense.

However, for multiple variables the equation looks very different. Consider $z = f(x(t), y(t) )$, then its chain rule derivative is:

$$ \frac{dz}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} $$

Even though some of the same "canceling" trick appears, the equation doesn't make as much intuitive sense to me, nor is it obvious where it comes from.

So what is the intuition behind this equation? Why is it a sum of products of derivatives? Does anyone have a good way of generalizing the intuition from one variable to multiple variables? Or maybe we have to change our intuition in a significant way, and that is fine, as long as it's more useful for multivariable calculus!


A quick comment: by intuition, I don't necessarily mean analogies to physics; it can also be to conceptual ideas in mathematics. So explanations drawing on, say, real analysis or linear algebra that appeal to the intuition/concepts of those areas are welcome! We all have different types of intuition. :)


The problem with the intuition of cancelling differentials is that it isn't safe. And yet, the method of differentials is stupidly successful.

Let me give a standard example of intuition's downfall. First, since partials cancel, $$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = 1$$ except, it doesn't. Actually, with the right interpretation, $$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = -1.$$ In particular, we assume $x,y,z$ are related by some level function $F(x,y,z)=0$; then $dF = F_x\,dx+F_y\,dy+F_z\,dz$, thus $$ \frac{\partial z}{\partial y} = \frac{dz}{dy}\bigg|_{dx=0} = -\frac{F_y}{F_z}.$$ In more words: if we consider $z$ as a function of $x,y$, then the partial derivative of $z$ with respect to $y$ while holding $x$ fixed is $-F_y/F_z$. Notice, I simply take the total differential of $F$ and solve for $dz/dy$ while setting $dx=0$. This is an example of how the differential notation is naively successful (because careful application of the implicit function theorem yields the same outcome). Likewise, intuitive calculation with $dx,dy,dz$ yields $$ \frac{\partial y}{\partial x} = \frac{dy}{dx}\bigg|_{dz=0} = -\frac{F_x}{F_y}$$ $$ \frac{\partial x}{\partial z} = \frac{dx}{dz}\bigg|_{dy=0} = -\frac{F_z}{F_x}$$ Thus, $$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = \left(-\frac{F_y}{F_z}\right)\left(-\frac{F_x}{F_y}\right)\left(-\frac{F_z}{F_x}\right) = -1.$$
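If it helps to see this concretely, here is a small sympy check of the $-1$ identity; the particular level function $F$ below is just an illustrative choice of my own, not anything special:

```python
# Symbolic sanity check of (dz/dy)(dy/dx)(dx/dz) = -1 for a constraint F(x,y,z) = 0.
# The specific F here is an assumed example chosen only for illustration.
import sympy as sp

x, y, z = sp.symbols('x y z')
F = x*y + y*z + z*x - 1          # level function F(x,y,z) = 0

Fx, Fy, Fz = sp.diff(F, x), sp.diff(F, y), sp.diff(F, z)

# Constrained partials read off from dF = F_x dx + F_y dy + F_z dz = 0:
dz_dy = -Fy / Fz                 # dz/dy holding x fixed
dy_dx = -Fx / Fy                 # dy/dx holding z fixed
dx_dz = -Fz / Fx                 # dx/dz holding y fixed

print(sp.simplify(dz_dy * dy_dx * dx_dz))   # prints -1
```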

Getting back to your posed question: why are there sums of derivatives? Well, in short, because the multivariate function can change in all of its arguments. As the derivative is a linear approximation to the change in the function, we have little hope except to see formulas formed from sums of all the possible things which can change the outcome. This is the multivariate chain rule; it accounts for each argument in an entirely symmetrical manner. Ok, this sort of explanation doesn't settle well with me. The real answer, in my estimation, is matrix multiplication. The chain rules really fall out of multiplication of Jacobian matrices, which in turn come from the chain rule in its pure form $D(F \circ G)(a) = DF(G(a)) \circ DG(a)$. But perhaps this isn't intuition. That said, it is my intuition.

I'll add a little example to explain how matrix multiplication works together with the Jacobian matrix to capture the chain rule. Suppose $\vec{X}: \mathbb{R}^2_{uv} \rightarrow \mathbb{R}^3_{xyz}$ and $\vec{F} = \langle P, Q, R \rangle : \mathbb{R}^3_{xyz} \rightarrow \mathbb{R}^3$. Here I use the notation $\mathbb{R}^2_{uv}$ to indicate that $u,v$ serve as the coordinates. You can think of $\vec{X}$ as a parametrization of a surface and $\vec{F}$ as a vector field in three-dimensional space. The composition $\vec{F} \circ \vec{X}$ is commonly considered in the calculation of the flux of $\vec{F}$ through the surface parametrized by $\vec{X}$.

In this case, the Jacobian of $\vec{X}$ is given by $$ J_{\vec{X}} = \left[ \frac{\partial \vec{X}}{\partial u} |\frac{\partial \vec{X}}{\partial v}\right] = \left[\begin{array}{cc} \partial_u x & \partial_v x \\ \partial_u y & \partial_v y \\ \partial_u z & \partial_v z \end{array} \right]$$ and the Jacobian of $\vec{F}$ is given by $$ J_{\vec{F}} = \left[ \frac{\partial \vec{F}}{\partial x}| \frac{\partial \vec{F}}{\partial y}| \frac{\partial \vec{F}}{\partial z} \right] = \left[ \begin{array}{ccc} \partial_x P & \partial_y P & \partial_z P \\ \partial_x Q & \partial_y Q & \partial_z Q \\ \partial_x R & \partial_y R & \partial_z R \\ \end{array} \right]$$

Setting $\vec{G} = \vec{F} \circ \vec{X}$, we find from the matrix form of the chain rule (suppressing point dependence) that \begin{align} J_{\vec{G}} &= J_{\vec{F}}J_{\vec{X}} \\ &= \left[ \begin{array}{ccc} \partial_x P & \partial_y P & \partial_z P \\ \partial_x Q & \partial_y Q & \partial_z Q \\ \partial_x R & \partial_y R & \partial_z R \\ \end{array} \right]\left[\begin{array}{cc} \partial_u x & \partial_v x \\ \partial_u y & \partial_v y \\ \partial_u z & \partial_v z \end{array} \right] \\ &= \left[\begin{array}{c|c} \partial_x P\partial_u x +\partial_y P \partial_u y + \partial_z P\partial_u z &\partial_x P\partial_v x +\partial_y P \partial_v y + \partial_z P\partial_v z \\ \partial_x Q\partial_u x +\partial_y Q \partial_u y + \partial_z Q\partial_u z &\partial_x Q\partial_v x +\partial_y Q \partial_v y + \partial_z Q\partial_v z \\ \partial_x R\partial_u x +\partial_y R \partial_u y + \partial_z R\partial_u z &\partial_x R\partial_v x +\partial_y R \partial_v y + \partial_z R\partial_v z \end{array} \right] \end{align}

For example, from the $(1,1)$ entry we read off: $$ \frac{\partial G^1}{\partial u} = \frac{\partial}{\partial u} \left[P(x(u,v), y(u,v), z(u,v))\right] = \frac{\partial P}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial P}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial P}{\partial z}\frac{\partial z}{\partial u} $$ Notice the matrix $J_{\vec{G}}$ contains all $6$ interesting chain rules obtained by composing the component functions $P,Q,R$ of $\vec{F}$ with the component functions $x,y,z$ of $\vec{X}$.
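If you want to see the matrix identity checked symbolically, here is a short sympy sketch; the particular surface $\vec{X}$ and field $\vec{F}$ in it are made-up examples, not anything canonical:

```python
# Check J_{F∘X} = (J_F evaluated at X) * J_X for one concrete, assumed example.
import sympy as sp

u, v, x, y, z = sp.symbols('u v x y z')

X = sp.Matrix([u*sp.cos(v), u*sp.sin(v), u**2])   # a parametrized surface X(u,v)
F = sp.Matrix([x*y, y*z, z*x])                    # a vector field <P, Q, R>

J_X = X.jacobian([u, v])                          # 3x2 matrix
J_F = F.jacobian([x, y, z])                       # 3x3 matrix

G = F.subs({x: X[0], y: X[1], z: X[2]})           # the composition F∘X
J_G = G.jacobian([u, v])                          # 3x2, computed directly

# The chain rule: differentiate F at the point X(u,v), then multiply by J_X.
rhs = J_F.subs({x: X[0], y: X[1], z: X[2]}) * J_X

print(sp.simplify(J_G - rhs))                     # the 3x2 zero matrix
```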


$\frac{dz}{dt}$ measures how $z$ changes when you change $t$ a little. Since $z = f(x(t), y(t))$, two inputs of $f$ change when you change $t$ a little: $x$ changes a little, and $y$ changes a little. Both of these changes affect $z$ a little. The first term is the part of the total change in $z$ coming from $x$ changing, and the second term is the part of the total change in $z$ coming from $y$ changing.

Symbolically (writing $\delta (-)$ for a small change in $(-)$):

$$z + \delta z = f(x(t + \delta t), y(t + \delta t)) \approx f(x(t) + \delta x, y(t) + \delta y) \approx f(x(t), y(t)) + \frac{\partial f}{\partial x} \delta x + \frac{\partial f}{\partial y} \delta y$$

where $\delta x \approx \frac{dx}{dt} \delta t$ and $\delta y \approx \frac{dy}{dt} \delta t$.
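You can check this picture numerically if you like; the particular $f$, $x(t)$, $y(t)$ below are just illustrative choices:

```python
# Numerical check that the change in z splits into an x-contribution and a
# y-contribution. The functions below are assumed, purely illustrative choices.
import math

def f(x, y): return x**2 * y
def x_of(t): return math.cos(t)
def y_of(t): return math.sin(t)

t, dt = 0.7, 1e-6

# Actual change in z when t is nudged by dt:
dz_actual = f(x_of(t + dt), y_of(t + dt)) - f(x_of(t), y_of(t))

# Chain-rule prediction: (∂f/∂x) δx + (∂f/∂y) δy
x0, y0 = x_of(t), y_of(t)
fx, fy = 2*x0*y0, x0**2                    # partials of f at (x0, y0)
dx, dy = -math.sin(t)*dt, math.cos(t)*dt   # δx ≈ x'(t) δt, δy ≈ y'(t) δt
dz_predicted = fx*dx + fy*dy

print(dz_actual, dz_predicted)             # agree to roughly 12 decimal places
```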

The canceling trick is a very bad idea; it sort of works for ordinary calculus, but as you can see it fails very badly for multivariable calculus. The problem is that partial derivatives do not behave anything like fractions. What they actually behave like are the entries of a matrix (and the general form of the chain rule involves matrix multiplication); this will become much clearer if you take a linear algebra course.
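To make the matrix picture concrete for the $z = f(x(t), y(t))$ case from the question (with an illustrative choice of $f$, $x$, $y$): the derivative of $f$ is a $1\times 2$ row of partials, the derivative of $t \mapsto (x(t), y(t))$ is a $2\times 1$ column, and their matrix product is the $1\times 1$ matrix containing $dz/dt$. A small sympy sketch:

```python
# The chain rule as a product of Jacobians:
# [dz/dt] = [∂f/∂x  ∂f/∂y] * [dx/dt; dy/dt].  The f, x, y below are assumed examples.
import sympy as sp

t, x, y = sp.symbols('t x y')
f = x**2 * y
xt, yt = sp.cos(t), sp.sin(t)

row = sp.Matrix([[sp.diff(f, x), sp.diff(f, y)]]).subs({x: xt, y: yt})  # 1x2 Jacobian of f
col = sp.Matrix([sp.diff(xt, t), sp.diff(yt, t)])                       # 2x1 Jacobian of t -> (x, y)

product = row * col                              # 1x1 matrix: the chain-rule answer
direct  = sp.diff(f.subs({x: xt, y: yt}), t)     # dz/dt computed by direct substitution

print(sp.simplify(product[0] - direct))          # prints 0
```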