Help understanding expression for the derivative of a function
Solution 1:
Based on the link you supplied, it seems you're comfortable with the idea that if we have a function $f: \Bbb{R}^n \to \Bbb{R}$ and a point $a \in \Bbb{R}^n$, then the derivative of $f$ at $a$ should itself be a linear mapping $M: \Bbb{R}^n \to \Bbb{R}$ such that \begin{align} f(a+h) - f(a) &= M(h) + o(\lVert h \rVert). \end{align} In words, this says that the change in the function at the point $a$, i.e. $\Delta f_a(h) := f(a+h) - f(a)$, can be approximated by a linear term $M(h)$ plus some "higher order term" which is small in the sense that it can be written as $\Phi_a(h) \cdot \lVert h\rVert$, where $\lim_{h \to 0} \Phi_a(h) = 0$.
One can prove that if $M$ exists then it is unique, which is why one usually writes the linear map $M$ using the notation $Df(a)$ or $Df_a$, or $f'(a)$, or $df(a)$, or $df_a$... basically anything which reminds you of a derivative. For now let's just use the notation $df_a$. Note that in this definition, $df_a$ is not some "infinitesimally small change in $f$". By definition it is a linear mapping $df_a: \Bbb{R}^n \to \Bbb{R}$ such that the following equation holds: \begin{align} \Delta f_a(h) = f(a+h) - f(a) = df_a(h) + o(\lVert h\rVert). \end{align} In other words, $df_a(h)$ is the first-order approximation (i.e. the linear approximation) to the actual change $\Delta f_a(h)$ in the function.
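As a quick numerical sanity check of this definition, here is a minimal Python sketch. The function `f`, the point `a`, and the direction of `h` are hypothetical choices made purely for illustration, and the linear map `df_a` is written out by hand for this particular `f`. The point is only to see the ratio $\lvert \Delta f_a(h) - df_a(h)\rvert / \lVert h \rVert$ shrink as $h \to 0$.

```python
import math

def f(x, y):
    # Hypothetical example function f: R^2 -> R (not from the question)
    return x**2 * y + math.sin(y)

def df_a(a, h):
    # The linear map df_a applied to h, computed by hand for this f:
    # partial_x f = 2xy, partial_y f = x^2 + cos(y)
    x, y = a
    return 2 * x * y * h[0] + (x**2 + math.cos(y)) * h[1]

def remainder_ratio(a, h):
    # |f(a+h) - f(a) - df_a(h)| / ||h||, which should tend to 0 as h -> 0
    delta = f(a[0] + h[0], a[1] + h[1]) - f(a[0], a[1])
    return abs(delta - df_a(a, h)) / math.hypot(h[0], h[1])

a = (1.0, 2.0)
direction = (0.6, -0.8)  # a unit vector; h = t * direction
ratios = [remainder_ratio(a, (t * direction[0], t * direction[1]))
          for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

Each time $h$ is scaled down by a factor of 10, the printed ratio should drop by roughly the same factor, which is what "the error is $o(\lVert h \rVert)$" looks like in practice for a smooth function.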
Now, how do we calculate the quantity $df_a(h)$? Well, $h = (h^1, \dots, h^n) \in \Bbb{R}^n$ is a vector, so we can expand it in the "standard" basis $e_i = (0, \dots, \underbrace{1}_{\text{$i^{th}$ spot}}, \dots, 0)$ as \begin{align} h &= \sum_{i=1}^n h^i e_i. \end{align} Since $df_a$ is a linear map, we can write \begin{align} df_a(h) &= df_a\left(\sum_{i=1}^n h^i e_i\right) = \sum_{i=1}^n df_a(e_i) \cdot h^i. \tag{$*$} \end{align}
It is pretty easy to prove from the definitions that $df_a(e_i)$ is exactly the $i^{th}$ partial derivative at $a$ (take $h = t e_i$ in the definition, divide by $t$, and let $t \to 0$): \begin{align} df_a(e_i) &= (\partial_if)_a \equiv \dfrac{\partial f}{\partial x^i}(a) \in \Bbb{R}. \end{align} So, equation $(*)$ can be written as \begin{align} df_a(h) &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}(a) \cdot h^i. \tag{$**$} \end{align}
Now, how do we write this as an equation without writing $h$ everywhere? Consider the projection function $\pi^i: \Bbb{R}^n \to \Bbb{R}$ defined by $\pi^i(h) = h^i$. This is a linear function, so its derivative is easy to compute: it is $(d\pi^i)_a(\cdot) = \pi^i(\cdot)$. So, for every $h \in \Bbb{R}^n$, we have \begin{align} (d \pi^i)_a(h) = \pi^i(h) = h^i. \end{align} Plugging this into $(**)$, we get \begin{align} df_a(h) &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}(a) \cdot (d\pi^i)_a(h). \end{align} Since this holds for every $h$, we can now remove the $h$ everywhere to get \begin{align} df_a &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}(a) \cdot (d \pi^i)_a. \end{align} This is an equality of linear transformations $\Bbb{R}^n \to \Bbb{R}$.
Going one step further, we can suppress the point at which the derivative is evaluated and write simply \begin{align} df &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i} \cdot d \pi^i. \end{align} Finally, if we decide to modify our notation and rename the function $\pi^i$ to $x^i$, then we get the very nice, memorable equation \begin{align} df &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i} dx^i. \end{align}
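To make the "equality of linear maps" point concrete, here is a small Python sketch that assembles $df_a$ as a function of $h$ out of partial derivatives and projections. Everything named here (`projection`, `partial`, `differential`, the example `f`, and the step size `eps`) is a hypothetical helper introduced for illustration, and the partial derivatives are estimated by central differences rather than computed exactly.

```python
def projection(i):
    # pi^i: R^n -> R, h |-> h^i. It is linear, so (d pi^i)_a = pi^i at every a.
    return lambda h: h[i]

def partial(f, a, i, eps=1e-6):
    # Central-difference estimate of the i-th partial derivative of f at a,
    # a numerical stand-in for df_a(e_i). The step eps is an arbitrary choice.
    up = list(a)
    up[i] += eps
    down = list(a)
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)

def differential(f, a):
    # Returns the linear map df_a = sum_i (partial_i f)(a) * (d pi^i)_a
    n = len(a)
    coeffs = [partial(f, a, i) for i in range(n)]
    dx = [projection(i) for i in range(n)]
    return lambda h: sum(c * dx_i(h) for c, dx_i in zip(coeffs, dx))

def f(x):
    # Hypothetical example: f(x1, x2, x3) = x1 * x2 + x3^2
    return x[0] * x[1] + x[2] ** 2

df_a = differential(f, [1.0, 2.0, 3.0])
# Feeding in the basis vectors e_1, e_2, e_3 recovers the partial derivatives at a
print(df_a([1, 0, 0]), df_a([0, 1, 0]), df_a([0, 0, 1]))
```

For this `f` the exact partial derivatives at $a = (1, 2, 3)$ are $2$, $1$, and $6$, so the three printed values should be close to those. Note that `differential(f, a)` returns a function, not a number: that is the sense in which $df_a$ is a linear map waiting to be fed a displacement $h$.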
Once again, just to reiterate the whole thing: if $f$ is a nice differentiable function, then we can ask "how can I approximate changes in $f$?" The answer is: if you want to calculate the change $\Delta f_a(h) = f(a+h) - f(a)$, then by definition of differentiability, you know this is well approximated to linear order by the quantity $df_a(h)$. But how do we calculate $df_a(h)$? It's very simple, just write \begin{align} df = \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}dx^i \end{align} and plug in $a$ and $h$ at the appropriate places to get \begin{align} df_a(h) &= \sum_{i=1}^n\dfrac{\partial f}{\partial x^i}(a) \cdot (dx^i)_a(h) \\ &= \sum_{i=1}^n\dfrac{\partial f}{\partial x^i}(a) \cdot h^i \end{align} (here, $a$ is the point where we calculate the derivative, and $h$ is to be thought of as the "displacement vector" from $a$ to $a+h$).
Now, you asked:
"How is this expression supposed to be understood? The expression above is obviously a measure of the sensitivity to change of the function $U$ but with respect to what?"
Just to answer this question directly, you should think of it as a linear approximation to the actual change in the value of a function when you vary its argument slightly from $a$ to $a+h$. In your case for $U$, it is slightly easier because there are only two variables. So, at a point $a = (a_1, a_2)$, and for $h = (h_1, h_2)$, we can write \begin{align} dU_{a}(h_1,h_2) &=\dfrac{\partial U}{\partial x}(a)\cdot dx_a(h) +\dfrac{\partial U}{\partial y}(a)\cdot dy_a(h) \\ &= \dfrac{\partial U}{\partial x}(a)\cdot h_1 + \dfrac{\partial U}{\partial y}(a)\cdot h_2. \end{align} So, once again, this is simply the linear approximation to the actual change in the function $\Delta U_a(h) = U(a+h) - U(a) = U(a_1+h_1, a_2+h_2) - U(a_1,a_2)$.
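To see the numbers, here is a small worked example. The specific function, point, and displacement are made up for illustration and are not taken from your question: suppose $U(x,y) = x^2 y$, $a = (1, 2)$ and $h = (0.01, -0.02)$. Then $\dfrac{\partial U}{\partial x}(a) = 2 \cdot 1 \cdot 2 = 4$ and $\dfrac{\partial U}{\partial y}(a) = 1^2 = 1$, so \begin{align} dU_a(h) &= 4 \cdot (0.01) + 1 \cdot (-0.02) = 0.02, \\ \Delta U_a(h) &= U(1.01, 1.98) - U(1, 2) = (1.0201)(1.98) - 2 = 0.019798. \end{align} The two differ by about $2 \times 10^{-4}$, which is much smaller than $\lVert h \rVert \approx 0.022$, exactly as the $o(\lVert h \rVert)$ term in the definition predicts.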