What exactly is the difference between a derivative and a total derivative?
Solution 1:
The key difference is that a partial derivative is taken under the assumption that all other variables are held fixed while one changes. A total derivative allows changes in one variable to propagate through the others.
So, for instance, if you have $f(x,y) = 2x+3y$, then when you compute the partial derivative $\frac{\partial f}{\partial x}$, you temporarily assume $y$ constant and treat it as such, yielding $\frac{\partial f}{\partial x} = 2 + \frac{\partial (3y)}{\partial x} = 2 + 0 = 2$.
However, if $x=x(r,\theta)$ and $y=y(r,\theta)$, then the assumption that $y$ stays constant when $x$ changes is no longer valid. Since $x = x(r,\theta)$, if $x$ changes, then at least one of $r$ or $\theta$ changes. And if $r$ or $\theta$ changes, then $y$ generally changes as well. And if $y$ changes, then it has some effect on the derivative, and we can no longer assume its contribution to be zero.
In your example, you are given $f(x,y) = x^2+y^2$, but what you really have is the following:
$f(x,y) = f(x(r,\theta),y(r,\theta))$.
So if you compute $\frac{\partial f}{\partial x}$, you cannot assume that the change in $x$ computed in this derivative has no effect on a change in $y$.
What you need to compute instead is $\frac{\mathrm{d} f}{\mathrm{d}\theta}$ and $\frac{\mathrm{d} f}{\mathrm{d} r}$, the first of which can be computed as:
$\frac{\mathrm{d} f}{\mathrm{d}\theta} = \frac{\partial f}{\partial \theta} + \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta}$
(Here $\frac{\partial f}{\partial \theta} = 0$, since $f$ has no explicit $\theta$ dependence, and the derivatives of $x$ and $y$ are partials because each depends on both $r$ and $\theta$.)
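(As a quick sanity check, not part of the original answer: here is a SymPy sketch that substitutes $x(r,\theta)$ and $y(r,\theta)$ into $f$ and differentiates the result. SymPy itself is my assumption here.)

```python
import sympy as sp

r, theta = sp.symbols('r theta')

# The substitutions from the problem: x and y both depend on r and theta.
x = r * sp.sin(theta)
y = r * sp.cos(theta)
f = x**2 + y**2

# Differentiating the fully substituted expression captures the total
# effect of r (or theta) on f, including the induced changes in x and y.
print(sp.simplify(sp.diff(f, r)))      # 2*r
print(sp.simplify(sp.diff(f, theta)))  # 0
```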
Solution 2:
I know this answer is incredibly delayed, but just to summarise the previous answer:
If I gave you the function
$$ f(x,y) = \sin(x)+3y^2$$
and asked you for the partial derivative with respect to $x$, you should write:
$$ \frac{\partial f(x,y)}{\partial x} = \cos(x)+0$$
since $y$ is effectively a constant with respect to $x$. In other words, a change in $x$ has no effect on $y$. However, if I asked you for the total derivative with respect to $x$, you should write:
$$\frac{df(x,y)}{dx}=\cos(x)\cdot {dx\over dx} + 6y\cdot {dy\over dx}$$
Of course I've utilized the chain rule in the second case. You wouldn't write ${dx\over dx}$ in practice, since it's just $1$, but you need to realise that it is there :)
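(A small SymPy sketch of the same comparison, under my own assumption that the dependence is $y = y(x)$; SymPy is not part of the original answer.)

```python
import sympy as sp

x, y_const = sp.symbols('x y')
y_of_x = sp.Function('y')(x)  # hypothetical dependence y = y(x)

# Partial derivative: y is treated as an unrelated constant symbol.
f_partial = sp.sin(x) + 3 * y_const**2
print(sp.diff(f_partial, x))  # cos(x)

# Total derivative: y depends on x, so the chain rule kicks in.
f_total = sp.sin(x) + 3 * y_of_x**2
print(sp.diff(f_total, x))    # cos(x) + 6*y(x)*Derivative(y(x), x)
```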
Solution 3:
Does everyone agree that the poster arrived at the correct answer?
People write $$\frac{\partial}{\partial t}g(x(t),t)$$ or $$\frac{\text{d}}{\text{d} t}g(x(t),t)$$
The first is typically used to mean "the derivative of the function $g$ with respect to its second argument". The second usually means the "total derivative". There are variations on this. Some people omit the arguments and just write, for example, $\frac{\partial}{\partial t}g$.
So for example: if $x$ is secretly a function of $t$, then the notation $\frac{d}{dt}f(x,t)$ is called the total derivative and is an abbreviation for the (single-variable) derivative $g'(t)$ where $g(t)=f(x(t),t)$. In applying the chain rule to the last expression, you would need some way to denote "the derivative of $f$ with respect to its first argument"; many people would write $\frac{\partial}{\partial x}f$ for this, but in many cases this is confusing, as I explain in the example below.
The widespread math notation here confuses many people, and I think it is pretty much unnecessary to use it. If you want to take a total derivative, construct explicitly the function (like $g$ above) and take a single-variable derivative. Otherwise, explaining the difference between total and partial derivatives requires appeals to temporarily fixing variables, or saying that a variable is effectively constant, or switching between thinking of $x$ as a function and as an expression. These are all fuzzy things you can do successfully once you already feel comfortable with what's going on. But until then, it pays to think carefully about what's really happening.
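A minimal sketch of this recommendation, assuming SymPy and some made-up ingredients (the particular $f$, $x$, and the exponential are purely illustrative):

```python
import sympy as sp

t = sp.symbols('t')

# Hypothetical ingredients, chosen only for illustration.
def f(a, b):              # a function of two arguments
    return a**2 + sp.sin(b)

def x(s):                 # x is "secretly" a function of t
    return sp.exp(s)

# Construct g explicitly, as recommended above...
def g(s):
    return f(x(s), s)

# ...and the "total derivative" is now an ordinary single-variable derivative.
print(sp.diff(g(t), t))   # 2*exp(2*t) + cos(t)
```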
Your example
The problem stems from the conflation of an expression and a function. You did this when you wrote $w = f(x,y) = x^2 + y^2$. In that case, many will write
$\frac{\partial}{\partial x}w$ and
$\frac{\partial}{\partial x} f(x,y)$
(which are equivalent). This sort of makes sense. In both cases, the thing to the right of the differential operator is an expression which contains $x$ and $y$. The thing that is produced by applying that operator is also an expression in the same variables. This is also true of what $\frac{d}{dx}$ means. For the particular expressions above, I would just use that.
The actual purpose of the partial derivative is to take derivatives of functions with respect to one of their arguments, not of expressions. That's not what's happening above. That is what's happening when people write:
$\frac{\partial}{\partial x} f$.
$f$ is not an expression. It is a function. I personally do not like this notation. You could have defined an identical $f$ by writing $f(a,b) = a^2 + b^2$. The variables that appear in the definition of a function are, in the strictest sense, invisible to the rest of the world. It's just a convenient way of stating "$f$ is a function that takes two arguments. It squares the first, squares the second, and returns the sum of the squares". Instead of having to write that sentence out (which people had to do before inventing better notation), you can instead give names to the arguments of $f$ so that you can easily refer to them when defining $f$.
But when you write $\frac{\partial}{\partial x} f$, then you are using some knowledge of how you defined $f$---that you chose the name $x$ for the first argument. It can be useful to have names for function arguments instead of just referring to their position (first, second, etc. argument), and so that's why the partial notation survives, but I think the notation needs to improve for this.
What someone typically means when they write $\frac{\partial}{\partial x} f$ is roughly "the function that takes two arguments and returns the sensitivity of $f$ with respect to its first argument". So if you're at some point $(a,b)$ or $(x,y)$ or whatever, and you wiggle the first argument $a$ or $x$, how much does the output of $f$ wiggle? That is the question that the gradient of a function is supposed to answer. This is probably what someone means if they say "normal derivative". They are thinking about only a single function, with possibly multiple arguments. And they are trying to make an object that tells you how sensitive the output of the function is to a change in each of the inputs.
The total derivative usually means that somewhere you've implicitly defined some new functions. In this case, you have made functions $x(r,\theta) = r \sin(\theta)$ and $y(r,\theta) = r \cos(\theta)$, and you can compose these functions, making a new function: $$g(r,\theta) = f(x(r,\theta),y(r,\theta))$$
Notice again that $r$ and $\theta$ are chosen only to give a human information about the connotation of this function. If we processed things purely symbolically, then the definition of $g$ could as well have been
$$g(\text{input}_1,\text{input}_2) = f(x(\text{input}_1,\text{input}_2),y(\text{input}_1,\text{input}_2))$$
And so when the problem asked you to find $\frac{\partial}{\partial r} w$, there are two, in the end identical, interpretations of what that means. Either construct the function $g$ as I did above, and report its sensitivity with respect to the first argument; or substitute the expressions for $x$ and $y$ into the expression for $w$, so that you have an expression for $w$ in terms of $r$ and $\theta$. I prefer the approach that thinks about functions. This is how we organize code, and I think this is how we should organize math. When you deal with expressions, you effectively have a ton of global variables.
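Here is a sketch of the two interpretations as code (SymPy assumed; the argument names bound by `Lambda` are internal to each function, exactly as argued above):

```python
import sympy as sp

r, theta = sp.symbols('r theta')
a, b = sp.symbols('a b')

# Interpretation 1: organize everything as functions and compose them.
f = sp.Lambda((a, b), a**2 + b**2)        # the names a, b are internal to f
x = sp.Lambda((r, theta), r * sp.sin(theta))
y = sp.Lambda((r, theta), r * sp.cos(theta))
g = sp.Lambda((r, theta), f(x(r, theta), y(r, theta)))
print(sp.simplify(sp.diff(g(r, theta), r)))   # 2*r

# Interpretation 2: substitute the expressions for x and y into w.
w = (r * sp.sin(theta))**2 + (r * sp.cos(theta))**2
print(sp.simplify(sp.diff(w, r)))             # 2*r
```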
So how do we compute $\partial_1 g$, which is just the notation for "make a function with the same arity (number of inputs) as $g$, such that it evaluates to the derivative of the function $g$ with respect to its first argument"? It's just the chain rule.
$$[\partial_1 g](r,\theta) = [\partial_1 f](x(r,\theta), y(r,\theta)) \cdot [\partial_1 x](r,\theta) + [\partial_2 f](x(r,\theta), y(r,\theta)) \cdot [\partial_1 y](r,\theta)$$
We can see why thinking about things in this way is not popular! But this is the clearest, most mechanical, way to think about it. Otherwise you are relying on implicit punning of $x$ as a function and as an expression. Choose one and stick with it!
Anyway, to simplify the above definition, which didn't care about the definitions of $f$, $x$, or $y$, we need to use the definitions.
$f(x,y) = x^2 + y^2$ and therefore
- $[\partial_1 f](x,y) = 2x$
- $[\partial_2 f](x,y) = 2y$
$x(r,\theta) = r\sin(\theta)$ and therefore
- $[\partial_1 x](r,\theta) = \sin(\theta)$
likewise
- $[\partial_1 y](r,\theta) = \cos(\theta)$
FURTHERMORE, though we don't need it at the moment
- $[\partial_2 x](r,\theta) = r\cdot \cos(\theta)$
- $[\partial_2 y](r,\theta) = -r\cdot \sin(\theta)$
So again, the function is
$$[\partial_1 g](r,\theta) = [\partial_1 f](x(r,\theta), y(r,\theta)) \cdot [\partial_1 x](r,\theta) + [\partial_2 f](x(r,\theta), y(r,\theta)) \cdot [\partial_1 y](r,\theta)$$
substituting the functions we just computed:
$$[\partial_1 g](r,\theta) = 2x(r,\theta) \cdot \sin(\theta) + 2y(r,\theta) \cdot \cos(\theta)$$
and substituting $x$ and $y$
$$[\partial_1 g](r,\theta) = 2r\sin(\theta) \cdot \sin(\theta) + 2r\cos(\theta) \cdot \cos(\theta)$$
which, after applying the very trig identity you used ($\sin^2\theta + \cos^2\theta = 1$), is
$$[\partial_1 g](r,\theta) = 2r$$
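(To guard against slips in the hand computation, the same assembly can be checked mechanically; SymPy is my addition here, not part of the derivation.)

```python
import sympy as sp

r, theta = sp.symbols('r theta')

x = r * sp.sin(theta)
y = r * sp.cos(theta)

# The pieces computed above: [∂1 f](u,v) = 2u, [∂2 f](u,v) = 2v,
# [∂1 x](r,theta) = sin(theta), [∂1 y](r,theta) = cos(theta).
d1f = lambda u, v: 2 * u
d2f = lambda u, v: 2 * v
d1x = sp.diff(x, r)   # sin(theta)
d1y = sp.diff(y, r)   # cos(theta)

# Assemble [∂1 g](r,theta) exactly as in the chain-rule formula above.
d1g = d1f(x, y) * d1x + d2f(x, y) * d1y
print(sp.simplify(d1g))   # 2*r
```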
Yet another way to make the same point:
When you see the notation $g'(x)$, you can group that as $[g'](x)$. You've made a new function, called "g prime", which is the derivative of $g$, and you're evaluating it at point $x$. $g'(y)$ means the same thing, except you're evaluating at the point $y$. The multidimensional analogue of this is $\nabla g(\mathbf{x})$. You should parse that as $[\nabla g](\mathbf{x})$.
This is not the case with the notation $\frac{d}{dx} g(x)$. If you parse that as $[\frac{d}{dx} g](x)$, you get confused because what does $x$ mean in the scope of the brackets? You don't have to give meaning to it because it should be meaningless. The operator $\frac{d}{dx}$ applies to an expression, not a function.
But, what people will routinely do is define
$g(x)= x^2+\sin(x)+\text{whatever expression in }x$
and then write $\frac{d}{dx} g(y)$ when they really should have written $g'(y)$. They don't do this very often in the single-variable case, but they do it in the multi-variable case. I just showed the single-variable case because it's clearer to see the problem with it.
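In code the distinction is easy to keep straight: first build the new function $g'$, then evaluate it wherever you like. (A SymPy sketch, using the $g$ defined above without the "whatever" tail.)

```python
import sympy as sp

x, y = sp.symbols('x y')

g_expr = x**2 + sp.sin(x)                    # an *expression* in x
g_prime = sp.Lambda(x, sp.diff(g_expr, x))   # the *function* g'

# g'(y): evaluate the new function at the point y.
print(g_prime(y))                            # 2*y + cos(y)
```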
My inspiration for this answer comes from http://groups.csail.mit.edu/mac/users/gjs/6946/sicm-html/book-Z-H-78.html#%_sec_Temp_453
Solution 4:
I find some of the answers (and comments) above to be a bit confusing. I want to address some of the issues brought up. The original question of the OP was to find the total derivative $ \frac{dw}{dr} $ for the function: $$w=f(x,y)= x^2 + y^2,~~ x=r \sin \theta ,~~y = r \cos \theta $$ assuming that $r, \theta $ are independent variables.
On the face of it, finding $ \frac{dw}{dr} $ is not possible if $r$ and $\theta$ are independent of each other.
It is true that $$ \frac{\partial w}{\partial r }= 2r$$ Proof: $$ \frac{\partial w}{\partial r } = \frac{\partial w}{\partial x } \frac{\partial x }{\partial r} + \frac{\partial w}{\partial y } \frac{\partial y }{\partial r} $$ Plugging in $$ \frac{\partial w}{\partial r } = 2x~ ( \sin \theta ) + 2y ( \cos \theta ) $$ Substituting using the given $x $ and $y $ equations $$\frac{\partial w}{\partial r } = 2( r \sin \theta ) ~ ( \sin \theta ) + 2( r \cos \theta ) ( \cos \theta ) = 2r ( \sin^2 \theta + \cos^2 \theta ) = 2r $$
We can relax the assumption that $r$ and $\theta $ are independent of each other to find $ \frac{dw}{dr} $. The computation is quite a bit more involved: we would have to temporarily assume that $\theta$ is a function of $r$. $$ \frac{dw}{dr} =\frac{\partial w}{\partial x } \frac{\partial x}{\partial r } \frac{dr}{dr }+ \frac{\partial w}{\partial x } \frac{\partial x}{\partial \theta }\frac{d\theta }{ dr }+\frac{\partial w}{\partial y } \frac{\partial y}{\partial r } \frac{dr}{dr }+ \frac{\partial w}{\partial y } \frac{\partial y}{\partial \theta }\frac{d\theta }{ dr } $$

It is true that an early substitution gives us $$ w= ( r \sin \theta) ^2 + ( r \cos \theta)^2 = r^2$$ but it would be misleading to state that $$ \frac{dw}{dr}= 2r $$ since $$ w = f( r, \theta) $$

An analogous scenario is found when we want the slope at a point on the surface $$ z=f(x,y) = x^2$$ We would still use the partial derivative even though here $$ \frac{\partial z}{\partial x } = \frac{dz}{dx} = 2x$$ We can be more explicit and define $$z=f(x,y) = x^2 + 0y$$
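(To see concretely what the relaxed assumption produces, here is a SymPy sketch, my addition, that treats $\theta$ as an unspecified function of $r$. For this particular $w$ the $\frac{d\theta}{dr}$ terms cancel, consistent with the early substitution $w = r^2$.)

```python
import sympy as sp

r = sp.symbols('r')
theta = sp.Function('theta')(r)   # temporarily assume theta = theta(r)

x = r * sp.sin(theta)
y = r * sp.cos(theta)
w = x**2 + y**2

# The chain rule generates d(theta)/dr terms, but for this particular w
# they cancel, and the total derivative collapses to 2r.
print(sp.simplify(sp.diff(w, r)))   # 2*r
```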
Solution 5:
It may be easier to imagine a figure with orthogonal $x$ and $y$ coordinates for the base and a functional result (i.e. some function $w$ of two variables $x$ and $y$) plotted as a surface on the vertical $z$ axis. If we look at the change in the function that we obtain when we keep $y$ constant and let $x$ vary, it is the tangent to a slice taken through that surface parallel to the $x$ axis. Of course you get an equivalent picture for letting $y$ vary while keeping $x$ constant.

Now imagine a change in the function where we let both $x$ and $y$ vary simultaneously. Basically we sum the changes in the function to get the change $\Delta w$: $$\Delta w = f(x + \Delta x, y + \Delta y) - f(x,y)$$ If we expand this and go to the limit, we get $$dw = \frac{\partial w}{\partial x}\,dx + \frac{\partial w}{\partial y}\,dy$$ If $x$ and $y$ are both functions of a single variable $t$, then so is $w$, and we can divide each term by $dt$, which gives the total derivative of the function with respect to $t$.
This is the classic example of the basic concepts, and you can find a version of it here:
https://www.math.uwaterloo.ca/~ahamadeh/math217_p2.pdf
For a good illustrative example I like the rate of change of an expanding cylinder's volume, $V = \pi r^2 h$, where $r$ is the radius and $h$ is the height. Now use the previous expression for $\Delta w$ (here it is $\Delta V$), divide both sides by $\Delta t$, and then let $\Delta t$ go to zero.
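(A SymPy sketch of that cylinder computation; SymPy, and letting $r$ and $h$ depend on $t$, are my illustrative assumptions.)

```python
import sympy as sp

t = sp.symbols('t')
r = sp.Function('r')(t)   # radius changes with time
h = sp.Function('h')(t)   # height changes with time

V = sp.pi * r**2 * h

# dV/dt = 2*pi*r*h*(dr/dt) + pi*r**2*(dh/dt): the two partial
# contributions summed, as the delta argument above describes.
print(sp.diff(V, t))
```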
It only gets slightly more complicated when we use the methodology to find derivatives of implicit functions, but we use similar methods. Textbook examples often let $z$ stand for the function of $x$ and $y$; then you just form $\Delta z$ (as 'normal'), divide both sides by $\Delta x$, and let $\Delta x$ go to zero, giving an expression for $dz/dx$. Often you are given information about $z$, maybe $z = 0$ (constant), so $dz/dx = 0$.
If you tack on to these ideas the change-of-variables idea, which is a bit more of the same really: say $z$ is a function of $x$ and $y$, $z = f(x,y)$, and $x$ and $y$ are in turn functions of two other variables $u$ and $v$; then $z$ is a function of $u$ and $v$. So you form $\Delta z$ as 'normal', in terms of the partial derivative of $z$ with respect to $x$ times $\Delta x$ plus the partial derivative of $z$ with respect to $y$ times $\Delta y$, divide both sides by $\Delta u$, and let $\Delta u$ go to zero, with $v$ being kept constant for the time being. That gives you the partial derivative of $z$ with respect to $u$, and you follow the same procedure to get an expression for the partial derivative of $z$ with respect to $v$.
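(As a sketch of that change-of-variables procedure, with SymPy and made-up inner functions $x(u,v)$ and $y(u,v)$ as my illustrative assumptions.)

```python
import sympy as sp

u, v = sp.symbols('u v')

# Hypothetical inner functions, purely for illustration.
x = u + v
y = u * v
z = x**2 + y**2   # z = f(x, y), now expressed in terms of u and v

# Holding v fixed and differentiating with respect to u is exactly
# the partial derivative of z with respect to u described above.
print(sp.expand(sp.diff(z, u)))   # 2*u*v**2 + 2*u + 2*v
```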
With these you have most of the basic tools to handle such problems, and I suppose the basic answer to your question is that you have an implicit dependence, so when you want the change in the function as $x$ varies you have to add on the 'extra' bit. The concept pops up in a number of places: maybe your function describes the temperature of a volume element that is cooling with time, but the element is also moving, and the spatial coordinates bring it near a heat source, so you have to add the two effects. If you are not careful you miss that 'extra' spatial term and keep just the 'pure' time term.
Apologies for the wordiness of this reply, but maybe a half-reasonable idea of what is going on has been conveyed. (For time dependence, I'd imagine the whole function surface altering shape with time in 3D; perhaps we can only have snapshots in time.)