What's my confusion with the chain rule? (Differentiating $x^x$)

When differentiating $x^x$, why can't you choose $u$ to be $x$, and find $\dfrac{d(x^u)}{du} \dfrac{du}{dx}$, giving $\ln(x)\cdot{x^x}$? Or you could go the other way and find $\dfrac{d(u^x)}{du}\dfrac{du}{dx}$, giving $x^x$? Both methods seem to be equally wrong.


Solution 1:

Both methods are wrong, but the fix is easy: the correct derivative is the sum of the two proposals, and this is no coincidence!

Naturally, turning a single instance of $x$ into a constant cannot be right, since it treats the two instances asymmetrically. The correct way is to differentiate with respect to each instance in turn, which is justified by the chain rule with partial derivatives:

$$\frac{df(u,v)}{dx}=\frac{\partial f(u,v)}{\partial u}\frac{du}{dx}+\frac{\partial f(u,v)}{\partial v}\frac{dv}{dx}.$$ In other words, you let one instance vary while the other is held constant, and sum the two cases.

Here, $f(u,v)=u^v$ with $u=v=x$, and

$$\frac{d(x^x)}{dx}=\frac{d(u^v)}{dx}=vu^{v-1}\cdot1+\ln(u)u^v\cdot1=x^x+\ln(x)x^x,$$ or with a more intuitive notation $$\frac{d(x^x)}{dx}=\frac{d(x^v)}{dx}\cdot1+\frac{d(u^x)}{dx}\cdot1=vx^{v-1}+\ln(u)u^x=x^x+\ln(x)x^x.$$
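As a quick sanity check, the rule can be tested numerically. The sketch below is only an illustration: the helper `central_diff`, the step size `h`, and the test point `x = 2.0` are arbitrary choices of mine, not part of the argument. It estimates the derivative with respect to each instance of $x$ separately, then compares their sum with the derivative of $x^x$ itself.

```python
import math

def f(u, v):
    """Two-variable function u**v; x**x is f(x, x)."""
    return u ** v

def central_diff(g, t, h=1e-6):
    """Symmetric finite-difference estimate of g'(t) (assumed helper)."""
    return (g(t + h) - g(t - h)) / (2 * h)

x = 2.0  # arbitrary test point with x > 0

# Vary the base only; the exponent is held at x.
d_base = central_diff(lambda u: f(u, x), x)
# Vary the exponent only; the base is held at x.
d_exponent = central_diff(lambda v: f(x, v), x)
# Vary both instances together: the derivative of x**x.
d_total = central_diff(lambda t: f(t, t), x)

closed_form = x**x + math.log(x) * x**x

print("base only:      ", d_base)
print("exponent only:  ", d_exponent)
print("sum of the two: ", d_base + d_exponent)
print("total:          ", d_total)
print("closed form:    ", closed_form)
```

Up to finite-difference error, neither single-instance estimate matches the total on its own, while their sum does.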


This works with as many instances of $x$ as you like. For instance $x^{x+x^2}$ seen as $u^{v+w^2}$ yields

  • varying the first instance, $(v+w^2)x^{v+w^2-1}$;

  • varying the second instance, $\ln(u)u^{x+w^2}$;

  • varying the third instance, $\ln(u)u^{v+x^2}2x$.

Summing the three contributions and setting $u=v=w=x$ gives

$$(1+x+\ln(x)(1+2x))x^{x+x^2}.$$
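The three-instance case can be checked the same way. This is a rough numerical sketch, again with an assumed finite-difference helper and an arbitrary test point: it evaluates the three bulleted terms at $u=v=w=x$ and compares their sum with a direct estimate of the derivative of $x^{x+x^2}$.

```python
import math

def f(u, v, w):
    """Three-variable function u**(v + w**2); the target is f(x, x, x)."""
    return u ** (v + w ** 2)

def central_diff(g, t, h=1e-6):
    """Symmetric finite-difference estimate of g'(t) (assumed helper)."""
    return (g(t + h) - g(t - h)) / (2 * h)

x = 1.5  # arbitrary test point with x > 0

# The three terms from the list above, evaluated at u = v = w = x.
term_u = (x + x**2) * x ** (x + x**2 - 1)       # varying the first instance
term_v = math.log(x) * x ** (x + x**2)          # varying the second instance
term_w = math.log(x) * x ** (x + x**2) * 2 * x  # varying the third instance

# Direct estimate, with all three instances moving together.
d_total = central_diff(lambda t: f(t, t, t), x)

print("sum of terms:", term_u + term_v + term_w)
print("direct:      ", d_total)
```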

Solution 2:

If you work with the formal definition of the chain rule, you'll see how what you're trying to do makes no sense.

But if you want to stick with the abuse of notation $\frac{dz}{dx}=\frac{dz}{dy}\frac{dy}{dx}$, I'd say that the heart of the problem is in your claim that $\frac{d(x^u)}{du}=x^u\log x$. This is only valid if $x$ is constant, and doesn't apply if $x$ is a function of $u$ (in our case, $x=u$).

That's the difference between a total derivative $\frac{d}{dt}$ and a partial derivative $\frac{\partial}{\partial t}$. The latter, $\frac{\partial f(s,t)}{\partial s}$, means "change in $f$ when $s$ changes and nothing else does", whereas $\frac{df(s,t)}{ds}$ means "change in $f$ when $s$ changes, and everything else changes accordingly". So you can't have $u$ depend on $x$ and calculate a total derivative in a way that assumes $x$ is constant.
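To make the distinction concrete, here is a small numerical sketch. The function name `f`, the helper `central_diff`, and the test point are illustrative assumptions of mine. It takes $f(s,t)=s^t$ and compares the derivative in $t$ with $s$ frozen against the derivative in $t$ when $s$ is tied to $t$.

```python
import math

def f(s, t):
    """f(s, t) = s**t."""
    return s ** t

def central_diff(g, t, h=1e-6):
    """Symmetric finite-difference estimate of g'(t) (assumed helper)."""
    return (g(t + h) - g(t - h)) / (2 * h)

x = 2.0  # arbitrary test point with x > 0

# Partial derivative in t: s stays frozen at x while t moves.
partial_t = central_diff(lambda t: f(x, t), x)
# Total derivative in t: s is tied to t, so it moves as well.
total_t = central_diff(lambda t: f(t, t), x)

print("partial (s frozen):", partial_t, "vs x^x log x =", x**x * math.log(x))
print("total (s = t):     ", total_t)
```

The partial derivative reproduces $x^x\log x$, but the total derivative is larger, because the base changes too.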