Prove that the composition of differentiable functions is differentiable.

That is, if $f$ is differentiable at $z$, and if $g$ is differentiable at $f(z)$, then $g\circ f$ is differentiable at $z$.

My attempt: I begin with $g(f(z+h))-g(f(z))=[g'(f(z))+\epsilon][f(z+h)-f(z)]$, where $\epsilon \to 0$ as $h \to 0$. Could anyone help me with this exercise?


This is incredibly easy to prove if you have the following result:

If a function $f$ is differentiable at $a$, then there exists a function $\varphi$, defined on an interval $(-\epsilon,\epsilon)$ and continuous at $0$, such that $\varphi(0)=0$ and

$$ f(a+h) = f(a) + f'(a)h + \varphi(h)h, $$

for all $h \in (-\epsilon,\epsilon)$.

Conversely, if there are constants $b$, $\alpha$ and a function $\varphi$, continuous at $0$ with $\varphi(0)=0$, such that

$$ f(a+h) = b + \alpha h + \varphi(h)h, $$

for all $h \in (-\epsilon,\epsilon)$, then $f$ is differentiable at $a$ with $f(a) = b$ and $f'(a) = \alpha$.
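
As a quick illustration of this characterization (an example added here, not part of the original statement), take $f(x) = x^2$. Expanding the square gives

$$ f(a+h) = (a+h)^2 = a^2 + 2a\,h + h \cdot h, $$

so the decomposition holds with $b = a^2$, $\alpha = 2a$ and $\varphi(h) = h$. Since $\varphi$ is continuous at $0$ with $\varphi(0) = 0$, the converse direction gives $f'(a) = 2a$, as expected.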

The chain rule follows by direct computation: write $(g \circ f)(a+h) = g(f(a+h))$, use that $f$ is differentiable at $a$ to write $f(a+h)$ as $f(a) + f'(a)h + \varphi_f(h)h$, set $k = f'(a)h + \varphi_f(h)h$ (note that $k \to 0$ as $h \to 0$), and then use that $g$ is differentiable at $f(a)$ to expand $g(f(a)+k)$.
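
Spelled out, the computation might look as follows. This is a sketch under the assumption that $\varphi_g$ denotes the auxiliary function for $g$ at $b = f(a)$, and $\psi$ is a name introduced here for the combined remainder. With $k = f'(a)h + \varphi_f(h)h$,

$$
\begin{aligned}
g(f(a+h)) &= g(b + k) \\
&= g(b) + g'(b)\,k + \varphi_g(k)\,k \\
&= g(b) + g'(b)f'(a)\,h + \underbrace{\Big[\, g'(b)\,\varphi_f(h) + \varphi_g(k)\big(f'(a) + \varphi_f(h)\big) \Big]}_{=:\ \psi(h)}\, h .
\end{aligned}
$$

At $h = 0$ we have $k = 0$, so $\psi(0) = g'(b)\varphi_f(0) + \varphi_g(0)f'(a) = 0$. Moreover $k \to 0$ as $h \to 0$, and $\varphi_f$, $\varphi_g$ are continuous at $0$, so $\psi$ is continuous at $0$. By the converse direction of the result above, $g \circ f$ is differentiable at $a$ with $(g\circ f)'(a) = g'(f(a))\,f'(a)$.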

There's a little bit of bookkeeping needed to make sure that there do exist appropriate intervals around $0$ for the auxiliary functions, but it's not too bad.

The best part about this proof is that it immediately generalizes to functions from $\mathbb R^m$ to $\mathbb R^n$.


Define
$$h(y)=\begin{cases} \dfrac{g(y)-g(y_0)}{y-y_0} &\text{if $y \neq y_0$,}\\ g'(y_0) &\text{if $y=y_0$.}\end{cases}$$
Then you can write the difference quotient for $g \circ f$ at $x_0$ (with $y_0=f(x_0)$, of course) in terms of $h$ and pass to the limit as $x \to x_0$. The "trick" is that $h$ is continuous at $y_0$, which is exactly the statement that $g$ is differentiable there.

Note that this proof is equivalent to the one you suggest. Historically, it relies on Weierstrass' definition of the derivative: the function $f$ is differentiable at $x_0$ if there exists a function $\omega$, continuous at $x_0$, such that
$$f(x)=f(x_0)+\omega(x)(x-x_0)$$
for every $x$ in some neighborhood of $x_0$; in that case $f'(x_0)=\omega(x_0)$. With this (equivalent) definition, the chain rule is just the continuity of the composition.
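
To make the limit explicit, here is one way the argument can be written out (a sketch; the names $\omega_f$ and $\omega_g$ are introduced here for the Weierstrass functions of $f$ at $x_0$ and of $g$ at $y_0$). For $x \neq x_0$,

$$
\frac{g(f(x)) - g(f(x_0))}{x - x_0} = h(f(x)) \cdot \frac{f(x) - f(x_0)}{x - x_0} .
$$

This identity also holds when $f(x) = f(x_0)$, since both sides are then $0$; this is the case that a naive "multiply and divide by $f(x)-f(x_0)$" argument misses. As $x \to x_0$, continuity of $f$ at $x_0$ gives $f(x) \to y_0$, continuity of $h$ at $y_0$ gives $h(f(x)) \to h(y_0) = g'(y_0)$, and the second factor tends to $f'(x_0)$. Hence

$$
(g \circ f)'(x_0) = g'(f(x_0))\, f'(x_0) .
$$

In the Weierstrass formulation the same computation reads

$$
g(f(x)) - g(f(x_0)) = \omega_g(f(x))\,\big(f(x) - f(x_0)\big) = \omega_g(f(x))\,\omega_f(x)\,(x - x_0),
$$

and $x \mapsto \omega_g(f(x))\,\omega_f(x)$ is continuous at $x_0$ with value $g'(f(x_0))\,f'(x_0)$.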