Spivak's Chain Rule Proof (Image of proof provided)
If $g$ is differentiable at $a$, and $f$ is differentiable at $g(a)$, then $f \circ g$ is differentiable at $a$, and $$ (f \circ g)^{\prime}(a)=f^{\prime}(g(a)) \cdot g^{\prime}(a). $$

Define a function $\phi$ as follows: $$ \phi(h)= \begin{cases}\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}, & \text{if } g(a+h)-g(a) \neq 0, \\ f^{\prime}(g(a)), & \text{if } g(a+h)-g(a)=0 .\end{cases} $$

It should be intuitively clear that $\phi$ is continuous at $0$: when $h$ is small, $g(a+h)-g(a)$ is also small, so if $g(a+h)-g(a)$ is not zero, then $\phi(h)$ will be close to $f^{\prime}(g(a))$; and if it is zero, then $\phi(h)$ actually equals $f^{\prime}(g(a))$, which is even better. Since the continuity of $\phi$ is the crux of the whole proof, we will provide a careful translation of this intuitive argument.
We know that $f$ is differentiable at $g(a)$. This means that $$ \lim _{k \rightarrow 0} \frac{f(g(a)+k)-f(g(a))}{k}=f^{\prime}(g(a)). $$

Thus, if $\varepsilon>0$ there is some number $\delta^{\prime}>0$ such that, for all $k$, $$ \text{if $0<|k|<\delta^{\prime}$, then $\left|\frac{f(g(a)+k)-f(g(a))}{k}-f^{\prime}(g(a))\right|<\varepsilon$}. \tag{1} $$

Now $g$ is differentiable at $a$, hence continuous at $a$, so there is a $\delta>0$ such that, for all $h$, $$\text{if $|h|<\delta$, then $|g(a+h)-g(a)|<\delta^{\prime}$.}\tag{2}$$

Consider now any $h$ with $|h|<\delta$. If $k=g(a+h)-g(a) \neq 0$, then $$ \phi(h)=\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}=\frac{f(g(a)+k)-f(g(a))}{k}; $$ it follows from $(2)$ that $|k|<\delta^{\prime}$, and hence from $(1)$ that $$ \left|\phi(h)-f^{\prime}(g(a))\right|<\varepsilon. $$
Above is a proof of the chain rule from Spivak's Calculus, transcribed from a screenshot. There is a second page, but I understand that part; this is the meat of the proof. I have a few questions.
$\textbf{1.}$ "It should be intuitively clear that $\phi$ is continuous at $0$." Do we care that it is continuous at zero so that we avoid division by zero, since $g(a+h)-g(a)$ is in the denominator and could equal zero? I am not sure I understand why it is continuous at zero. I understand what he is saying, but I was always under the impression that continuity means there are no breaks in the graph visually. Here, I am imagining $\phi(h)$ being continuous up to zero and then jumping to another point when it is zero.
$\textbf{2.}$ At $(2)$, I do not understand what we are trying to do. We seem to have switched to $h$, and I think we are applying the definition of continuity. The switching back and forth between $k$ and $h$ is confusing me.
Solution 1:
The "intuitively clear" fact is that there is no visual break in the graph of $\phi(h)$. Sure, the graph of $$ \phi_1(h) = \frac{f(g(a + h)) - f(g(a))}{g(a + h) - g(a)} $$ has a "hole" where $h = 0$, and depending on the other values of $g(a+h)$, there may be additional holes or even entire intervals of the $x$-axis that have no value of $\phi_1(h)$. (Basically, whenever $g(a+h) = g(a)$, there is no value of $\phi_1(h)$.) But the only way to approach one of those "holes" is for the graph of the function to come right up to (or down to) the horizontal line that graphs the constant function $\phi_2(h) = f'(g(a))$. Every "hole" in $\phi_1(h)$ begins and ends on that line, and the second half of the definition of $\phi$ fills in each of those holes with exactly the function value that will connect all the pieces of the graph, namely the value $f'(g(a))$.
For the second part of your question: yes, all the business with statements $(1)$ and $(2)$ is directly using epsilon-delta definitions--statement $(1)$ is the definition of differentiability of $f$ at $g(a)$, and statement $(2)$ uses the continuity of $g$ at $a$. But it requires two applications of this kind of definition, logically connected to each other, so we can't just use the symbols $\varepsilon$ and $\delta$ both times--the "epsilon" from one application of the definition is the "delta" for the other application.
In order to keep the symbols unambiguous, the proof uses $\varepsilon$ and $\delta'$ for the "epsilon" and "delta" in statement $(1)$, and it uses $\delta'$ and $\delta$ for the "epsilon" and "delta" in statement $(2)$.
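Schematically (my own summary of the dependency, not notation from the book): $$\varepsilon \;\xrightarrow{\ (1),\ f \text{ differentiable at } g(a)\ }\; \delta' \;\xrightarrow{\ (2),\ g \text{ continuous at } a\ }\; \delta,$$ so that for any $h$ with $|h| < \delta$, the number $k = g(a+h) - g(a)$ satisfies $|k| < \delta'$, which is exactly the hypothesis that statement $(1)$ needs whenever $k \neq 0$.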
You do have to keep track of what $h$ is versus what $k$ is. I think the trickiest part is near the end, in the sentence that starts, "If $k = g(a + h) - g(a) \neq 0$". By that time we have the condition $\lvert h \rvert < \delta$, which guarantees that we don't produce any $k$ that violates $0 < \lvert k \rvert < \delta'$ this way, but we don't necessarily produce every value of $k$ that would satisfy that condition (which is OK; we don't need to). Also, we don't necessarily use every value of $h$ with $\lvert h \rvert < \delta$: any $h$ for which $g(a + h) - g(a) = 0$ has no corresponding value of $k$; instead, it produces one of the values of $\phi(h)$ that is already equal to the limit we're trying to show. Yes, this is complicated, and maybe that contributes to the opinion, expressed in some other answers and comments, that you might prefer to look at someone else's proof.
Solution 2:
I think you need to understand the reason behind the introduction of the function $\phi(h)$. Note that the number $(f\circ g)'(a)$ is defined by $$(f\circ g)'(a) = \lim_{h \to 0}\frac{f(g(a + h)) - f(g(a))}{h}$$ and this can be written as $$\lim_{h \to 0}\frac{f(g(a + h)) - f(g(a))}{g(a + h) - g(a)}\cdot\frac{g(a + h) - g(a)}{h}$$ provided that $g(a + h) - g(a) \neq 0$ for all $h$ near $0$. When $g(a + h) - g(a) = 0$ for values of $h$ arbitrarily close to $0$, this rewriting breaks down, and the function $\phi(h)$ is invented to solve exactly this problem.
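The extreme case (my example, not from the answer above) shows the problem at its worst: if $g$ is constant, say $g(x) = c$ for all $x$, then $g(a+h) - g(a) = 0$ for every $h$, so the rewritten product is undefined for every $h \neq 0$. Yet the chain rule still holds: $$(f\circ g)'(a) = \lim_{h \to 0}\frac{f(c) - f(c)}{h} = 0 = f'(c)\cdot 0 = f'(g(a))\cdot g'(a).$$ The function $\phi$ handles this case without fuss: here $\phi(h) = f'(g(a))$ for all $h$.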
The function $\phi(h)$ ensures that $$f(g(a + h)) - f(g(a)) = \phi(h)\{g(a + h) - g(a)\}$$ for all values of $h$ near $0$ (this is checked easily; see below). Next note that the above equation implies that $$\lim_{h \to 0}\frac{f(g(a + h)) - f(g(a))}{h} = \lim_{h \to 0}\phi(h)\cdot\lim_{h \to 0}\frac{g(a + h) - g(a)}{h}$$ provided that $\lim_{h \to 0}\phi(h)$ exists. Clearly the second limit on the RHS is $g'(a)$, and thus in order to prove the chain rule we must ensure that $$\lim_{h \to 0}\phi(h) = f'(g(a)).$$ Note that $\phi(0) = f'(g(a))$ by definition of $\phi(h)$, and hence we need to ensure that $\phi(h) \to \phi(0)$ as $h \to 0$, i.e., that $\phi(h)$ is continuous at $h = 0$. This answers your first query.
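Here is the easy check of the displayed identity, case by case. If $g(a+h) - g(a) \neq 0$, then by definition $$\phi(h)\{g(a+h) - g(a)\} = \frac{f(g(a+h)) - f(g(a))}{g(a+h) - g(a)}\cdot\{g(a+h) - g(a)\} = f(g(a+h)) - f(g(a));$$ and if $g(a+h) - g(a) = 0$, then $g(a+h) = g(a)$, so both sides equal $0$.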
Coming to the second query regarding the use of $h$ and $k$: note that the use of the variable $k$ is not strictly necessary here. We need to show that $\phi(h) \to \phi(0) = f'(g(a))$, and for this Spivak takes a number $\epsilon > 0$ and tries to find a $\delta > 0$ such that $|\phi(h) - f'(g(a))| < \epsilon$ whenever $0 < |h| < \delta$. The definition of $\phi(h)$ is complicated and based on the value of the difference $g(a + h) - g(a)$, and Spivak denotes this difference by $k$. Thus $$\phi(h) = \frac{f(g(a) + k) - f(g(a))}{k}$$ if $k \neq 0$, and $\phi(h) = f'(g(a))$ if $k = 0$. Note that when $k = 0$ then $|\phi(h) - f'(g(a))| = 0$, and hence it is automatically less than $\epsilon$ whatever the value of $h$. The problem is to ensure $|\phi(h) - f'(g(a))| < \epsilon$ when $k = g(a + h) - g(a) \neq 0$. To ensure this, Spivak uses the differentiability of $f$ at $g(a)$ (which gives rise to $\delta' > 0$) and the continuity of $g$ at $a$ (which gives rise to $\delta$, based on $\delta'$). Thus we see that the variable $k$ is used for notational convenience and to establish the desired inequality in two steps (find $\delta'$ based on $\epsilon$, and $\delta$ based on $\delta'$).
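Putting the two cases side by side: for $0 < |h| < \delta$ we have $|k| < \delta'$ by the continuity of $g$, and $$|\phi(h) - f'(g(a))| = \begin{cases} 0, & k = 0,\\ \left|\dfrac{f(g(a)+k) - f(g(a))}{k} - f'(g(a))\right|, & k \neq 0, \end{cases}$$ which is less than $\epsilon$ trivially in the first case and by the differentiability of $f$ at $g(a)$ in the second (since then $0 < |k| < \delta'$).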
Solution 3:
Just to provide an additional point, one can understand Spivak's $\phi$ function as actually being a composite function. In fact, in Chapter 6 Exercise 12a, we proved a useful lemma that reads as follows:
If $f$ is continuous at $l$ and $\displaystyle \lim_{x \to a} g(x) = l$, then $\displaystyle \lim_{x\to a}f(g(x))=f(l)$
Although Spivak's $\phi$ function looks a little exotic, consider the slightly more digestible function $\psi$ which we define as follows:
$$\psi(k)= \begin{cases} \frac{f\left(g(a)+k\right)-f\left(g(a)\right)}{k} & k \neq 0 \\ f'\left(g(a)\right)& k = 0 \end{cases}$$
For the $k \neq 0$ case, we recognize this as the expression that appears to the right of $\displaystyle \lim_{k \to 0}$ in the definition of the derivative of $f$ at $g(a)$. By assumption, we know that the derivative of $f$ at $g(a)$ exists. With this information, we can actually prove that $\psi$ is continuous at $0$, which means that $\displaystyle \lim_{k \to 0}\psi(k)=\psi(0)=f'\left(g(a)\right). \quad \dagger$
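Spelling out the $\dagger$ step: the limit as $k \to 0$ only inspects values $k \neq 0$, where $\psi(k)$ is exactly the difference quotient of $f$ at $g(a)$, so $$\lim_{k \to 0}\psi(k) = \lim_{k \to 0}\frac{f(g(a)+k) - f(g(a))}{k} = f'(g(a)) = \psi(0),$$ which is precisely the statement that $\psi$ is continuous at $0$.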
Next, consider the function $\omega$ which we will define as follows:
$$\omega(h)=g(a+h)-g(a) \text{ for any }h\in \mathbb R$$
By assumption, $g$ is differentiable at $a$, which means $g$ is also continuous at $a$. With this information, we can prove that $\displaystyle \lim_{h \to 0} \omega(h)=0. \quad \dagger \dagger$
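Spelling out the $\dagger \dagger$ step: continuity of $g$ at $a$ says $\lim_{h \to 0} g(a+h) = g(a)$, so $$\lim_{h \to 0}\omega(h) = \lim_{h \to 0}\left[g(a+h) - g(a)\right] = g(a) - g(a) = 0.$$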
Notice how, using $\dagger$ and $\dagger \dagger$, we can invoke our lemma (with $\psi$ in the role of $f$, $\omega$ in the role of $g$, and $l=0$) to conclude:
$$\displaystyle \lim_{h \to 0}\psi\left( \omega(h) \right)=\psi(0)=f'\left(g(a)\right)$$
We can unpack $\psi\left ( \omega(h)\right)$ in the following way:
$$\psi\left ( \omega(h)\right)=\psi \left( g(a+h)-g(a)\right)$$
By definition of $\psi$, if $g(a+h) - g(a) = 0$, then $\psi \left (g(a+h) - g(a) \right)=f'\left(g(a)\right)$.
If $g(a+h) - g(a) \neq 0$, then $\psi \left (g(a+h) - g(a) \right)=\frac{f\left(g(a)+g(a+h) - g(a)\right)-f\left(g(a)\right)}{g(a+h) - g(a)} = \frac{f\left(g(a+h)\right)-f\left(g(a)\right)}{g(a+h) - g(a)}$.
This is precisely how Spivak defined his $\phi$ function.
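So, tying this back to the first question: since $\phi(h) = \psi(\omega(h))$ for every $h$, the lemma yields $$\lim_{h \to 0}\phi(h) = \psi(0) = f'\left(g(a)\right) = \phi(0),$$ which is exactly the continuity of $\phi$ at $0$ on which Spivak's proof hinges.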