Proof of the chain rule in several-variable calculus
This is Exercise $6.4.3$ from Tao's Analysis 2.
Let $L_1 = g'(f(x_0))$ and $L_2 = f'(x_0)$. By Proposition $3.1.5$(b), it suffices to show that for every sequence $(x_n)$ converging to $x_0$ and every $\varepsilon > 0$ there exists $N > 0$ such that $||(g \circ f)(x_n) - (g \circ f)(x_0) - L_1L_2(x_n - x_0)|| \leqslant \varepsilon||x_n - x_0||$ for all $n \geqslant N$.
Fix $\varepsilon^* > 0$, to be chosen later. Since $g$ is differentiable at $y_0 = f(x_0)$, for any sequence $(y_n)$ converging to $y_0$ there is an $N_1$ such that $||g(y_n) - g(y_0) - L_1(y_n - y_0)|| \leqslant \varepsilon^*||y_n - y_0||$ for all $n \geqslant N_1$. $\qquad (1)$ Since $f$ is differentiable at $x_0$, for any sequence $(x_n)$ converging to $x_0$ there is an $N_2$ such that $||f(x_n) - f(x_0) - L_2(x_n - x_0)|| \leqslant \varepsilon^*||x_n - x_0||$ for all $n \geqslant N_2$. $\qquad (2)$ Because $f$ is differentiable at $x_0$, it is continuous there, so by Proposition $3.1.5$(b) the sequence $y_n = f(x_n)$ converges to $y_0 = f(x_0)$. Substituting $y_n = f(x_n)$ into $(1)$ and writing $C = (g \circ f)(x_n) - (g \circ f)(x_0)$, we obtain $||C - L_1(f(x_n) - f(x_0))|| \leqslant \varepsilon^* ||f(x_n) - f(x_0)||$ for all $n \geqslant N = \max(N_1, N_2)$. $\qquad (3)$
By the triangle inequality and the linearity of $L_1$, the LHS of $(3)$ is at least $||C - L_1L_2(x_n - x_0)|| - ||L_1(f(x_n) - f(x_0) - f'(x_0)(x_n - x_0))||$. Hence $||C - L_1L_2(x_n - x_0)|| \leqslant ||L_1(f(x_n) - f(x_0) - f'(x_0)(x_n - x_0))|| + \varepsilon^*||f(x_n) - f(x_0)||$ for all $n \geqslant N$.
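The triangle-inequality step rests on the following decomposition, written with $L_2 = f'(x_0)$ and a shorthand $R_n$ that is introduced only for this remark:

$$C - L_1L_2(x_n - x_0) = \bigl[\,C - L_1(f(x_n) - f(x_0))\,\bigr] + L_1 R_n, \qquad R_n = f(x_n) - f(x_0) - L_2(x_n - x_0).$$

Taking norms gives $||C - L_1L_2(x_n - x_0)|| \leqslant ||C - L_1(f(x_n) - f(x_0))|| + ||L_1 R_n||$, and bounding the first term on the right by $(3)$ yields the inequality above.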
By Exercise $6.1.4$, there is some $M > 0$ such that $||L_1 v|| \leqslant M||v||$ for all $v$, so $||C - L_1L_2(x_n - x_0)|| \leqslant M||f(x_n) - f(x_0) - f'(x_0)(x_n - x_0)|| + \varepsilon^*||f(x_n) - f(x_0)||$ for all $n \geqslant N$. $\qquad (4)$
Again by the differentiability of $f$ at $x_0$, for $n$ sufficiently large we have $||f(x_n) - f(x_0) - f'(x_0)(x_n - x_0)|| \leqslant \frac{\varepsilon||x_n - x_0||}{2M}$, so that the first term on the RHS of $(4)$ is less than or equal to $\frac{\varepsilon||x_n - x_0||}{2}$. Similarly, by the triangle inequality and Exercise $6.1.4$ applied to $L_2$, $||f(x_n) - f(x_0)|| \leqslant (M' + \varepsilon')||x_n - x_0||$ for some $M', \varepsilon' > 0$ and all sufficiently large $n$. Since $M'$ and $\varepsilon'$ do not depend on $n$, we may let $\varepsilon^* = \frac{\varepsilon}{2(M' + \varepsilon')}$ from the start, and then the second term on the RHS of $(4)$ is also less than or equal to $\frac{\varepsilon||x_n - x_0||}{2}$. Hence the RHS of $(4)$ is less than or equal to $\varepsilon||x_n - x_0||$ for all sufficiently large $n$, and the claim follows.
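For completeness, the estimate on $||f(x_n) - f(x_0)||$ can be spelled out as follows. This is a sketch under two assumptions about what the proof intends: that $M'$ is a bound for $L_2$ in the sense of Exercise $6.1.4$ (so $||L_2 v|| \leqslant M'||v||$ for all $v$), and that $\varepsilon'$ is any fixed positive tolerance used in the differentiability of $f$ at $x_0$. For $n$ sufficiently large,

$$||f(x_n) - f(x_0)|| \leqslant ||f(x_n) - f(x_0) - L_2(x_n - x_0)|| + ||L_2(x_n - x_0)|| \leqslant \varepsilon'||x_n - x_0|| + M'||x_n - x_0|| = (M' + \varepsilon')||x_n - x_0||.$$

Multiplying through by $\varepsilon^* = \frac{\varepsilon}{2(M' + \varepsilon')}$ then gives $\varepsilon^*||f(x_n) - f(x_0)|| \leqslant \frac{\varepsilon||x_n - x_0||}{2}$.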