Why does Continuous Partial Differentiability Imply Total Differentiability?
The function $f:\mathbb{R}^2 \to \mathbb{R}$ has a total derivative at a point $x$ if there exists a linear map $Df(x):\mathbb{R}^2 \to \mathbb{R}$ such that for every $\epsilon >0$ there is a $\delta > 0$ such that if $0 < ||h|| < \delta$, then
$$|f(x+h) -f(x) - Df(x)(h)| < \epsilon||h||.$$
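Assume the partial derivatives $\partial_1f$ and $\partial_2f$ exist in a neighborhood of $x = (x_1,x_2)$ and are continuous at $x$. Define the candidate operator as
$$Df(x)(h) = \partial_1f(x_1,x_2)h_1+\partial_2f(x_1,x_2)h_2.$$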
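As a quick numerical sanity check of the definition with this candidate operator, the sketch below evaluates the ratio $|f(x+h)-f(x)-Df(x)(h)|/||h||$ along shrinking increments $h$. The test function $f(x_1,x_2)=\sin(x_1)e^{x_2}$, the base point, and the direction are all hypothetical choices made for illustration only; this illustrates the claim for one smooth function and proves nothing.

```python
import math

# Hypothetical test function with continuous partial derivatives (an assumption
# for illustration; any continuously differentiable f would do).
def f(x1, x2):
    return math.sin(x1) * math.exp(x2)

def d1f(x1, x2):
    # Partial derivative with respect to the first variable.
    return math.cos(x1) * math.exp(x2)

def d2f(x1, x2):
    # Partial derivative with respect to the second variable.
    return math.sin(x1) * math.exp(x2)

def remainder_ratio(x, h):
    """Return |f(x+h) - f(x) - Df(x)(h)| / ||h|| for h != 0."""
    x1, x2 = x
    h1, h2 = h
    linear_part = d1f(x1, x2) * h1 + d2f(x1, x2) * h2
    remainder = f(x1 + h1, x2 + h2) - f(x1, x2) - linear_part
    return abs(remainder) / math.hypot(h1, h2)

x = (0.7, -0.3)           # arbitrary base point
direction = (0.6, -0.8)   # arbitrary unit direction
scales = [10.0 ** -k for k in range(1, 6)]
ratios = [
    remainder_ratio(x, (t * direction[0], t * direction[1])) for t in scales
]

for t, r in zip(scales, ratios):
    print(f"||h|| = {t:.0e}   ratio = {r:.3e}")
```

If the total derivative exists and equals the candidate operator, the printed ratio should shrink toward zero as $||h||$ shrinks, which is what the definition demands.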
Now consider the following path from $x = (x_1,x_2)$ to $x+h =(x_1+h_1,x_2+h_2)$:
$$ (x_1,x_2) \rightarrow(x_1+h_1,x_2) \rightarrow(x_1+h_1,x_2+h_2).$$
Using the mean value theorem,
$$\begin{aligned}
&|f(x_1+h_1,x_2+h_2) - f(x_1,x_2)- \partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad=|f(x_1+h_1,x_2+h_2) - f(x_1+h_1,x_2) +f(x_1+h_1,x_2)- f(x_1,x_2)-\partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad=|\partial_2f(x_1+h_1,\xi)h_2 + \partial_1f(\eta,x_2)h_1 -\partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad\leq|\partial_1f(\eta,x_2)-\partial_1f(x_1,x_2)||h_1|+|\partial_2f(x_1+h_1,\xi)-\partial_2f(x_1,x_2)||h_2|
\end{aligned}$$
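where $\eta$ lies between $x_1$ and $x_1 + h_1$, and $\xi$ lies between $x_2$ and $x_2 + h_2$. (The components of $h$ may be negative. If $h_1 = 0$ or $h_2 = 0$, the corresponding term vanishes and we may take $\eta = x_1$ or $\xi = x_2$.)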
Since the partial derivatives are continuous at $x = (x_1,x_2)$, and both points $(\eta,x_2)$ and $(x_1+h_1,\xi)$ lie within distance $||h||$ of $x$, there exists $\delta >0$ such that if $0 < ||h|| < \delta$, then
$$\begin{aligned}
|\partial_1f(\eta,x_2)-\partial_1f(x_1,x_2)| &< \frac{\epsilon}{\sqrt{2}},\\
|\partial_2f(x_1+h_1,\xi)-\partial_2f(x_1,x_2)| &< \frac{\epsilon}{\sqrt{2}}.
\end{aligned}$$
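Applying the Cauchy-Schwarz inequality to the last bound in the chain above, with the two differences of partial derivatives as one vector and $(|h_1|,|h_2|)$ as the other, we get
$$\begin{aligned}
&|f(x_1+h_1,x_2+h_2) - f(x_1,x_2)- \partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2|\\
&\quad< \sqrt{(\epsilon/\sqrt{2})^2+(\epsilon/\sqrt{2})^2}\,||h||= \epsilon ||h||.
\end{aligned}$$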
The same argument generalizes to $f:\mathbb{R}^d \to \mathbb{R}$ with $d > 2$ by changing one coordinate at a time along the path from $x$ to $x+h$.
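A sketch of the general case, under the assumption that all $d$ partial derivatives exist near $x$ and are continuous at $x$. The notation here is introduced only for illustration and is not part of the argument above: $e_k$ is the $k$-th standard basis vector, $p_0 = x$, and $p_k = x + \sum_{j \le k} h_j e_j$, so that $p_d = x + h$. The mean value theorem applied on each segment from $p_{k-1}$ to $p_k$ gives a point $c_k$ on that segment, and hence $||c_k - x|| \le ||h||$. Choosing $\delta$ so that $|\partial_kf(c_k)-\partial_kf(x)| < \epsilon/\sqrt{d}$ for every $k$ whenever $0 < ||h|| < \delta$, one gets

$$\begin{aligned}
f(x+h)-f(x) &= \sum_{k=1}^{d}\bigl(f(p_k)-f(p_{k-1})\bigr) = \sum_{k=1}^{d}\partial_kf(c_k)\,h_k,\\
\Bigl|f(x+h)-f(x)-\sum_{k=1}^{d}\partial_kf(x)\,h_k\Bigr| &\leq \sum_{k=1}^{d}|\partial_kf(c_k)-\partial_kf(x)|\,|h_k| \\
&\leq \Bigl(\sum_{k=1}^{d}|\partial_kf(c_k)-\partial_kf(x)|^2\Bigr)^{1/2}||h|| < \epsilon||h||.
\end{aligned}$$

The second inequality is Cauchy-Schwarz, and the last step uses $\sqrt{d\,(\epsilon/\sqrt{d})^2} = \epsilon$, which reduces to the $\epsilon/\sqrt{2}$ bound used above when $d = 2$.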