Why does Continuous Partial Differentiability Imply Total Differentiability?

A function $f:\mathbb{R}^2 \to \mathbb{R}$ has a total derivative at a point $x$ if there exists a linear operator $Df(x)(\cdot)$ such that for every $\epsilon >0$ there is a $\delta > 0$ such that if $0 < ||h|| < \delta$, then

$$|f(x+h) -f(x) - Df(x)(h)| < \epsilon||h||.$$

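Assume that both partial derivatives of $f$ exist in a neighborhood of $x = (x_1,x_2)$ and are continuous at $x$. Define the candidate operator as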

$$Df(x)(h) = \partial_1f(x_1,x_2)h_1+\partial_2f(x_1,x_2)h_2$$

Now consider the following path from $x = (x_1,x_2)$ to $x+h =(x_1+h_1,x_2+h_2)$:

$$ (x_1,x_2) \rightarrow(x_1+h_1,x_2) \rightarrow(x_1+h_1,x_2+h_2).$$

Using the mean value theorem,

$$\begin{aligned}
&|f(x_1+h_1,x_2+h_2) - f(x_1,x_2)- \partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad=|f(x_1+h_1,x_2+h_2) - f(x_1+h_1,x_2) +f(x_1+h_1,x_2)- f(x_1,x_2)-\partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad=|\partial_2f(x_1+h_1,\xi)h_2 + \partial_1f(\eta,x_2)h_1 -\partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad\leq|\partial_1f(\eta,x_2)-\partial_1f(x_1,x_2)||h_1|+|\partial_2f(x_1+h_1,\xi)-\partial_2f(x_1,x_2)||h_2|
\end{aligned}$$

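where $\eta$ lies between $x_1$ and $x_1 + h_1$, and $\xi$ lies between $x_2$ and $x_2 + h_2$. (The components of $h$ may be negative. If $h_1 = 0$ or $h_2 = 0$, the corresponding term vanishes and no intermediate point is needed.)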

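The intermediate points $(\eta,x_2)$ and $(x_1+h_1,\xi)$ both lie within distance $||h||$ of $x$. Since the partial derivatives are continuous at $x = (x_1,x_2)$, there exists $\delta >0$ such that if $0 < ||h|| < \delta$, then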

$$\begin{aligned}
|\partial_1f(\eta,x_2)-\partial_1f(x_1,x_2)| &< \frac{\epsilon}{\sqrt{2}},\\
|\partial_2f(x_1+h_1,\xi)-\partial_2f(x_1,x_2)| &< \frac{\epsilon}{\sqrt{2}}.
\end{aligned}$$

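Applying the Cauchy-Schwarz inequality in the form $a|h_1| + b|h_2| \leq \sqrt{a^2+b^2}\,||h||$ to the final line of the mean value estimate, we get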

$$\begin{aligned}
&|f(x_1+h_1,x_2+h_2) - f(x_1,x_2)- \partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2|\\
&\quad< \sqrt{(\epsilon/\sqrt{2})^2+(\epsilon/\sqrt{2})^2}\,||h||= \epsilon ||h||.
\end{aligned}$$
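As a numerical sanity check of this estimate (not part of the proof), the following minimal sketch evaluates the ratio $|f(x+h) - f(x) - Df(x)(h)| / ||h||$ for shrinking $h$. The test function, the base point, and the direction of $h$ are arbitrary choices made here for illustration; any function with continuous partial derivatives should behave similarly.

```python
import math


def f(x1, x2):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.sin(x1) * math.exp(x2) + x1 * x2**2


def grad_f(x1, x2):
    # Partial derivatives of the test function, computed by hand.
    d1 = math.cos(x1) * math.exp(x2) + x2**2
    d2 = math.sin(x1) * math.exp(x2) + 2 * x1 * x2
    return d1, d2


def remainder_ratio(x, h):
    # |f(x+h) - f(x) - Df(x)(h)| / ||h||, the quantity bounded by epsilon.
    d1, d2 = grad_f(*x)
    linear_part = d1 * h[0] + d2 * h[1]
    error = abs(f(x[0] + h[0], x[1] + h[1]) - f(*x) - linear_part)
    return error / math.hypot(h[0], h[1])


x = (0.7, -0.3)          # arbitrary base point
direction = (0.6, -0.8)  # arbitrary unit direction for h
scales = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [
    remainder_ratio(x, (t * direction[0], t * direction[1])) for t in scales
]

for t, r in zip(scales, ratios):
    print(f"||h|| = {t:.0e}   ratio = {r:.3e}")
```

If the argument above is right, the printed ratios should shrink toward zero as $||h||$ does, so for any given $\epsilon$ a small enough $\delta$ exists. A single direction proves nothing by itself; the check only illustrates what the inequality claims.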

The proof generalizes directly to $f:\mathbb{R}^d \to \mathbb{R}$ with $d > 2$: change one coordinate at a time along a path with $d$ legs.
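As a sketch of the higher-dimensional case (the notation $e_k$, $p_k$, $q_k$ is introduced here for illustration and does not appear in the argument above): let $e_1,\dots,e_d$ be the standard basis vectors, set $p_0 = x$ and $p_k = p_{k-1} + h_k e_k$, so that $p_d = x+h$. Assuming the partial derivatives exist near $x$, the mean value theorem on each leg gives a point $q_k$ on the segment from $p_{k-1}$ to $p_k$, with $||q_k - x|| \leq ||h||$, such that

$$\begin{aligned}
f(x+h) - f(x) &= \sum_{k=1}^d \big(f(p_k) - f(p_{k-1})\big) = \sum_{k=1}^d \partial_k f(q_k)h_k, \\
\Big|f(x+h) - f(x) - \sum_{k=1}^d \partial_k f(x)h_k\Big| &\leq \sum_{k=1}^d |\partial_k f(q_k) - \partial_k f(x)||h_k| < \sqrt{d\,(\epsilon/\sqrt{d})^2}\,||h|| = \epsilon ||h||,
\end{aligned}$$

where the last inequality should hold once $\delta$ is chosen so that each difference of partial derivatives is below $\epsilon/\sqrt{d}$ whenever $0 < ||h|| < \delta$.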