Consider the map $f\times f: X\times X \mapsto X\times X$. It is continuous (because f is) and surjective (because f is). Consider the orbit of $(x,y)$ under $f$, i.e. the sequence $p_n=(f^n(x), f^n(y))$. Because $X\times X$ is compact, it has a convergent subsequence, hence for any $\epsilon$ (sorry, I hoped to do no epsilons, but can't), there is an $N_0, k>0$, such that $f^{N_0}(x)$ is $\epsilon$ close to $f^{k+N_0}(x)$ and $f^{N_0}(y)$ is epsilon close to $f^{k+N}(y)$, hence $d(f^{k+N_0}(x), f^{k+N_0}(y))$ is $2\epsilon$ close to $d(f^{N_0}(x), f^{N_0}(y))$. Then because $f$ does not expand distances, induction shows the same is true for all $N \geq N_0$. But we want more. We want this to be true for some fixed $N$ (and varying $k$) for all $(x,y)$. That is achievable by compactness as follows:

As we showed (replacing $\epsilon$ by $\epsilon/2$) we have $d(f^{k+N_0}(x), f^{k+N_0}(y))$ is $\epsilon$ close to $d(f^{N_0}(x), f^{N_0}(y))$. Because $f$ does not expand distances (and triangle inequality), this means that for any $(v,w)$ with $d(v,x)<\epsilon/4$ and $d(w,y)<\epsilon/4$ the we have $d(f^{k+N_0}(v), f^{k+N_0}(w))$ is $2\epsilon$ close to $d(f^{N_0}(v), f^{N_0}(w))$. Now cover $X$ with $\epsilon/4$ balls and pick a finite subcover. For any pair of centers $(x_i,y_i)$ of the balls we have some $N_0$ which gives $\epsilon$-closeness, and hence gives $2\epsilon$ closeness on the ball, hence the maximum $M$ of these $N_0$'s gives $2\epsilon$ closeness everywhere.

Now we can prove $f$ is an isometry. To get contradiction, assume you have $(x,y)$ with $d(f(x), f(y))< d(x,y)$, hence for some $\epsilon$ also $d(f(x), f(y))< d(x,y)-2\epsilon$. Then for any $k>0$, we have $d(f^k(x), f^k(y))< d(x,y)-2\epsilon$. Finally we use surjectivity. Take $(a,b)$ with $f^M(a)=x, f^M(b)=y$. Then $d(f^{M+k}(a), f^{M+k} (b))= d(f^k(x), f^k(y))$ is $2\epsilon$ close to $d(f^M (a), f^M(b))=d(x,y)$ for some $k$, contradiction.


In fact more is true. For a compact, metric space, $f$ locally non-expansive $\implies$ $f$ local isometry.

This is theorem 4.2 in the paper linked below.

http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.pjm/1102970464


Here's a different argument. Recall that an $\epsilon$-covering $S$ of a metric space is a set such that for every point $x$ there exists $s\in S$ with $d(x,s)\leq \epsilon.$

Lemma. Let $X$ be a metric space with a finite $\epsilon/4$-cover. If $f:X\to X$ is a non-expansive surjection then $d(f(x),f(y))\geq d(x,y)-\epsilon$ for all $x,y\in X.$

Proof: Set $D=d(x,y)-\epsilon/2.$ Let $S$ be an $\epsilon/4$-cover minimizing the quantity $N(S)=|\{(s_1,s_2)\in S\mid d(s_1,s_2)\geq D\}|.$ Note that $f(S)$ is also an $\epsilon/4$-cover, and whenever $d(s_1,s_2)< D$ we have $d(f(s_1),f(s_2))< D.$ But $N(f(S))\geq N(S),$ so no pair can go from distance $\geq D$ to $<D:$ whenever $d(s_1,s_2)\geq D$ with $s_1,s_2\in S,$ we must have $d(f(s_1),f(s_2))\geq D.$

Pick $s_1,s_2\in S$ with $d(s_1,x),d(s_2,y)\leq \epsilon/4$; this gives $d(s_1,s_2)\geq d(x,y)-\epsilon/2,$ which we argued implies $d(f(s_1),f(s_2))\geq d(x,y)-\epsilon/2.$ So $d(f(x),f(y))\geq d(x,y)-\epsilon.$ $\Box$

To get the result in the question, note that a compact space is totally bounded i.e. has a finite $\epsilon$-covering for each $\epsilon.$