Solution 1:

Here is what I think of as the standard explanation. I learned this from The Art of Computer Programming, Section 4.5.3.

Start with some number $z$ whose continued fraction we want to find. I'll review the continued fraction algorithm: Let $a_1 = \lfloor z \rfloor$ and let $r_1 = x-a_1$. Then let $a_2 = \lfloor r_1^{-1} \rfloor$ and let $r_2 = r_1^{-1} - a_2$. Continue in this way to define $r_k$ and $a_k$. The $a_k$ are the coefficients of the continued fraction. Our first task will be to understand the $r_k$.

If $z$ is chosen randomly, then $r_1$ is equally likely to be anywhere in $[0,1)$. But the same is not true of $r_2$. We have $r_2 \in [0,x]$ if and only if $r_1 \in \bigcup_{k=1}^{\infty} [(k+x)^{-1}, k^{-1}]$, so the probability that $r_2$ is in $[0,x]$ is $\sum_{k=1}^{\infty} \left( \tfrac{1}{k} - \tfrac{1}{k+x} \right)$. More generally, let $F_n(x)$ be the probability that $r_n \in [0,x]$, then $$F_{n+1}(x) = \sum_{k=1}^{\infty} \left( F_n(\tfrac{1}{k}) - F_n(\tfrac{1}{k+x} ) \vphantom{\sum} \right).$$

If we imagine that $\lim_{n\to \infty} F_n(x)$ will exist, then the limiting function $F$ should obey $$F(x) = \sum_{k=1}^{\infty} \left( F(\tfrac{1}{k}) - F(\tfrac{1}{k+x} ) \vphantom{\sum} \right).$$

One function that obeys this property is $\log (1+x)$. Let's verify this: $$ \sum_{k=1}^{N} \left( \log \frac{k+1}{k} - \log \frac{k+x+1}{k+x} \right) \sum_{k=1}^{N} \left( \log (k+1) - \log k - \log (k+x+1) + \log (1+x) \right).$$ This telescope to $$\log (N+1) - \log 1 - \log (N+x+1) + \log(1+x) = \log \frac{N+1}{N+x+1} + \log (1+x).$$ Sending $N \to \infty$, we get $\log (1+x)$. This argument was neutral as to what the base of the logarithm was but, since we want $F(1)=1$, we should take log base two.

Of course, I am not going to prove that the limit exists or that it takes this value -- this is a hard theorem! But let's see why it implies the rest. The probability that $a_k = b$ is the probability that $r_{k-1} \in (\tfrac{1}{b+1}, \tfrac{1}{b}]$, or $F_{k-1}(\tfrac{1}{b}) - F_{k-1}(\tfrac{1}{b+1})$. If $F_k$ approaches $\log_2 (1+x)$, then the probability that $a_k$ is $b$ approaches $\log_2 \tfrac{b+1}{b} - \log_2 \tfrac{b+2}{b+1} = \log_2 \tfrac{(b+1)^2}{b(b+2)}$. Note that this is $\approx \tfrac{1}{b^2}$.

Intuitively then, if $g$ is a function from $\mathbb{Z}_{>0}$ to $\mathbb{R}$, we would expect the average value of $g(a_k)$ to be $\sum_{b=1}^{\infty} g(b) \log_2 \tfrac{(b+1)^2}{b(b+2)}$. It wouldn't be a good idea to take $g(x) = x$, since then the sum would diverge. But, if $g(x)=\log x$, then the sum converges to the log of the Khinchin constant, as one would hope for.

There is a subtle switch in the last paragraph. Before the last paragraph, $z$ was a random variable, and I looked at how the probability distribution of $a_k$ behaved as $k \to \infty$. In the last paragraph, and in Khinchin's theorem, I took a single $z$ and asked about the average behavior of $a_k$ for that $z$. This is a hard issue, but ergodic theory has standard tools to justify it.

Solution 2:

If you want a geometric explanation of the result, the existence of Khinchin's constant is a consequence of the erodicity property of the geodesic flow on the modular surface (more precisely, on the modular orbifold).

See

Robert Hines, Simple continued fractions and the geodesic flow on the modular surface

for details: A direct proof of ergodicity is given on page 3 (takes about half a page). On page 4, the existence of Khinchin's constant is derived as a one-line calculation using a suitably chosen measurable function (it relies on Birkhoff's ergodic theorem).

@Paul Plummer's guess about train-tracks was a close miss: The theory of train tracks is a combinatorial tool to describe measured geodesic laminations on surfaces. The modular orbifold carries no geodesic laminations whatsoever, but if you drop the non-intersection property in the definition of geodesic laminations you arrive to the notion of geodesic currents, i.e. flow-and-flip-invariant measures on the unit tangent bundle. [Incidentally, afik, there is no known combinatorial tool, similar to train-tracks in the lamination case, which can be used to study general geodesic currents on hyperbolic surfaces. The continued fractions can be regarded as such a combinatorial tool in the very special case at hand.]

The Liouville measure is one such geodesic current. The ergodicity of the geodesic flow with respect to this measure (due to Hopf) gets you to the answer (the existence of Khinchin's constant).

A nice self-contained exposition of geodesic laminations and currents can be found in these lecture notes:

Javier Aramayona and Christopher J. Leininger, Hyperbolic structures on surfaces and geodesic currents.

Ergodicity of the geodesic flow with respect the to Liouville measure holds in much greater generality, say, for strictly negatively curved manifolds/orbifolds of finite volume (this is due to Hopf). There are many other nice connections of the ergodic theory on locally homogeneous spaces ("homogeneous dynamics") and the number theory, but that would be a separate question, more suitable for Mathoverflow...