Variation in density of the Farey sequence in the interval $[0,1]$
I am curious about how the Farey sequence of order $M$ — which is to say, the sequence of rational numbers from $0$ to $1$, which may be expressed with denominator at most $M$ — is distributed on the real line; and in particular how the 'density' of the Farey sequence varies in this interval as $M \to \infty$.
Consider the following diagram. For varying $M$, we plot the members of the Farey sequence of order $M$. Of course, they are not evenly distributed, and as $M$ increases we see some very obvious clustering behaviour (e.g., among the lowest and highest elements of the sequence, albeit excluding $0$ and $1$ themselves).
Because the Farey sequences (for any given $M$) are finite, clearly we cannot ask meaningful questions literally concerning their 'density' at $x$. But we can consider related concepts — for example, it might be meaningful to consider how many rationals with denominator bounded by $M$, there are in the closed interval $[x\!-\!\tfrac{1}{2M},x\!+\!\tfrac{1}{2M}]$. As $M$ increases and the width of that interval decreases, we approach questions of density or measure. (And in any case, we can make statements about how the number of bounded-denominator rationals vary spatially in such neighborhoods of non-zero width.)
It is easy to imagine a range of similar questions, which in some cases might admit an analytic or combinatorial solution, or which might give rise to 'density' plots which in an asymptotic limit converge to some more-or-less cauliflowery fractal. Other big-picture quantitative perspectives on the variation of density of bounded-denominator rationals might also be available.
What can be said, or what is known, about the 'density' of Farey sequences of order $M$, as $M \to \infty$?
(N.B. This question has been edited to reflect the terminology suggested in the comments, but is otherwise unchanged.)
Solution 1:
My original question is trying to get at some concepts of 'density', which may be contrasted with some coarse-grained results about the asymptotic distribution of the Farey sequence, as indicated in the comments to the question.
The following summarizes what can be said about the asymptotic distribution of the Farey sequence, which is fairly well-known (if slightly poorly documented on Wikipedia article on the Farey sequence). I post this in order to unpack the observations made in the comments, and also to contrast with some elementary observations about the idea which I would like to explore, which is about finer-grained notions of density which may be pertinent for finite $M$. For the time being I have no answers regarding that more precise notion. In the mean time, casual mathematicians such as I am may find the better-known, coarse-grained notions of "the distribution of the Farey sequence" of interest.
Coarse-grained distribution
As pointed out by @GerryMyerson in the comments, there is a meaningful sense in which, despite first appearances, the Farey sequences approach a near-uniform distribution over the unit interval. (Indicative results are mentioned in the Wikipedia article on the Farey sequence, but a number of the results below are taken from Leodan [1] with minor changes in notation; results without citations are ones which Ref. [1] seems to represent as classic.)
I. First-order analysis
From the scaling of the length of the Farey sequence $F_M$ of order $M$, $$ \lvert F_M \rvert = \frac{3 M^2}{\pi^2} + O(M \log M) $$ it immediately follows that $N_M := \lvert F_M \rvert$ is the average density of the Farey sequence over the entire interval $[0,1]$. Remarkably, to first order, this is also the density over any interval of constant size as well. For an interval $I = [\alpha,\beta] \subseteq [0,1]$, if we let $$ N_{M,I} = \lvert F_M \cap I \rvert $$ it then follows that $$ \delta_{M,I} := \frac{N_{M,I}}{N_M} = \bigl(\beta - \alpha\bigr)\Bigl(1 + O\bigl(\tfrac{\log M}{M}\bigr)\Bigr) + O(M^{-1+\epsilon}) $$ for some (all?) $0 < \epsilon < 1$. That is to say, up to minor corrections in $M$, it depends only on the measure of the interval $I$. In this sense, on any interval $I \subseteq [0,1]$ of constant size, the Farey sequence $F_M$ converges relatively quickly to an approximately uniform distribution as $M \to \infty$.
II. On the discrepancy of the coarse-grained Farey sequence from first order distribution
One can ask just how close the Farey sequence is to being uniform over intervals of constant width. Generalising $\delta_{M,I}$ to arbitrary intervals $I \subseteq [0,1]$, let $$ R_M(x) = \bigl\lvert \delta_{M,(0,x]} - x \bigr\rvert $$ be the 'local discrepancy' of the Farey sequence from uniform, on the interval $I = (0,x]$ for $0 \leqslant x \leqslant 1$. Then, we may define $$ \mathbf D_M := \sup_{0 \leqslant x \leqslant 1} R_M(x) $$ which measures the supremum of all 'local discrepancies' of the Farey sequence from uniform, over $(0,x]$ in the unit interval. Remarkably, Dress [2] showed the tight result $ \mathbf D_M = 1/M $: then for these constant-width intervals $(0,x]$, the deviation of the density from uniform can be bounded quite precisely.
Beyond this — for instance, whether deviations of magnitude 'close' to $1/M$ are rare, or whether instead the distribution tends to be much closer to uniform — it is difficult to say more without running into important unproven conjectures.
Define the quantitites $$\begin{align*} S_M &:= \!\! \sum_{1 \leqslant j \leqslant N_M} R^2_M(f_{\!M,j}) \\[3ex] T_M &:= \!\! \sum_{1 \leqslant j \leqslant N_M} R_M(f_{\!M,j}) \end{align*}$$ where $f_{\!M,j}$ is the $j^{\;\!\text{th}}$ element of $F_M$. That is, $T_M$ is the cumulative discrepancy of the Farey sequence from uniform, over the intervals $(0,f_{\!M,j}]$, for each element $f_{\!M,j}$ of the sequence $F_M$; and $S_M$ is the corresponding cumulative discrepancy-squared. One might wonder what can be said about these cumulative values, to help bound the deviation of the Farey sequences from the uniform distribution.
-
From Dress' result above and the scaling of $N_M$, we know that $S_M \leqslant 3/\pi^2 + O(M^{-1} \log(M))$ and $T_M \leqslant 3M/\pi^2 + O(\log M)$. So we do have some bounds on these quantities.
-
However, Franel [3] proved that the infimum of $\theta$ for which $S_M = O(M^{-2+2\theta})$, is the supremum of the real parts of the zeroes of the Riemann zeta-function. Thus, the Riemann Hypothesis is equivalent to $\forall \epsilon > 0: S_M = O(M^{-1+\epsilon})$.
-
Franel's result was subsequently strengthened by Landau [4] who showed that the Riemann Hypothesis is equivalent to $\forall \epsilon > 0 : T_M = O(M^{1/2 + \epsilon})$.
That is to say: some questions of precisely how uniformly, the Farey sequence converges to the uniform distribution over intervals of constant size, run against well-known frontiers in mathematics.
Initial observations about fine-grained distribution
These observations regarding the uniformity of the distribution of the Farey sequence over intervals of constant width are all well and good. However, they largely correspond to integration of something like a density function over an interval, which may cover a multitude of sins. Furthermore, this asymptotic behaviour fails to capture interesting features of the distribution of Farey sequences on a finer-grained level, which is relevant for representing real numbers in data structures.
For $0 \leqslant j < N_M$, consider the sizes of the gap $g_{M,j} = f_{\!M,{j+1}} - f_{\!M,j}$ between Farey neighbours $f_{\!M,j}$ and $f_{\!M,{j+1}}\,$. Then we have $$\begin{align*} g_{M,0} &= 1/M \\ g_{M,1} &= 1/M(M-1) \end{align*}$$ which is a rather significant change in the spacing of Farey neighbours. Asymptotically, this jump would be invisible, occurring just about the origin as it does; this and other erratic behaviour at scales close to $1/M$ would simply average out.
In order to consider something like the Farey sequence in neighborhoods of arbitrary size about any real number, let $\mathcal F_M$ be the extended Farey set of all rational numbers $q$ (not just $0 \leqslant q \leqslant 1$) which may be expressed with a denominator at most $M$. Define $$ \rho_{M,\epsilon}(x) = \frac{1}{\epsilon N_M}\cdot \Bigl\lvert \mathcal F_M \cap \bigl[x\!-\!\tfrac{1}{2}\epsilon, x\!+\!\tfrac{1}{2}\epsilon\bigr] \Bigr\rvert$$ to be the "$\epsilon$-scale density" of the Farey numbers $\mathcal F_M$ about $x$. For any fixed $\epsilon > 0$, if we integrate $\rho_{M,\epsilon}$ over the unit interval, we should obtain precisely $1$; and the coarse-grained distribution results above imply that again for fixed $\epsilon > 0$ and $x \in \mathbb R$, $\rho_{M,\epsilon}(x) = 1 + O(M^{-1} \log(M))$. But if we let $w = 1/M$ to consider the "density" of Farey numbers in small neighborhoods about $x$, we may obtain precise (and quite distinct) values for the local density: $$\begin{alignat}{2} \rho_{M,w}(0) &= \qquad\frac{M \cdot 1}{N_M} &&= \frac{\pi^2}{3 M} + O(M^{-2}) ; \\[3ex] \rho_{M,w}\Bigl(\tfrac{3}{2M}\Bigr) &= \frac{M \cdot \bigl(\lceil M/2 \rceil + 1\bigr)}{N_M} &&= \frac{\pi^2}{6} + O(M^{-1}). \end{alignat}$$ That is to say: the density jumps from vanishing with $M$, to a value which differs slightly from $\pi^2/6 \approx 1.645$, and does so in a fairly small span. This structure will simply be invisible asymptotically for an interval of fixed size, and shows that inasmuch as we are interested in the behaviour of a density function as opposed to a mass function on intervals, the convergence of the distribution of the Farey sequence to a uniform distribution cannot, itself, be uniform.
In light of the fact that some of this structure will be invisible as $M \to \infty$ for intervals of fixed size, there is a question of how even to consider the asymptotic behaviour of the density in this fine-grained sense. It is possible that some of this structure could be fruitfully described only by making non-trivial choices of the parameterisation of the interval $[0,1]$. However, even without such devices, one might ask such questions as the measure of the set $\{ x \in [0,1] : \rho_{M,w}(x) > 1 + kw\}$ for fixed constants $k$, e.g., for $w=1/M$, as $M \to \infty$.
Summary
The asymptotic behaviour of the Farey sequence on any given non-empty interval in $[0,1]$ is "uniform", to first-order approximation, in the sense that 'integrating' it over any fixed non-empty interval (counting the elements and dividing by the interval width and $N_M = \lvert F_M \rvert$) converges to $1$ for any interval. Furthermore, the convergence is relatively quick, with some tight known bounds on convergence for intervals of the form $(0,x]$ for $0 \leqslant x \leqslant 1$. (Ask for more sophisticated bounds on convergence than this, and you may end up asking questions equivalent to the Riemann Hypothesis.)
However, this does not address more fine-grained notions of 'density' of the Farey sequence, for which elementary observations yield results which show that these well-known coarse-graining results arise from averaging out some quite significant and possibly interesting variations in small-scale behaviour (e.g., in neighbourhoods of size $\epsilon = 1/M$) for finite $M$.
References
[1] A. H. Ledoan. The discrepancy of Farey series. Acta Math. Hungar. 156 (465–480), 2018. https://doi.org/10.1007/s10474-018-0868-x
[2] F. Dress. Discrépance des suites de Farey. J. Théor. Nombres Bordeaux, 11 (345–367), 1999.
[3] J. Franel. Les suites de Farey et les problèmes des nombres premiers. Nachr. von der Ges. der Wiss. zu Göttingen, Math.-Phys. Kl., (198–201), 1924.
[4] E. Landau. Bemerkungen zu der vorstehenden Abhandlung von Herrn Franel. Nachr. von der Ges. der Wiss. zu Göttingen, Math.-Phys. Kl., (202–206), 1924.