Expected maximum pairwise distance for $n$ points on a circle?
Place $n$ points uniformly at random on a circle of circumference $1$. What is the expected maximum distance between any pair $x_i$, $x_j$ of those points?
I'm defining distance as distance on the circle, i.e., the length of the smallest path from point $X$ to point $Y$ which does not leave the circle.
Solution 1:
Please refresh the page at least once for the hyperlinks to work properly.
$\require{begingroup}\begingroup\renewcommand{\dd}[1]{\,\mathrm{d}#1}$This question is about geometric probability, order statistics, and circular data (all at the elementary level). The post has not received much attention, and in the absence of references (e.g. expository papers or textbook exercise) specifically serving this problem, allow me to share my two cents. Although the derivation is not quite complete, I believe what’s reported here are correct.
This answer post consists of three sections, first one displaying the final answer out front, then two showing the derivation (not rigorous but rather an outline of the main approach and some key aspects).
- The main results: exact formulae and numerics.
- Demonstration of the visual tool: circular diagrams.
- Encoding of the diagrams into strings for combinatorial analysis, then a rough sketch of the recurrence relation that justifies the general formula
Sec.1$\quad$ General Formula for the Distribution and Expectation
For $n$ physically distinct points on the circle, denote the max pairwise distance as $X_n$. The circle is parametrized to have unity circumference, and distance between any two points is defined as the smaller (geodesic) of the two arc lengths (thus $0 < X_n < \frac12$). The expectation is
$$E[X_n] = \frac12 - \sum_{k = 0}^{ \lfloor (n-1)/2 \rfloor } \frac1{2k+1} {n \choose 2k+1} \frac1n \frac1{2^n} = \sum_{k = 0}^{ \lfloor (n-1)/2 \rfloor } {n - 1 \choose 2k} \frac{ -1 + (2k+1)n }{2^n (2k+1)^2} \label{Eq_EX} \tag*{Eq.($\star$)}$$
Note the floor $\lfloor \frac{n-1}2 \rfloor$ in the summation limit.
Not much effort was put into finding a more compact form for $E[X_n]$ since the numerical evaluation of is easy. Here are the first few terms:
$$\scriptsize\begin{aligned} E[X_3] &= \frac{13}{36} \approx 0.36111 & E[X_4] &= \frac5{12} \approx 0.41666 & E[X_5] &= \frac{67}{150} \approx 0.44666 & E[X_6] &= \frac{ 167 }{ 360 } \approx 0.46388 \\ E[X_7] &= \frac{2789}{5880} \approx 0.47432 & E[X_8] &= \frac{ 101 }{210} \approx 0.48095 & E[X_9] &= \frac{1376}{2835} \approx 0.48536 & E[X_{10}] &= \frac{ 3077 }{ 6300 } \approx 0.48841 \end{aligned}$$
The expectation is obtained via routine integration of the distribution \ref{Eq_CDFn} or equivalently \ref{Eq_fn}, which derivation is the subject starting Sec.2.
Denote the cumulative CDF for $X_n$ as $F_n$ and the density as $~f_n$. It can be shown that
$$F_{n}(t) = \sum_{k = 0}^{ \lfloor \frac{n-1}2 \rfloor } {n \choose 2k+1} \bigl( (2k+1)t - k \bigr)^{n-1} \cdot \mathcal{I}_{ t \,> \frac{k}{2k+1} } \label{Eq_CDFn} \tag*{Eq.(1)}$$
where the $\mathcal{I}_{\text{blah}}$ is the indicator function. That is, the distribution is inherently piecewise where a "new piece" appears every two $n$ at critical points $\frac{k}{2k+1} = \frac13$, $\frac25$, $\frac37$, $\frac49$, $\frac5{11}$, etc.
Equivalently, the density of this distribution is
$$f_{n}(t) = n (n-1) \sum_{k = 0}^{ \lfloor \frac{n-1}2 \rfloor } {n - 1\choose 2k} \bigl( (2k+1)t - k \bigr)^{n-2} \cdot \mathcal{I}_{ t \,> \frac{k}{2k+1} } \label{Eq_fn} \tag*{Eq.(2)} $$
It is understood that the density is identically zero outside of $t \in [0, \frac12)$ and the corresponding indicator function will be omitted. The plots above show left the CDF and right the density: $n = 3$ in blue, $n = 4$ in orange, $n = 5$ in green, $n = 6$ in red, and so on. Since we are talking about pairwise distance (as opposed to interval lengths between adjacent points, which linear case is dealt with here), it is intuitively clear that the distribution quickly gets concentrated on the high end ($\text{distance} = \frac12$) as $n$ increases.
In particular, the exact functional form the first few $n=3$ to $8$ are:
$$\begin{align} f_3(t) &= 6 \left( t + (3t - 1) \mathcal{I}_{ t \, > \frac13 } \right) \label{Eq_f3} \tag*{Eq.(3)} \\ f_4(t) &= 12 \left( t^2 + 3(3t - 1)^2 \mathcal{I}_{ t \, > \frac13 } \right) \label{Eq_f4} \tag*{Eq.(4)} \\ f_5(t) &= 20 \left( t^3 + 6(3t - 1)^3 \mathcal{I}_{ t \, > \frac13 } + (5t - 2)^3 \mathcal{I}_{ t \, > \frac25 } \right) \label{Eq_f5} \tag*{Eq.(5)} \\ f_6(t) &= 30 \left( t^4 + 10(3t - 1)^4 \mathcal{I}_{ t \, > \frac13 } + 5(5t - 2)^4 \mathcal{I}_{ t \, > \frac25 } \right) \label{Eq_f6} \tag*{Eq.(6)} \\ f_7(t) &= 42 \left( t^5 + 15(3t - 1)^5 \mathcal{I}_{ t \, > \frac13 } + 15(5t - 2)^5 \mathcal{I}_{ t \, > \frac25 } + (7t - 3)^5 \mathcal{I}_{ t \, > \frac37 } \right) \label{Eq_f7} \tag*{Eq.(7)} \\ f_8(t) &= 56 \left( t^6 + 21(3t - 1)^6 \mathcal{I}_{ t \, > \frac13 } + 35(5t - 2)^6 \mathcal{I}_{ t \, > \frac25 } + 7(7t - 3)^6 \mathcal{I}_{ t \, > \frac37 } \right) \label{Eq_f8} \tag*{Eq.(8)} \\ \end{align}$$ If there is a shortcut to obtain the expectation without involving the distribution, I’m unaware of it. Directly setting up the integral like commented by @achille hui is a good approach readily generalized to higher $n$ without needing much analysis. However, the integral is hard to evaluate in exact (or numerically with Mathematica or Matlab, etc) for $n$ as low as $n = 5$ even after taking into account the ordering symmetries.
Sec.2$\quad$ Circular Diagram as a Visual Aid
What is the geometric and combinatorial origin of the $2t - 1$ increase per $k$ in the density?
How does the recurrence going from $n$ to $n+1$ actually work?
In order to better understand the whole situation and the solution, one needs to get some hands-on experience in addition to merely reading this exposition that is far from comprehensive.
How to construct/read a circular diagram:
-
The circle of unity circumference is parametrized as $\theta \in [\frac{-1}2, \frac12)$, with positive angle (signed arc length) going counter-clockwise.
-
The points $A$, $B$, $C$, $D, \ldots$, etc, have angles $a$, $b$, $c$, $d, \ldots$, etc
-
For a given max distance $t$, point $A$ has two auxiliary points: $A_t$ at angle $\theta = a + t$ and $A_{-t}$ at angle $\theta = a - t $ that span the admissible range all other points should reside within. Same for $B$ and its $B_t$ and $B_{-t}$ as well as $C$, $D$, etc.
-
One revolution is added/subtracted to the angles algebraically whenever necessary. For example, $B_t$ has angle $b + t~$ that sometimes is in fact $b + t - 1$ because $\theta \in [\frac{-1}2, \frac12)$.
-
In the diagrams the last point ($n$th) is never explicitly shown. For example, a diagram showing points $\{A,B,C\}$ is meant for $n = 4$ where point $D$ is implicit. For the following bullets, figures that come later might be more suitable than the two above.
-
Density $~f_n(t)$ corresponds to configurations when one pair of points attain the specified max distance $t$, while other "free points" run within their mutual admissible ranges.
-
Without loss of generality, one can always set $a = 0$ and $b = t < \frac12$. Anchoring the pair that attains the max distance as such has multiplicity of $n(n-1)$.
-
Following the max-setting pair ($A$,$B$), one can force a descending order the remaining points $t > c > d > e \ldots > -t$ . These free points has multiplicity $(n-2)!$ that makes $n!$ together with the anchoring pair.
-
In general $c,d,e,\ldots$ can be either positive or negative (while maintaining their order). The case where all are in the upper half $c > d > e \ldots > 0 = a$ shall be called the baseline piece.
Sec.2.1$\quad$ Density Derivation $n = 3$ and $n = 4$
This main answer post is too long (over the input limit of $30$k characters). Please see the separate post for detailed implementation of the circular diagram.
Sec.2.2$\quad$ Density Pieces Demo $n = 5,7$
This subsection will demonstrate only one piece of $n = 5$ and one piece from $n = 7$. Of particular interest is the "new piece" that appears after the critical point $\frac25$ for $n = 5$ and $\frac37$ for $n = 7$.
The new piece for $\mathbf{n = 5}$, with the encoding introduced later in Sec.3, is the configuration $\Gamma(1,\frac23,\frac22) = \{C, 0, D, C_X, E\}~$. This code means the points are in descending order as in $\{B,C, 0,B_t, D, C_{-t}, C_t,E,A_{-t}\}$, where $\text{zero point} = B_{-t} = A$.
See the left figure below, point $E$ is implicit. Due to $D$ being in front of $C_{-t}$ (and behind $B_t$) the upper limit of $C$ is reduced from $t$ by "one gap" of magnitude $1 - 2t$. Due to $E$ being behind $C_t$ (and in front of $A_{-t}$) the lower limit of $C$ is lifted from zero by one gap. $$\begin{aligned} &\phantom{ {}={} }\int_{c = 1 -2t}^{3t - 1} \int_{d = c - t}^{2t-1} \int_{e = -t}^{c+t - 1} 1 \dd{e}\dd{d}\dd{c} & &\boxed{\scriptsize\begin{aligned} c' &\equiv c - (1-2t) \\ d' &\equiv d - (c - t) = d - (c' + 1-3t) \\ e' &\equiv e + t \end{aligned} }\\ &= \int_{c' = 0}^{5t - 2} \int_{d = 0}^{5t-2 - c'} \int_{e' = 0}^{c'} 1 \dd{e}\dd{d}\dd{c} & &\scriptsize\begin{aligned} &\text{it's a tetrehedron that is rotated from} \\ &\text{the simplex}~0<e'<d'<c' \end{aligned} \\ &= \frac{ (5t-2)^3 }{3!} & &\text{, for}~ t > \frac25 \end{aligned}$$ After multiplied by $n! = 120$, this is the last term in the density \ref{Eq_f5}. The condition on $t$ comes from $2t -1 > d > c - t$ combined with $c > 1 - 2t$. The one-gap-reduction of upper limit of $C$ stems from the inadmissible region $\{B_t,B_{-t}\}$ stuck in the way. Meanwhile, the one-gap-lift of lower limit originates from $\{C_t, C_{-t}\}$ not being able to go more negative when blocked by $E$.
The new piece for $\mathbf{n = 7}$ has a configuration that is tentatively shown on the right diagram just above, where point $G$ is implicit. It can be coded as $\Gamma(1,\frac13,\frac23,\frac23) = \{C, D, 0, E, C_X, F, D_X\}~$, Literally this is reading the points in order as $\{B,C,D,0, B_t, E, C_{-t}, C_t, F, D_{-t}, D_t, A_{-t}\}$.
The point $\theta = 3t-1$ is labeled as $A_{3t}$ with a square (diamond) mark. Again, $3t - 1$ is the upper limit of $C$ due to $E$ blocking in front of $C_{-t}~$. The lower limit of $C$ is lifted by "two gaps" $2(1-2t)$ because $F$ is inserted between $C_t$ and $D_{-t}$ preventing them to overlap and limiting the range, whereas point $G$ forbids $D_t$ from going more negative.
The limits of the remaining points should not cause much trouble. $$\begin{aligned} &\phantom{ {}={} }\int\limits_{c = 2 -4t}^{3t - 1} \int_{d = 1 - 2t}^{c+2t-1} \int\limits_{e = c-t}^{2t - 1} \int_{f = d - t}^{c+t-1} \int\limits_{g = -t}^{d+t - 1} 1 \dd{g}\dd{f}\dd{e}\dd{d}\dd{c} & &\boxed{\scriptsize\begin{aligned} c' &\equiv c - (2-4t) \\ d' &\equiv d - (1 - 2t) \\ e' &\equiv e -(c- t) = e + 5t - 2 - c'\\ f' &\equiv f - (d - t) = f + 3t - 1 - d' \\ g' &\equiv g + t \end{aligned} }\\ &= \int\limits_{c' = 0}^{7t - 3} \int_{d' = 0}^{c'} \int\limits_{e' = 0}^{7t - 3 - c'} \int_{f' = 0}^{c' - d'} \int\limits_{g' = 0}^{d'} 1 \dd{g}\dd{f}\dd{e}\dd{d}\dd{c} & &\scriptsize\begin{aligned} &\text{Some kind of linear transform from} \\ &\text{simplex}~0<g' < f' < e'<d'<c' \end{aligned} \\ &= \frac{ (7t-3)^5 }{5!} & &\text{, for}~ t > \frac37 \end{aligned}$$ With the factor $n! = 5040$, this is the last term in the density \ref{Eq_f7}. The critical point for $t$ can be found by merging $2t−1>d>c−t$ with $c > 2−4t$.
The two exemplary pieces above demonstrate that the recipe described in the opening (the items before Sec.2.1) is THE natural geometric/combinatorial decomposition of configurations. Namely, seeing $n$ points distributed on the circle in various ways, one must ask: Can one organize/categorize the configurations? Can one break down $E[X_n]$ into parts that can be recognized? Is there a way to reveal the inner structure for any $n$, as well as showing the relation between $n+1$ and $n$?
The answer is, yes, such view exists: anchoring two points that attains max distance $t$, then allocate the remaining points to the disconnected admissible regions (as if they were bins) in descending order. This natural decomposition gives exactly the $2^{n-2}$ pieces in the density, where the pieces are all in the form of simplex volumes.
Sec.3$\quad$ Recurrence Relation via Combinatorial Encoding of the Circular Diagram
One can extract the relevant geometrical and combinatorial information contained in the circular diagrams via encoding them as (linear) strings.
- Listing the points (not the angles) in a descending order
- Ignore the leading $B$ (also $A_t$) and trailing $A_{-t}$ that are always there.
- Denote the pairs that mark the inadmissible range like $\{C_{-t},~C_t\}$ and $\{D_{-t},~D_t\}$ etc as the "gap-points" (or just gaps) $C_X$ and $D_X$, etc.
- Use the symbol $0$ to emphasize that zero point is special. (note: $0$ also stands for $B_X$).
The first two subsections focus on the recursive algorithm (rules) that generate the "codes" as the number of points increases. How the codes indicate the calculation of the pieces for the density will be addressed in Sec.3.3.
Further specifications to the codes is adopted in anticipation of later results. The symbol Gamma will be used (for C as in configuration) $\Gamma(j,k,\ell,\ldots)$ to provide indexing for each code (string).
For three points, the $2^{n-2} = \text{two}$ pieces of \ref{Eq_f3} (derived in Sec.2.1) is now encoded as: $$\begin{aligned} \text{baseline}: & & \Gamma(1) &\equiv \{ C,~0,~C_X \} \\ \text{all-negative}:& &\Gamma(-1) &\equiv \{0,~ C \} \end{aligned}$$
Sec.3.1$\quad$ Encoding going from $n = 3$ to $n = 5$
Here's the formal statement the rules (algorithm) for generating the codes (configurations) for $n+1$ given those of $n$. Each rule is considered proven, as they are obviously true by demonstration.
- The new point (the $(n+1)$th point) can go only after the $n$th point, by construct.
- There's no variation in the "all-negative" piece. Just attach the new point at the end.
- The new point can go between $0$ and any of the "gaps" symbolized by $C_X,D_X,E_X,\ldots$ etc that are explicitly present in the code (not out of bounds).
- The inadmissible region $P_X$ for any point $P$ is attached to the end of the code if and only if $P$ is in front of zero. In other words, $P_{t}$ and $P_t$ are out of bounds ("more negative" than $\theta = -t$) if and only if $P$ comes after zero.
Going to $\mathbf{n = 4}$ by inserting $D$. We get three pieces of $n = 4$ from the baseline piece of $n = 3$, and there's only one descendant from the all-negative piece:
$$\begin{array}{c|cccccc|c} \mathbf{ \Gamma(1) } & C & & 0 & & C_X & & \\ \hline \Gamma(1, \frac13) & C & D & 0 & & C_X & D_X & \scriptsize \text{new baseline for}~ n=4 \\ \Gamma(1, \frac23) & C & & 0 & D & C_X & & \\ \Gamma(1, \frac33) & C & & 0 & & C_X & D & \\ \end{array} \\ \begin{array}{c|ccc|c} \mathbf{ \Gamma(-1) } & 0 & C & & \\ \hline \Gamma(-1, -1) & 0 & C & D & \scriptsize \text{new all-negative for}~ n=4 \end{array}$$ Reminder: the gaps $C_X$ and $D_X$ are out of bounds (more negative than $A_{-t}$ at $\theta = -t$ ). It is indeed true that these inadmissible range like $D_X$ can be in the upper half ($0 < \theta < t < \frac12$). No worries, the ranges for points (integration limits) will work out "automatically", necessarily leading to a sequence of decreasing spaces that yields the simplex volume.
#####Going to $\mathbf{n = 5}$, we have the $2^{n-2} = \text{four}$ configurations from $n = 4$ that makes up \ref{Eq_f4} encoded as $\Gamma(1,\frac13), \Gamma(1,\frac23), \Gamma(1,\frac33)$, and $\Gamma(-1,-1)$ just above.
Let’s start with the baseline, inserting the fifth point $E$ yields four new configurations. $$\begin{array}{c|ccccccccc|c} \mathbf{ \Gamma(1, \frac13) }& C & D & & 0 & & C_X & & D_X & & \\ \hline \Gamma(1, \frac13, \frac14) & C & D & E & 0 & & C_X & & D_X & E_X & \scriptsize \text{new baseline for}~ n=5 \\ \Gamma(1, \frac13, \frac24) & C & D & & 0 & E & C_X & & D_X & & \\ \Gamma(1, \frac13, \frac34) & C & D & & 0 & & C_X & E & D_X & & \\ \Gamma(1, \frac13, \frac44) & C & D & & 0 & & C_X & & D_X & E & \end{array}$$
The remaining two of the "bloodline of $n = 4$ baseline" $\Gamma(1, \cdot)$ produce two and one respectively: $$\begin{array}{c|cccccc} \mathbf{ \Gamma(1, \frac23) } & C & 0 & D & & C_X & \\ \hline \Gamma(1, \frac23, \frac12) & C & 0 & D & E & C_X & \\ \Gamma(1, \frac23, \frac22) & C & 0 & D & & C_X & E \\ \end{array} \quad,\quad \begin{array}{c|ccccc} \mathbf{ \Gamma(1, \frac33) } & C & 0 & C_X & D & \\ \hline \Gamma(1, \frac33, 1) & C & 0 & C_X & D & E \end{array}$$
Of course, the all-negative for $n = 4$ necessarily produces only the new all-negative. $$\begin{array}{c|cccc|c} \mathbf{ \Gamma(-1, -1) } & 0 & C & D & & \\ \hline \Gamma(-1, -1, -1) & 0 & C & D & E & \scriptsize \text{all-negative for}~ n=5 \end{array}$$
These are the $(4+2+1)+1 = 2^{n-2} = \text{eight}$ configurations that have a one-to-one mapping with the constituent "pieces" of $~f_5(t)$, the \ref{Eq_f5} in Sec.1.
Sec.3.2$\quad$ Going to $n=6$ and Comments on $\Gamma$ Indexing and Lengths
This main answer post is too long (over the input limit of $30$k characters). Please see the separate post for additional discussion on the encoding system.
Sec.3.3$\quad$ Density Pieces from Encoded Configurations: the Simplex Volumes
The previous subsections established the natural codes (listing of configurations). Here we state the rules/algorithm to read the contribution to the density from a given code.
Denote the range of point $C$ (integration limit) as $\mathcal{R}_c = \mathcal{U}_c - \mathcal{L}_c$ the upper limit minus the lower limit, then:
- All configurations contribute to the density in the form of $\frac1{(n-2)!} \mathcal{R}_c^{n-2}~$, which is the volume of a simplex with side length $\mathcal{R}_c = t - (k_U+k_L)(1 - 2t)$, with $k_U$ and $k_L$ described below.
- The upper limit is always either $t$ or $3t-1$. Namely, $\mathcal{U}_c = t - k_U(1 - 2t)$ where $k_U = 0$ or $1$ is the number of group of points behind zero and in front of $C_X$.
- The lower limit is always in the form of $\mathcal{L}_c = k_L(2t-1)$ where $k_L$ counts the number of groups of points behind $C_X$. The gaps $D_X$ and $E_X$ act as dividers (bin edge), and points between the same pair of dividers are considered a single group.
The rules to count $k_U$ and $k_L$, are obviously true as demonstrated previously in Sec.2.2: the "one-gap-reduction" and "one-gap-lift" correspond to the two separate origins of the gap (same magnitude $1-2t$).
For example, at $n = 9$ with point $A$ to point $I$
- The configuration $\{C, D, 0, E, F, C_X, G, D_X, H, I \}$ which is indexed as $\Gamma(1,\frac13,\frac24,\frac13,\frac23,\frac23,1)$ contributes $\frac{(7t-3)^7}{7!}$ with an overall $9!$ in front, because there is one group $\{E, F\}$ in front of $C_X$ and two groups $\{G\}$ and $\{H, I\}$ behind $C_X$.
- The configuration $\{C, D, E, 0, C_X, F, G, D_X, E_X, H, I\}$ which is indexed as $\Gamma(1,\frac13,\frac14,\frac35,\frac13,\frac33,1)$ contributes $9!\,\frac{(5t-2)^7}{7!}$, because there is zero group in front of $C_X$ and two groups $\{F,G\}$ and $\{H, I\}$ behind $C_X$. Note that $E_X$ immediately follows $D_X$ and has no effect.
- The new and "last" piece is the configuration $\{C, D, E, 0, F, C_X, G, D_X, H, E_X, I \}$ which index is $\Gamma(1,\frac13,\frac14,\frac25,\frac24,\frac23,\frac22)$. The density from this config is $72(9t-4)^7$, because there is one group $\{F\}$ in front and three groups $\{F\}$, $\{G\}$, and $\{I\}$ after $C_X$.
Here's another quick look at where the circular diagram is clearly helpless. At $n = 16$ with points $A$ to $P$, the configuration $$\begin{aligned} &\Gamma(1,\frac13,\frac14,\frac15,\frac26,\frac15,\frac14,\frac13,\frac12,\frac12,\frac12,\frac22,1,1) \\ &= \{C, D, E, F, 0, G, H, C_X, I, D_X, J, E_X, K, L, M, F_X, N, O, P\}\end{aligned}$$ corresponds to a piece of $16!\,\frac{(11t-5)^{14}}{14!}$ where the total of $k_U + k_L = 1+4=\text{five}$ groups are $\{G,H\}$, $\{I\}$, $\{J\}$, $\{K,L,M\}$, and $\{N,O,P\}$. For the record, we know that for $n = 16$ the highest order is $(15t-7)^{14}$, see \ref{Eq_fn}. This is achieved by configurations such as $$\begin{aligned} & & &\{C, \ldots, H, 0, I, C_X, J, D_X, K, E_X, L, F_X, M, G_X, N, H_X, O, P\} \\ &\text{and} & &\{C, \ldots, I, 0, J, C_X, K, D_X, L, E_X, M, F_X, N, G_X, O, H_X, P, I_X\}~,\end{aligned}$$ among the ${n-1 \choose k}=15$ where $k \lfloor \frac{n-1}2\rfloor = 7$.
The proof about the simplex volume can perhaps be done with induction, while intuitively it's at least reasonable in that one can see the points having hierarchically restricted space.
Concluding Remarks
Many details of different levels can be formalized into rigorous proofs, but to be honest the effort I've made so far is not enough to overcome the majority of the technicalities.
Several less trivial claims are stated as "obviously true by demonstration", in the sense that: the process of using the circular diagram and encoding the configurations has been enough to convince the author that the results are true. The geometric and combinatorial nature of the problem is manifested by the construct.
Roughly speaking, the "solution" goes like
- Establishing the encoding as an exact representation of the problem.
- Showing via the codes that there are $2^{n-2}$ disjoint configurations.
- Showing via the codes that each configuration contributes to a simplex volume.
- Showing that this holds for all $n \geq 3$.
Arguably, not displaying a proof for (2) here in the post isn't a big deal. As for (4) the recurrence, it's a judgement call whether one can take it as true, perhaps having some faith in the induction proof (which formulation is hinted by the demonstration).
The most important task to continue is to fix the lack of semi-rigorous derivation for (3): the simplex volume, either for a given $n$, or with induction from trivial cases $n = 2$ and $3$.$\endgroup$
Solution 2:
Please refresh the page at least once for the hyperlinks to work properly.
$\require{begingroup}\begingroup\renewcommand{\dd}[1]{\,\mathrm{d}#1}$Some contents are post separately here because the main answer post is too long (over the input limit of $30$k characters).
Below are Sec.2.1 followed by Sec.3.2, where the latter details $n = 6$ for the encoding system with along with some general extra info.
Sec.2.1$\quad$ Density Derivation $n = 3$ and $n = 4$
For $t < \frac13$ in figure on the left below, point $C$ is implicit with $0 < c < b = t~$. The density is $$f_3(t) = \int_{c = 0}^t 1 \cdot \dd{t} = t \qquad \text{, for}~~ t < \frac13$$ where the integrand $1$ is the uniform density along the circle which circumference is unity.
For $t > \frac13$, figure on the right, in addition to the admissible range on the positive (upper) half, $C$ can be negative and between $A_{-t}$ and $B_{t}$ such that $-t < c < b + t - 1 = 2t - 1$. $$f_3(t) = \underbrace{ \int_{c = 0}^t 1 \dd{t} }_{ \text{same form as case $t \,< \frac13$} } + \int_{c = -t}^{2t-1} 1 \cdot \dd{t} = t + (3t - 1) \qquad \text{, for}~~ t > \frac13$$
This "opening" in the negative half exists only when $B_t$ is in the "front" of $A_{-t}$, as in $b + t - 1 > a-t.~$ With $a = 0$ and $b = t$, this gives the critical point $t > \frac13$.
Together with the multiplicity $n(n-1)$, this is Eq.(3) listed in Sec.1 $$ f_3(t) = n! \bigl( \text{baseline} + \text{all-negative} \bigr) = 6 \left( t + (3t - 1) \mathbb{1}_{ t \, > \frac13 } \right) $$
The three diagrams below display $\mathbf{n = 4}$. Point $D$ is implicit and that it can reside only in the 3-arc-intersection of green-purple-yellow.
The left diagram plays a dual role. Firstly, it shows $0 < d < c < b = t$, the baseline piece. $$\text{For all $0<t<\frac12$,} \quad \int_{c = 0}^{t} \int_{d = 0}^{c} 1 \dd{d} \dd{c} = \frac{t^2}2~~, \qquad \scriptsize\text{(sorry for the awkward notation)} $$ This form is in fact $\frac{ t^{n-2}}{ (n-2)! }$, the volume of a $(n-2)$-simplex with side length $t$.
This left diagram also serves for the configuration of an "extra" piece, where $D$ is in the lower half, between $C_{-t}$ and $B_t~$, such that $c - t < d < b + t - 1$.
This happens only when $B_t$ is in front of $C_{-t}$, which is in turn in front of $A_{-t}~$, as in $a-t < c - t < b + t - 1 < 0~$. These imply $c < 3t - 1$ and $t > \frac13$. $$\textbf{extra}_1: \quad \int_{c = 0}^{3t - 1} \int_{d = c-t}^{2t-1} 1 \dd{d} \dd{c} = \int_{c = 0}^{3t - 1} (3t - 1 - c) \dd{c} = \frac{ (3t - 1)^2}2 \qquad \text{, for}~~t > \frac13$$ It’s not a coincidence that this is the volume of a $(n-2)$-simplex with side length $3t - 1$.
The middle diagram: point $D$ is below ($d < 0$) while $C$ is above ($c > 0$). Another "new opening" appears in the 3rd quadrant when $C_t$ is in front of $A_{-t}~$. Namely, $c + t - 1 > a - t = -t$ so that $c > 1- 2t > 0$ .
This extra admissible range is for $D$ to reside between $A_{-t}$ and $C_t$ as in $-t < d < c+t-1$.
$$\textbf{extra}_2: \quad \int \limits_{c = 1- 2t}^{t} \int_{d = -t}^{c + t - 1} 1 \dd{d} \dd{c} = \int \limits_{c = 1 - 2t}^{t} \bigl(c - (1- 2t) \bigr) \dd{c} = \frac{ (3t - 1)^2}2$$
The condition on $t$ is by construct $b > c \implies t > c > 1 - 2t \implies t > \frac13$.
The natural and geometric approach is to keep track of the "opening" magnitude: do change of variable $c’ \equiv c - (1-2t)$, yielding again the volume of a simplex with side length $3t-1$ as in $\int_{c’ = 0}^{3t-1} c’ \dd{c'} = \frac{ (3t - 1)^2}2$. This is not an isolated case but generally what goes on with the various configurations for any $n$.
The last of the $2^{n-2} = 4$ pieces is shown in the right diagram, where both points are below zero. Namely, all-negative for the $n-2$ free points. Necessarily they are also behind $B_t$ so that $b + t - 1 > c > d$, or equivalently $0 > 2t - 1 > c > d > -t$.
$$\begin{aligned} \textbf{all-negative}:& & &\phantom{{}=} \int_{c = -t}^{2t - 1} \int_{d = -t}^c 1 \dd{d} \dd{c} & &\boxed{\begin{aligned} &\scriptsize\text{natural change of variables:} \\ &c’ \equiv c + t~,~~ d’ \equiv d + t \end{aligned}} \\ &&&= \int_{c’ = 0}^{3t - 1} \int_{d’ = 0}^{c’} 1 \dd{d} \dd{c} & &\\ &&&= \frac{ (3t - 1)^2}2 && ,~t > \frac13 \end{aligned}$$
Putting things together, one recovers Eq.(4) seen in Sec.1 : $$\begin{aligned} f_4(t) &= n! \Bigl( \text{baseline} + \text{all-negative} + \text{extra}_1 + \text{extra}_2 \Bigr) \\ &= 24 \Bigl( \frac{t^2}2 + \mathbb{1}_{t\, > \frac13} \bigl( \frac{ (3t - 1)^2 }2 + \frac{ (3t - 1)^2 }2 + \frac{ (3t - 1)^2 }2 \bigr) \Bigr) \end{aligned}$$
Sec.3.2$\quad$ Going to $n=6$ and Comments on $\Gamma$ Indexing and Lengths
Pardon me for not reproducing a table listing all the eight configs of $n = 5$ to save space.
Again, with the baseline of $n = 5$, one can see why it yields five configs for $n = 6$. $$\begin{array}{c|cccccccccccc|c} \mathbf{ \Gamma(1, \frac13, \frac14) } & C & D & E & & 0 & & C_X & & D_X & & E_X & & \\ \hline \Gamma(1, \frac13, \frac14, \frac15) & C & D & E & F & 0 & & C_X & & D_X & & E_X & F_X & \scriptsize \text{baseline}~n=6 \\ \Gamma(1, \frac13, \frac14, \frac25) & C & D & E & & 0 & F & C_X & & D_X & & E_X & & \\ \Gamma(1, \frac13, \frac14, \frac35) & C & D & E & & 0 & & C_X & F & D_X & & E_X & & \\ \Gamma(1, \frac13, \frac14, \frac45) & C & D & E & & 0 & & C_X & & D_X & F & E_X & & \\ \Gamma(1, \frac13, \frac14, \frac55) & C & D & E & & 0 & & C_X & & D_X & & E_X & F & \end{array}$$
The astute readers might already know that the remaining "base-bloodline" configs $\Gamma(1, \frac13,\cdot)$ will produce three, two, and one, respectively. $$\begin{array}{c|ccccccccc} \mathbf{\Gamma(1, \frac13, \frac24) } & C & D & 0 & E & & C_X & & D_X & \\ \hline \Gamma(1, \frac13, \frac24, \frac13) & C & D & 0 & E & F & C_X & & D_X & \\ \Gamma(1, \frac13, \frac24, \frac23) & C & D & 0 & E & & C_X & F & D_X & \\ \Gamma(1, \frac13, \frac24, \frac33) & C & D & 0 & E & & C_X & & D_X & F \end{array} \\ \scriptsize\begin{array}{c|ccccccc} \mathbf{\Gamma(1, \frac13, \frac34) } & C & D & 0 & C_X & E & & D_X & \\ \hline \Gamma(1, \frac13, \frac34, \frac12) & C & D & 0 & C_X & E & F & D_X & \\ \Gamma(1, \frac13, \frac34, \frac22) & C & D & 0 & C_X & E & & D_X & F \end{array} \quad \begin{array}{c|ccccccc} \mathbf{\Gamma(1, \frac13, \frac44) } & C & D & 0 & C_X & D_X & E & \\ \hline \Gamma(1, \frac13, \frac44, \frac11) & C & D & 0 & C_X & D_X & E & F \end{array}$$
Continue the descending pattern, the "secondary base-bloodline" $\Gamma(1, \frac23,\cdot)$ render two and one. Then there's the "single bloodline". $$\begin{array}{c|ccccccc} \mathbf{\Gamma(1, \frac23, \frac12) } & C & 0 & D & E & & C_X & \\ \hline \Gamma(1, \frac23, \frac12, \frac12) & C & 0 & D & E & F & C_X & \\ \Gamma(1, \frac23, \frac12, \frac22) & C & 0 & D & E & & C_X & F \end{array} \quad \begin{array}{c|cccccc} \mathbf{\Gamma(1, \frac23, \frac22) } & C & 0 & D & C_X & E & \\ \hline \Gamma(1, \frac13, \frac34, \frac22, 1) & C & 0 & D & C_X & E & F \end{array} \\ \begin{array}{c|cccccc} \mathbf{\Gamma(1, \frac33, 1) } & C & 0 & C_X & D & E & \\ \hline \Gamma(1, \frac13, \frac33, 1, 1) & C & 0 & C_X & D & E & F \end{array}$$ Lastly the all-negative: $$\begin{array}{c|ccccc|c} \mathbf{\Gamma(-1, -1, -1) } & 0 & C & D & E & & \\ \hline \Gamma(-1, -1, -1, -1) & 0 & C & D & E & F & \scriptsize \text{all-negative for}~ n = 6 \end{array}$$ These are the $(5 + 3 + 2 + 1) + (2+1) + 1 + 1 = 2^{n-2} = 16$ pieces mapping to $~f_6(t)$, the Eq.(6) in Sec.1.
Some comments on the $\Gamma$ indexing:
- Baseline pieces have ascending consecutive denominators and $1$ in the numerator: $\Gamma(1, \frac13), \Gamma(1, \frac13, \frac14), \Gamma(1, \frac13, \frac14, \frac15),\ldots$
- An argument of $\Gamma$ being $1$ means the bloodline becomes single line from there. The arguments in fractions like $\frac55, \frac44$ and $\frac22$ equal $1$ not by accident.
- The all-negative might as well all be denoted as just $\Gamma(-1)$
- There are $n - 2$ indices in $\Gamma(j,k,\ell,\ldots)$. That is, the number of arguments in $\Gamma$ indicates the number of points.
According to the rules for generating the codes (algorithm), it's easy to see the recurrence algorithm for the length (number of pieces) in each "bloodline":
- Each number gives a descending sequence of that length (e.g. whenever you see a $3$ it creates a $[3,2,1]~$).
- The very first (longest) sequence has the leading term bumped up by one (due to the special role of $0 = B_X$ in the code). For the first few $n$:
$$ [3,1] \\ [4, 2, 1], [1] \\ [5,3,2, \color{magenta}1], [2, \color{green}1], [1], [1] \\ [6,4,\underline{\mathbf{3}},2,\color{red}1], [3, 2, \color{blue}1], [2,1], \color{magenta}{[1]}, [2,\color{orange}1],\color{green}{[1]},[1],[1] \\ \scriptsize[7,5,4,3,2,1], [4, 3, 2, 1], \underline{\mathbf{[3,2,1]}}, [2,1], \color{red}{[1]},[3,2,1],[2,1], \color{blue}{[1]} ,[2,1],[1],\color{magenta}{[1]},[2,1], \color{orange}{[1]},\color{green}{ [1]},[1],[1] $$ It's easy to prove this two-rule algorithm for the length at any $n$ leads to a total of $2^{n-2}$.
By design, given a legit $\Gamma$ that correctly corresponds to a code, one can reconstruct the code by reading the indices. For example, given $\Gamma(1,\frac13,\frac24,\frac23)$ one can deduce the code to be $\{C,D,0,E,C_X,F,D_X\}$ in the following manner.
- First index $1$ means $C$ is in front of zero, $\implies \{C,0,C_X\}$
- Second index $\frac13$ means $D$ goes in the first of the 3 spaces, $\{C, \_\_,0, \_\_,C_X,\_\_ \} \implies \{C,D,0,C_X,D_X \}~$, where $D_X$ is in the code (inside the bound, $d + t - 1 > -t~$) because $D$ is in front of zero.
- Third index $\frac24$ means $E$ goes to the 2nd of the 4 spaces that follow $D$, $\{C, D,\_\_,0, \_\_,C_X,\_\_,D_X,\_\_ \} \implies \{C,D,0,E,C_X,D_X \}~$, where $E_X$ is out of bounds because $E$ is behind $0$.
- Fourth index $\frac23$ means $F$ falls in the 2nd of the 3 spaces that follow $E$, $\{C, D,0, E,\_\_,C_X,\_\_,D_X,\_\_ \} \implies \{C,D,0,E,C_X,F,D_X \}~$, where again $F_X$ is not inside due to $f<0~$.
Conversely, one can deduce the $\Gamma$ indexing directly by reading the code, without needing to go through the hierarchical listing of all the levels below.$\endgroup$