"Distribution" of numbers $0\leq a\leq b\leq c\leq d\leq 1$

A friend came up with the next problem: Consider $0\leq a\leq b\leq c\leq d\leq 1$ numbers such that $a+b+c+d=1$. Are there numbers $a_{0}$, $b_{0}$, $c_{0}$, $d_{0}$ that minimize $$|a-a_{0}|+|b-b_{0}|+|c-c_{0}|+|d-d_{0}|$$ most of the time? I mean, if we repeat the process of choosing $a$, $b$, $c$ and $d$ randomly, is there a expected value for $a$, $b$, $c$ and $d$? And what it is?

I tried to start with an easiest problem, with just $a$ and $b$, such that $0\leq a\leq b\leq 1$ and $a+b=1$ but I had no idea where to start, so I decided to try with some numerical sampling, and I did the next in Python:

Choose a number $x\in [0,1]$ uniformely, and define $a=\min\{x,1-x\}$ and $b=\max\{x,1-x\}$. Clearly $0\leq a\leq b\leq 1$ and $a+b=1$. Doing this, and repeating a lot of times I got that the "expected value" for $a$ was $0.25$ and for $b$ was $0.75$, so the ratio is $1:3$.

Next, I tried the same in Python but with two values: Choose $x,y\in [0,1]$ uniformely, and let $x_{1}=\min\{x,y\}$ and $x_{2}=\max\{x,y\}$, now define $$A=\{x_{1},x_{2}-x_{1},1-x_{2}\}$$ and let $a_{1}$, $a_{2}$, $a_{3}$ be the elements of $A$ in increasing order. Clearly $0\leq a_{1}\leq a_{2}\leq a_{3}\leq 1$, and $a_{1}+a_{2}+a_{3}=1$, and repeting a lot of times I got that the "expected values" were of the ratio $2:5:11$.

Repeating the same method but now with one more variable I got that the "expected values" were on ratio $3:7:13:25$.

Lastly, with one more variable the ratios were $12:27:47:77:137$.

All this calculations were found just by trial and error, and are clearly non mathematically justified, but seems like, at least, a good conjecture.

Is there any hidden pattern behind these ratios? Is there any reason for this numbers to came up?

Any help would be appreciated.


Solution 1:

Without the sorting, if you just choose $n$ random numbers $x_i$ such that $0 \leq x_i \leq 1$ and $\sum x_i = 1,$ the sample space can be represented as an $(n-1)$-dimensional polytope (specifically, a simplex) with vertices at $(1,0,0,\ldots,0)$, $(0,1,0,\ldots,0),$ $(0,0,1,0,\ldots,0), \ldots,$ $(0,0,\ldots,0,1).$ (That is, at each vertex exactly one coordinate is $1$ and the others are $0$.) An obvious choice of distribution over this region would be a uniform density.

I believe your unsorted differences achieve this distribution. Certainly they give the correct expected value for each variable.

Now you apply the restriction that $x_i \leq x_j$ if $i < j.$ The sample space for these variables is a smaller $(n-1)$-dimensional simplex that you can obtain by "cutting away" the parts of the original simplex in which $x_1 > x_2,$ or $x_2 > x_3,$ and so forth until you cut away the part in which $x_n > x_{n-1}.$

Generating the unsorted differences and then sorting them converts a uniform distribution over the larger simplex to a uniform distribution over the smaller one.

As an example, for $n = 3$ the final sample space has vertices at $(0,0,1),$ $\left(0,\frac12,\frac12\right),$ and $\left(\frac13,\frac13,\frac13\right).$ The centroid of this simplex is $\left(\frac19,\frac{5}{18},\frac{11}{18}\right),$ with coordinates in the ratio $2:5:11.$

For $n=4$ the vertices are $(0,0,0,1),$ $\left(0,0,\frac12,\frac12\right),$ $\left(0,\frac13,\frac13,\frac13\right),$ and $\left(\frac14,\frac14,\frac14,\frac14\right).$ For $n=5$ the vertices are $(0,0,0,0,1),$ $\left(0,0,0,\frac12,\frac12\right),$ $\left(0,0,\frac13,\frac13,\frac13\right),$ $\left(0,\frac14,\frac14,\frac14,\frac14\right),$ and $\left(\frac15,\frac15,\frac15,\frac15,\frac15\right),$ and so forth for larger $n.$

In short, each vertex is obtained by maximizing the value of $x_i$ for a different $i.$ It should be clear that for $n$ variables, the $i$th coordinate of the centroid of the final simplex, $i = 1, 2, \ldots, n,$ is $$ \frac1n\left(\sum_{j=1}^{n} \frac1j - \sum_{j=1}^{n-i} \frac1j\right). $$


Since summing up terms of the harmonic series can be a bit painful, here's a simpler way to generate the ratios. Denote the difference between the expected values of $x_{i+1}$ and $x_i$ by $\Delta_i = E(x_{i+1}) - E(x_i).$ Notice that if we take the expected value of $x_1,$ and follow it by the increasing sequence of these differences, we get a sequence of numbers in the ratio $$ E(x_i) : \Delta_1 : \Delta_2 : \cdots : \Delta_{n-3} : \Delta_{n-2} : \Delta_{n-1} = \frac1n : \frac{1}{n-1} :\frac{1}{n-2} : \cdots: \frac13 : \frac12 : 1. \tag{*} $$

In order to make the ratio on the right-hand side a ratio of integers, we merely need multiply each part by the least common multiple of $\{1,2,3,\ldots, n\}.$

For example, for $n = 5,$ we find that the least common multiple of $\{1,2,3,4,5\}$ is $60.$ The ratio $(^*)$ is therefore $$ \frac15 : \frac14 :\frac13 :\frac12 : 1 = \frac{60}{5} : \frac{60}{4} :\frac{60}{3} :\frac{60}{2} : 60 = 12 : 15 : 20 : 30 : 60. $$

Now, recalling that except for the first part of this ratio, each part is proportional to the difference between two consecutive expected values, and observing that $E(x_k) = E(x_1) + \sum_{i=1}^{k-1} \Delta_i,$ we replace the ratio with a ratio of partial sums: $$ 12 : 15 : 20 : 30 : 60 \to 12 : 27 : 47 : 77 : 137, $$ and there's the ratio you found.

Solution 2:

It seems that your process is:

  • choose $n-1$ values independently and uniformly on $[0,1]$
  • sort these values into order
  • find the gaps between successive sorted values, and between $0$ and the smallest, and between the largest and $1$, so you calculate $n$ non-negative gaps adding up to $1$
  • sort these gaps into order
  • consider the expected values of these sorted gaps

For what it is worth, I believe the $n$ gaps (before sorting) have identical but not independent distributions with density $f(x)=n(1-x)^{n-1}$ when $0 \le x \le 1$ and so mean $\frac{1}{n}$

You asked for an explanation of the ratios pattern you observed. Empirically it seems that the expected length of the $k$th sorted gap out of $n$ is related to the difference of two harmonic numbers $$\frac{1}{n}\left(\sum_{i=1}^{n} \frac1{i} - \sum_{j=1}^{n-k} \frac1{j} \right) \\ = \sum_{j=n-k+1}^{n} \frac1{nj} $$

So in your initial example with $n=4$, you get for different values of $k$:

  1. : $\frac1{4\times4} = \frac1{16} = \frac{3}{48}$
  2. : $\frac1{4\times3}+\frac1{4\times4} = \frac{7}{48}$
  3. : $\frac1{4\times2}+\frac1{4\times3}+\frac1{4\times4} = \frac{13}{48}$
  4. : $\frac1{4\times1}+\frac1{4\times2}+\frac1{4\times3}+\frac1{4\times4} = \frac{25}{48}$

reproducing your $3:7:13:25$ ratios. This happens with other values for $n$ too

I believe that these expected values do not minimise $\mathbb E [ |a-a_{0}|+|b-b_{0}|+|c-c_{0}|+|d-d_{0}| ]$ but they do minimise $\mathbb E[(a-a_{0})^2+(b-b_{0})^2+(c-c_{0})^2+(d-d_{0})^2]$

To minimise the expected absolute sums I think you would do better with the medians of the distributions of the ordered gaps, and I suspect these medians may not add up to $1$ when $n \gt 2$. So for example with $n=4$ the means are about $0.0625, 0.1458, 0.2708, 0.5208$ but the medians seem experimentally to be closer to something like $0.052, 0.145, 0.276, 0.500$