Expected time to convergence

Consider the following process: we place $n$ points labelled $1, \dots, n$ uniformly at random on the interval $[0,1]$. At each time step, two points $i, j$ are selected uniformly at random and $i$ updates its position to a point chosen uniformly at random in the interval between the positions of $i$ and $j$ (i.e. the interval $[p(i),p(j)]$ if $p(i) < p(j)$, or $[p(j),p(i)]$ otherwise, where $p(x)$ denotes the position of the point labelled $x$).

  1. What is the expected time until all points are within distance $\varepsilon$ of each other for some fixed $\varepsilon > 0$?

  2. What is the expected time until all points are either to the left or right of $\frac{1}{2}$?

Asymptotic bounds are also very interesting to me.
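For concreteness, here is a minimal simulation sketch of the process in Python (all names and defaults are my own); it returns both stopping times asked about above.

```python
import random

def simulate(n, eps, seed=0):
    """Run the process once; return (T1, T2) where T1 is the first time
    all points are within eps of each other and T2 is the first time
    all points lie on one side of 1/2."""
    rng = random.Random(seed)
    p = [rng.random() for _ in range(n)]
    t, t1, t2 = 0, None, None
    while t1 is None or t2 is None:
        if t1 is None and max(p) - min(p) <= eps:
            t1 = t
        if t2 is None and (max(p) < 0.5 or min(p) > 0.5):
            t2 = t
        # one step: i moves to a uniform point between p(i) and p(j)
        i, j = rng.randrange(n), rng.randrange(n)
        lam = rng.random()
        p[i] = lam * p[i] + (1 - lam) * p[j]
        t += 1
    return t1, t2

print(simulate(50, 0.05))
```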


Update: I'll prove that the expected time in question 1 (for fixed $\varepsilon$) is $\Theta(n \log n)$, and that the expected time in question 2 is $\Omega(n \log n)$; at the moment I don't have a good upper bound for question 2.

Upper bound:

Define $S = \sum_{1 \leq a, b \leq n} (p(a) - p(b))^2$, and let $S_t$ be the value of $S$ at time $t$. At a fixed time $t$, let $p$ denote the positions at time $t$, and $p'$ the positions at time $t+1$. Then $p'$ is determined from $p$ by independently picking $(i, j)$ uniformly from $\{1, \dots, n\}^2$ and $\lambda \sim \mathrm{Unif}[0, 1]$, and setting $p'(a) = p(a)$ for all $a \neq i$ and $p'(i) = \lambda p(i) + (1-\lambda) p(j)$. Then we have
\begin{align*} S_t - S_{t+1} &= 2 \sum_{a \neq i} \left((p(i) - p(a))^2 - (p'(i) - p(a))^2\right) \\ &= 2(1-\lambda^2)(p(i) - p(j))^2 + 2 \sum_{a \neq i, j} \left((p(i) - p(a))^2 - (p'(i) - p(a))^2\right) \\ &= 2(1-\lambda^2)(p(i) - p(j))^2 + 2 \sum_{a \neq i, j} (p(i) - p'(i))(p(i) + p'(i) - 2p(a)) \\ &= 2\left(1-\lambda^2 + (n-2)\lambda(1-\lambda)\right)(p(i) - p(j))^2 \\ &\qquad+ 2 (1-\lambda) \sum_{a \neq i, j} (p(i) - p(j))(p(i) + p(j) - 2p(a)) \end{align*}
(in the last step we used $p(i) - p'(i) = (1-\lambda)(p(i) - p(j))$ and $p(i) + p'(i) = \lambda(p(i) - p(j)) + p(i) + p(j)$).

There is a slight technicality here in that it appears we have assumed $i \neq j$, but in fact these expressions are still equal in the case $i = j$: the LHS is $0$ since then $S_{t+1} = S_t$, and the RHS is $0$ since $p(i) = p(j)$.

Now, upon taking expectations conditional on $S_t$, the second term vanishes because it is antisymmetric in $i, j$ (i.e. it is negated when we swap $i$ and $j$), so we have
\begin{align*} S_t - \mathbb{E}[S_{t+1} | S_t] &= \mathbb{E}\bigg[ 2\left(1-\lambda^2 + (n-2)\lambda(1-\lambda)\right)(p(i) - p(j))^2 \,\bigg|\, S_t \bigg] \\ &= 2\,\mathbb{E}\big[1-\lambda^2 + (n-2)\lambda(1-\lambda)\,\big|\, S_t\big] \mathbb{E}[(p(i) - p(j))^2 \,|\, S_t] \\ &= 2\left(\frac{n+2}{6}\right)\left(\frac{S_t}{n^2}\right) \end{align*}
(where the second equality follows by the independence of $\lambda$ and $(i, j)$, and the last step uses the fact that averaging $(p(i) - p(j))^2$ over the $n^2$ choices of $(i, j)$ gives exactly $S_t/n^2$), hence
$$\mathbb{E}[S_{t+1}|S_t] = \left(1 - \frac{n+2}{3n^2}\right)S_t,$$
which means, in particular, that $\mathbb{E}[S_{t+1}] = \left(1 - \frac{n+2}{3n^2}\right)\mathbb{E}[S_t]$.
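As a sanity check on this identity, the following sketch (my own code) fixes one configuration $p$, estimates $\mathbb{E}[S_{t+1} \mid p]$ by Monte Carlo, and compares it against $\left(1 - \frac{n+2}{3n^2}\right)S_t$:

```python
import random

def check_drift(n=10, trials=200_000, seed=1):
    rng = random.Random(seed)
    p = [rng.random() for _ in range(n)]
    S = sum((a - b) ** 2 for a in p for b in p)
    total = 0.0
    for _ in range(trials):
        i, j = rng.randrange(n), rng.randrange(n)
        lam = rng.random()
        q = lam * p[i] + (1 - lam) * p[j]  # new position of point i
        # only the 2(n-1) ordered pairs involving i change
        delta = 2 * sum((q - p[a]) ** 2 - (p[i] - p[a]) ** 2
                        for a in range(n) if a != i)
        total += S + delta
    return total / trials, (1 - (n + 2) / (3 * n ** 2)) * S

print(check_drift())  # the two values should agree closely
```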

We can now use this to get an upper bound for question 1. Note we have $\mathbb{E}[S_0] \leq n^2$ and $\mathbb{E}[S_{t+1}] \leq e^{-1/3n} \mathbb{E}[S_t]$ (since $1 - \frac{n+2}{3n^2} \leq e^{-1/3n}$), so $\mathbb{E}[S_t] \leq n^2 e^{-t/3n}$, and thus the probability that there are two points which are at least $\varepsilon$ apart at time $t$ is at most $(n^2 / \varepsilon^2) e^{-t/3n}$ (by Markov's inequality, since if this holds we must have $S_t \geq \varepsilon^2$).

Letting $T_1$ be the first time at which all points are within distance $\varepsilon$ of each other, this means $\mathbb{P}[T_1 > t] \leq (n^2 / \varepsilon^2) e^{-t/3n}$, hence $$\mathbb{E}[T_1] = \sum_{t=0}^\infty \mathbb{P}[T_1 > t] \leq \sum_{t=0}^\infty \min \{(n^2 / \varepsilon^2) e^{-t/3n}, 1\},$$ which we can bound (up to an additive constant) by $$6n \log(n / \varepsilon) + \int_{6n\log(n/\varepsilon)}^\infty (n^2 / \varepsilon^2) e^{-t/3n} \,dt = 6n\log(n / \varepsilon) + 3n,$$ where $6n\log(n/\varepsilon)$ is the crossover point at which $(n^2/\varepsilon^2)e^{-t/3n} = 1$. Therefore $\mathbb{E}[T_1] = O(n \log(n/\varepsilon))$.
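The closed-form estimate is easy to check numerically; this sketch (my own code) sums the series directly and compares it with $6n\log(n/\varepsilon) + 3n$:

```python
import math

def t1_upper_bound(n, eps):
    """Sum min{(n^2/eps^2) e^{-t/(3n)}, 1} over t and compare with
    the closed-form estimate 6 n log(n/eps) + 3 n."""
    c = n ** 2 / eps ** 2
    total, t = 0.0, 0
    while True:
        term = min(c * math.exp(-t / (3 * n)), 1.0)
        if term < 1e-12:
            break
        total += term
        t += 1
    return total, 6 * n * math.log(n / eps) + 3 * n

print(t1_upper_bound(100, 0.1))  # the two values should be close
```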

Lower bounds:

I should have thought of this before -- we can actually get some basic coupon-collector-type lower bounds for both questions. I'm not really optimizing for good constants below.

Lemma: Suppose we have two disjoint sets $A, B \subset \{1, \dots, n\}$ with $|A|, |B| \geq k$. At each time step we choose a uniformly random element of $\{1, \dots, n\}$. Then the expected time until either all elements of $A$ have been chosen at least once or all elements of $B$ have been chosen at least once is at least $(n/2) \log k$.

Proof: This is essentially the same as the standard proof for the coupon collector's problem. Let $a_1, \dots, a_k$ be distinct elements of $A$, and $b_1, \dots, b_k$ distinct elements of $B$, and say that the pair $(a_r, b_r)$ is hit if one of $a_r, b_r$ is chosen. Note that our condition -- that all elements of $A$ are chosen or all elements of $B$ are chosen -- is satisfied only if all pairs $(a_1, b_1), \dots, (a_k, b_k)$ are hit. Let $t_r$ be the time from when $r-1$ distinct pairs have been hit until the $r$-th distinct pair is hit. Once $r-1$ pairs have been hit, each step hits a new pair with probability $\frac{2(k-r+1)}{n}$ (the $k-r+1$ unhit pairs comprise $2(k-r+1)$ distinct elements), hence $t_r$ is geometrically distributed with expectation $\frac{n}{2(k-r+1)}$, and the expected time until all pairs are hit is thus at least $$\sum_{r=1}^k \mathbb{E}[t_r] = \frac{n}{2} \sum_{r=1}^k \frac{1}{k-r+1} = \frac{n}{2} \sum_{r=1}^k \frac{1}{r} \geq \frac{n}{2} \log k.$$
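A quick Monte Carlo check of the Lemma (the setup and constants below are my own): estimate the expected time until all of $A$ or all of $B$ has been chosen, and compare with $(n/2)\log k$.

```python
import math, random

def hit_time(n, k, rng):
    """Steps until all of A = {0..k-1} or all of B = {k..2k-1} is chosen,
    drawing uniformly from {0,..,n-1} at each step."""
    unseen_a, unseen_b = set(range(k)), set(range(k, 2 * k))
    t = 0
    while unseen_a and unseen_b:
        x = rng.randrange(n)
        unseen_a.discard(x)
        unseen_b.discard(x)
        t += 1
    return t

rng = random.Random(2)
n, k, trials = 100, 40, 2000
avg = sum(hit_time(n, k, rng) for _ in range(trials)) / trials
print(avg, n / 2 * math.log(k))  # the average should exceed (n/2) log k
```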

Note that for question 1, at time $T_1$ all points lie in some interval $I$ of length $\leq \varepsilon$, so every point initially outside of $I$ has moved, i.e. has been chosen as the $i$-value at some time step. Defining $I^- = [0, (1+\varepsilon)/2]$ and $I^+ = [(1-\varepsilon)/2, 1]$, the interval $I$ necessarily satisfies either $I \subset I^-$ or $I \subset I^+$. Letting $A$ be the set of points outside of $I^-$ at $t = 0$, and $B$ the set of points outside of $I^+$ at $t = 0$, this means that at time $T_1$, either every element of $A$ has been chosen as $i$ or every element of $B$ has been chosen as $i$. The complement of each of $I^-$ and $I^+$ is an interval of length $\frac{1 - \varepsilon}{2}$, so since the points are i.i.d. uniform on $[0, 1]$ at $t = 0$, we have $|A|, |B| \geq \frac{1 - \varepsilon}{3} n$ with probability $1 - o(1)$. Conditioning on the arrangement of points at $t = 0$, the Lemma gives $\mathbb{E}[T_1 \mid p|_{t = 0}] \geq \frac{1}{2} n \log (\frac{1-\varepsilon}{3}n)$ whenever $|A|, |B| \geq \frac{1 - \varepsilon}{3} n$, hence unconditionally $\mathbb{E}[T_1] \geq \frac{1}{2} (1 - o(1)) n \log(\frac{1-\varepsilon}{3}n) = \Omega(n \log n)$.

Similarly, for question 2, let $A$ be the set of points in $[0, 1/2)$ at $t = 0$, and $B$ be the set of points in $(1/2, 1]$ at $t = 0$. Then at time $T_2$ (when all points lie on one side of $1/2$), either all points in $A$ have moved, or all points in $B$ have moved. Since with probability $1 - o(1)$ we have $|A|, |B| \geq n/3$, the same argument as above gives $\mathbb{E}[T_2] \geq \frac{1}{2} (1 - o(1)) n \log(n/3) = \Omega(n \log n)$.

Computational results:

I don't have an upper bound for question 2 right now, but I ran some simulations of the process $50$ times for each of $n = 10, 11, \dots, 250$. The estimates for $\mathbb{E}[T_2]/n$ are plotted below.

[Plot: estimates of $\mathbb{E}[T_2]/n$ against $n$ (question 2)]

Based on this, $\mathbb{E}[T_2]/n$ appears to be linear in $\log n$, suggesting the possibility of a matching upper bound for $\mathbb{E}[T_2]$. Least squares gives a best-fit line of $\mathbb{E}[T_2]/n \approx 5.75 \log n - 7.19$.
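For reproducibility, here is a sketch of the kind of experiment described above (my own code, on a coarser grid of $n$ to keep the runtime modest), including the least-squares fit:

```python
import math, random

def t2(n, rng):
    """Steps until all points lie on one side of 1/2."""
    p = [rng.random() for _ in range(n)]
    t = 0
    while min(p) < 0.5 < max(p):
        i, j = rng.randrange(n), rng.randrange(n)
        lam = rng.random()
        p[i] = lam * p[i] + (1 - lam) * p[j]
        t += 1
    return t

rng = random.Random(3)
xs, ys = [], []
for n in range(10, 251, 20):  # coarser grid than n = 10, 11, ..., 250
    est = sum(t2(n, rng) for _ in range(50)) / 50
    xs.append(math.log(n))
    ys.append(est / n)

# least-squares fit of E[T2]/n ~ a log n + b
m = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
a = (m * sxy - sx * sy) / (m * sxx - sx * sx)
b = (sy - a * sx) / m
print(a, b)  # compare with the reported 5.75 and -7.19
```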


This isn't an answer, but I ran some numerical simulations to at least see what happens.

[Plot: actual data for question 1]

For these, I ran $1000$ different random runs of the process for each $n = 2, 3, \dots, 30$ and $\epsilon = 1/2^k$ for $k = 1, 2, \dots, 6$. From this, I would conjecture that a crude upper bound on the expected time is given by $(kn)^2$, where $k$ is related to $\epsilon$ by

$$k=\frac{\log \left(\frac{1}{\epsilon}\right)}{\log (2)}$$

[Plot: actual data vs. conjectured upper bound $(kn)^2$]

while an asymptotic bound might be given by $(kn)^{5/4}$.

[Plot: actual data vs. conjectured asymptotic bound $(kn)^{5/4}$]

Note that in both of these graphs the black curve is the actual data while the red curve is the conjectured bound.
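For reference, a self-contained sketch of this experiment (my own code, with $100$ runs per data point rather than $1000$) that prints the data next to $(kn)^2$ and $(kn)^{5/4}$:

```python
import random

def t1(n, eps, rng):
    """Steps until all points are within eps of each other."""
    p = [rng.random() for _ in range(n)]
    t = 0
    while max(p) - min(p) > eps:
        i, j = rng.randrange(n), rng.randrange(n)
        lam = rng.random()
        p[i] = lam * p[i] + (1 - lam) * p[j]
        t += 1
    return t

rng = random.Random(4)
for k in range(1, 7):
    eps = 2.0 ** -k
    for n in (5, 15, 30):
        est = sum(t1(n, eps, rng) for _ in range(100)) / 100
        print(k, n, est, (k * n) ** 2, (k * n) ** 1.25)
```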

For your second question, I proceeded in much the same manner, except that I used $2000$ random runs instead of $1000$. However, the results were much the same as before

[Plot: question 2 data vs. $n^2$]

where the black plot is the actual data while the red plot is $n^2$. Again, it would seem a crude upper bound is $n^2$.


Exact solution for $n=2$. Consider the process when the points are at $(a, b)$ with $0 < a < b < 1$ and a target of $\epsilon$. It is equivalent to a process with points $(0, 1)$ and a target of $\frac{\epsilon}{b-a}$. With this observation, let $f(t)$ be the expected number of steps when the points are at $(0, 1)$ and the target is $t$. The next step terminates the process with probability $t$; if it doesn't terminate, the remaining width is distributed as $U(t, 1)$. So $f$ must satisfy the following recurrence

$$f(t) = 1 + \int_t^1 f(t/x) \,\mathrm{d}x,$$ which is satisfied by $f(t) = 1 - \ln(t)$. The initial points are both $U(0, 1)$, so the initial width $w$ has density $2(1-w)$ on $[0, 1]$ (and widths below $\epsilon$ require $0$ steps), and the expected number of steps with $n=2$ is given by

$$\int_\epsilon^1 (2 - 2w)\left(1 - \ln\left(\frac{\epsilon}{w}\right)\right) \mathrm{d}w = \frac{\epsilon^2-1}{2} + \log{\left(\frac{1}{\epsilon} \right)},$$ which matches experiments.

[Plot: simulation vs. the exact formula for $n=2$]
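Both the solution of the recurrence and the final integral can be verified numerically; here is a small sketch using a midpoint rule (the code and tolerances are my own choices):

```python
import math

def f(t):
    return 1 - math.log(t)

def recurrence_rhs(t, m=100_000):
    # midpoint rule for 1 + \int_t^1 f(t/x) dx
    h = (1 - t) / m
    return 1 + h * sum(f(t / (t + (i + 0.5) * h)) for i in range(m))

print(f(0.3), recurrence_rhs(0.3))  # the two values should agree

# check the closed form for the n = 2 expected number of steps
eps, m = 0.1, 100_000
h = (1 - eps) / m
integral = h * sum(2 * (1 - w) * (1 - math.log(eps / w))
                   for w in (eps + (i + 0.5) * h for i in range(m)))
print(integral, (eps ** 2 - 1) / 2 + math.log(1 / eps))  # should agree
```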

For larger $n$ this gives the "trivial" lower bound of $\Omega(n \log(1/\epsilon))$: for larger $n$, the smallest and largest points start close to $0$ and $1$, and one of them is selected to move only once every $n/2$ steps on average.