Expected value of maximum consecutive distance in a uniformly random permutation

Solution 1:

The answer to your first question is $n - \sqrt{\pi n}/2 + o(\sqrt{n})$. The proof is quite involved. So I will only sketch the main idea. Let me know if you are interested in more details.

The answer to your second question is: $(\alpha t)^n \leq c_t^n \leq (\beta t)^n$ for some constants $\alpha, \beta > 0$.

Part I. An estimate for $\mathbb{E}[\max_{1\le i < n} |\sigma(i) - \sigma(i+1)|]$.

It will be convenient to work with the random variable $\Delta = n - \max_{1\le i < n} |\sigma(i) - \sigma(i+1)|$. Our goal is to prove that $\mathbb{E}[\Delta] = (\sqrt{\pi}/2 + o(1)) \sqrt{n}$. We will consider an algorithm that generates $\sigma$ uniformly at random and show that for this algorithm $\mathbb{E}[\Delta] = (\sqrt{\pi}/2 + o(1)) \sqrt{n}$ (of course, the answer doesn't depend on the way we generate $\sigma$, as long as $\sigma$ is distributed uniformly).

Denote $[n] = \{1,\dots,n\}$.

Let $k = n ^{1/2+ \varepsilon}$ (for some small $\varepsilon > 0$). Choose two disjoint subsets $S,T \subset [n]$ uniformly at random.

Now

assign values from $\{1,\dots, k\}$ to $\sigma(i)$ for $i\in S$ (use each value once);
assign values from $\{n-k+1,\dots, n\}$ to $\sigma(i)$ for $i\in T$ (use each value once);
assign values from $\{k+1,\dots, n-k\}$ to $\sigma(i)$ for $i\in [n]\setminus (S\cup T)$ (use each value once).

We get a permutation $\sigma$ distributed uniformly at random.

Informal Observation. Observe that if $i\in S$ and $i+1\in T$ then $\sigma(i+1) - \sigma(i) \geq (n-k+1) - k = n - 2k +1$ and thus $\Delta \leq 2k - 1$. Similarly, if $i\in T$ and $i+1\in S$, then $\Delta \leq 2k - 1$. On the other hand, if there is no $i$ such that either $i\in S$ and $i+1\in T$ or $i\in T$ and $i+1\in S$ then for every $i$, $|\sigma(i) - \sigma(i+1)| \leq n - (k+1)$. So $\Delta \geq k + 1$.

Note that by the Birthday Paradox a pair $(i,i+1)$ with $i\in S$ and $i+1\in T$ should exist when $k\approx \sqrt{n}$. Therefore, we expect that $\Delta = \Theta(\sqrt n)$.

Let $A$ be the set of pairs $(i,i+1)$ such that $i\in S$ and $i+1\in T$ or $i\in T$ and $i+1\in S$. We estimate the size of $A$. The probability that $i\in S$ and $i+1\in T$ is $\frac{|S| \cdot |T|}{n(n-1)} = \frac{k^2}{n(n-1)}$; the probability that $i\in T$ and $i+1\in S$ is also $\frac{k^2}{n(n-1)}$. Thus the expected number of pairs is $2k^2/n$. Moreover, the number of such pairs is concentrated near $2k^2/n$.

It can happen that $(i,i+1)\in A$ and $(i+1,i+2)\in A$. But the expected number of such $i$ is approximately $k^3 / n^2$. So by the Markov inequality there is at least one such $i$ with probability at most $k^3/n^2 = n^{3\varepsilon - 1/2}$. We can ignore this event.

Let \begin{align} S' &= \{i\in S: (i,i+1) \in A \text{ or } (i-1,i)\in A\},\\ T' &= \{i\in T: (i,i+1) \in A \text{ or } (i-1,i)\in A\}. \end{align} Recall that we assigned values from $\{1,\dots,k\}$ to $\sigma(i)$ for $i\in S$. Let us a bit modify the assignment procedure:

For each $i\in S'$ assign $\sigma(i)$ a value from $\{1,\dots, k\}$ independently uniformly at random.
If we used each value from $\{1,\dots, k\}$ at most once, assign unused values from $\{1,\dots, k\}$ randomly to unassigned $\sigma(i)$, $i\in S\setminus S'$ using each value only once.
If we used some value more than once, assign values from $\{1,\dots, k\}$ to all $\sigma(i)$, $i\in S$ using each value only once (reassign some $\sigma(i)$).

Note that the expected number of pairs $(i,j) \in S'\times S'$ such that we assigned $\sigma(i) = \sigma(j)$ at the first step is approximately $|S'|^2 / k = 4k^3/n^2 = 4n^{ - 1/2 + 3\varepsilon}$. The probability that there is at least one such pair is at most $4n^{ - 1/2 + 3\varepsilon}$. We can ignore this event. So we can assume that the assignment algorithm always performs steps 1 and 2 (and never performs step 3). We similarly modify the procedure for assigning values to $\sigma(i)$ for $i\in T'$.

Then for every pair $(i,i+1) \in A$, $\sigma(i)$ is distributed uniformly at random in $\{1,\dots, k\}$ and $\sigma(i+1)$ is distributed uniformly at random in $\{n-k+1,\dots, n\}$ or the other way around. Values of $\sigma(i), \sigma(i+1)$ for different pairs $(i,i+1) \in A$ are independent.

We have that for every $(i,i+1) \in A$, the random variable $n+1 - |\sigma(i) - \sigma(i+1)|$ is distributed as a sum of two random variables uniformly distributed in $\{1,\dots,k\}$. Our goal now is to estimate the minimum of $2k^2/n$ independent copies of such random variable.

Consider two i.i.d. random variables $x \in_U (0,1)$ and $y \in_U (0,1)$. Then $\lceil k x\rceil$ and $\lceil k y\rceil$ are uniformly distributed in $\{1,\dots,k\}$. So $n+1 - |\sigma(i) - \sigma(i+1)|$ is distributed as $\lceil k x\rceil + \lceil k y\rceil \approx k(x+y)$ (up to an additive constant 2).

Lemma. Let $x_1,\dots,x_N$ and $y_1,\dots,y_N$ be independent random variables uniformly distributed on $(0,1)$. Let $z_i = x_i + y_i$. Then $$\mathbb{E}[\min_i\{z_i\}] \approx \sqrt{\frac{\pi}{2N}}.$$

Proof: The distribution function $F(t)$ (c.d.f.) of $z_i$ is given by $$F(t) = \begin{cases} 0, & \text{ if } t \leq 0;\\ t^2/2, & \text{ if } 0\leq t \leq 1;\\ 2t-1 - t^2/2, & \text{ if } 1\leq t \leq 2;\\ 1, &\text{ if } t > 1. \end{cases} $$

Note that if $z_i < z_j$ then $F(z_i) < F(z_j)$. Thus $\arg\min_i F(z_i) = \arg\min_i z_i$. Each $F(z_i)$ is uniformly distributed on $[0,1]$. Thus $\min_i\{ F(z_i)\}$ has probability density $N\cdot (1-s)^{N-1}$. We have $$\mathbb{E}[\min_i\{z_i\}]=\int_0^1 N\cdot s^{N-1} F^{-1}(s) ds \approx \int_0^1 N\cdot s^{N-1}\sqrt{2s} ds = \sqrt{2}\cdot N \cdot B(N,3/2)\approx \sqrt{2}\cdot\Gamma(3/2) \cdot N \cdot N^{-3/2}=\sqrt{\pi/(2N)},$$ where $B$ is the beta function and $\Gamma$ is the gamma function.

QED

Thus the minimum of $2k^2/n$ independent copies of $x+y$ approximately equals $\sqrt{\pi n/(4k^2)}$ in expectation. The minimum of $2k^2/n$ independent copies of $k(x+y)$ approximately equals $\sqrt{\pi n}/2$ in expectation.

We get that the minimum over all pairs $(i,i+1)\in A$ of $n+1 - |\sigma(i) - \sigma(i+1)|$ is approximately $\sqrt{\pi n}/2$. If $(i,i+1)\notin A$ then $n+1 - |\sigma(i) - \sigma(i+1)| \geq k +1 \gg \sqrt{n}$. We showed that $\Delta$ is approximately equal to $\sqrt{\pi n}/2$ in expectation.

Part II. Simulation.

Our estimate agrees with data from a computational simulation, which I did in MATLAB. For $n$ between $100$ and $10000$, $$\sqrt{\pi n}/2 -0.75 < \Delta_{\text{simulation}}(n) < \sqrt{\pi n}/2 + 0.1,$$ where $\Delta_{\text{simulation}}(n)$ is an estimate for $\Delta$ obtained in the simulation. Simulation results

Part III. An estimate for $c_t^n$.

We prove first that $c_t^n \leq n (2t)^{n-1}$. There are $n$ ways to choose $\sigma(1)$. Given $\sigma(1)$, there are at most $2t$ ways to choose $\sigma(2)$: $$\sigma(2)\in \{\sigma(1) -2t,\dots,\sigma(1)-1,\sigma(1)+1,\dots,\sigma(1)+t\}.$$ Similarly, there are at most $2t$ ways to choose each $\sigma(i)$ for $i\in\{2,\dots, n\}$. So the total number of permutations $\sigma$ (for a given value of $t$) is at most $n (2t)^{n-1}<(\beta t)^n$ (for some $\beta>0$).

Now we prove that there are at least $(t!)^{\lfloor n/(t+1)\rfloor} > (\alpha t)^n$ permutations (for some $\alpha > 0$). Consider the set of permutations of the form $$[1, 2, \dots t], (t+1), [(t+1) + 1, (t+1) + 2 \dots (t + 1) + t], 2(t+1), [2(t+1) + 1 \dots 2(t+1) +t], 3(t+1) \dots$$ where we can arbitrarily permute numbers inside each bracket. Note that $|\sigma(i) -\sigma(i+1)| \leq t$ for all these permutations.

Let's count the number of such permutations. There are $t!$ ways to arrange numbers inside each bracket. There are ${\lfloor n/(t+1)\rfloor} $ brackets. So the total number of permutations is (at least) $(t!)^{\lfloor n/(t+1)\rfloor}$. By Stirling's formula, it is at least $(\alpha t)^n$ for some $\alpha > 0$.

Solution 2:

A naive approximation for the second question. The restriction of having a permutation of length $n$ with consecutive distances not greater than $t$ prohibits a total number of pairs given by $ 2 \times (1 + 2 + \cdots n - t-1) = (n-t)(n-t-1)$

The total number of a priori pairs is $n(n-1)$. The probability that a given pair violates the restriction is then $$\frac{(n-t)(n-t-1)}{n (n-1)}$$

If we assume (very rough approximation!) that the probability that all the pairs violate/fit the restriction are independent, the we have a probability of "success" given by

$$p_{n,t}=\left(1-\frac{(n-t-1)(n-t)}{n (n-1)}\right)^{n-1} = \left( \frac{t \, (2 n - t -1)}{n \, (n-1)} \right)^{n-1}$$

If we further assume that $n \gg 1$:

$$p_{n,t} \approx \left(\frac{2 t }{n}\right)^{n-1} e^{-(t-1)/2} $$

The number of allowed permutations then can be approximated as

$$c_{n,t} = n! \; p_{n,t} \approx \left(\frac{2 t }{e}\right)^{n-1} e^{-(t-1)/2} \sqrt{2 \pi \, n}$$

I've made a few simulations and the approximation does not seem too bad.

Expected value of maximum consecutive distance in a uniformly random permutation

Solution 1:

Solution 2:

Related

Recent Posts