Convergence of $np(n)$ where $p(n)=\sum_{j=\lceil n/2\rceil}^{n-1} {p(j)\over j}$

Update: the following probabilistic argument I had posted earlier shows only that $p(1) + p(2) + \dots + p(n) = (c + o(1)) \log(n)$ and not, as originally claimed, the convergence $np(n) \to c$. Until a complete proof is available [edit: George has provided one in another answer] it is not clear whether $np(n)$ converges or has some oscillation, and at the moment there is evidence in both directions. Log-periodic or other slow oscillation is known to occur in some problems where the recursion accesses many previous terms. Actually, everything I can calculate about $np(n)$ is consistent with, and in some ways suggestive of, log-periodic fluctuations, with convergence being the special case where the bounds could somehow be strengthened and the fluctuation scale thus squeezed down to zero.


$p(n) \sim c/n$ is [edit: only on average] equivalent to $p(1) + p(2) + \dots + p(n)$ being asymptotic to $c \log(n)$. The sum up to $p(n)$ is the expected time the walk spends in the interval $[1,n]$. For this quantity there is a simple probabilistic argument that explains (and can rigorously demonstrate) the asymptotics.

This Markov chain is a discrete approximation to a log-normal random walk. If $X$ is the position of the particle, $\log X$ behaves like a simple random walk with steps $\mu \pm \sigma$ where $\mu = 2 \log 2 - 1 = 1/c$ and $\sigma^2 = (1- \mu)^2/2$. This is true because the Markov chain is bounded between two easily analyzed random walks with continuous steps.

(Let $X$ be the position of the particle and $n$ the number of steps; the walk starts at $X=1$, $n=1$.)

Lower bound walk $L$: at each step, multiply $X$ by a uniform random number in $[1,2]$ and replace $n$ by $n+1$. $\log L$ increases, on average, by $\int_1^2 \log(t)\,dt = 2 \log(2) - 1$ at each step.
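For completeness, the step mean is the elementary integral
$$ \int_1^2\log(t)\,dt=\Big[t\log(t)-t\Big]_1^2=2\log(2)-1\approx0.3863. $$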

Upper bound walk $U$: at each step, jump from $X$ to a uniform random number in $[X+1,2X+1]$ and replace $n$ by $n+1$.

$L$ and $U$ have means and variances that agree to within $O(1/n)$, where the $O(\cdot)$ constants can be made explicit. Steps of $L$ are i.i.d. and steps of $U$ are independent and asymptotically identically distributed. Thus, the central limit theorem shows that $\log X$ after $n$ steps is approximately Gaussian with mean $n\mu + O(\log n)$ and variance $n\sigma^2 + O(\log n)$.

The number of steps for the particle to escape the interval $[1,t]$ is therefore $(\log t)/\mu$, with fluctuations of size $A \sqrt{\log t}$ having probability that decays rapidly in $A$ (bounded by $|A|^p \exp(-qA^2)$ for suitable constants). Thus, the sum $p(1) + p(2) + \dots + p(n)$ is asymptotically equivalent to $(\log n)/(2\log(2)-1)$.
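This is easy to test empirically. Below is a minimal Monte Carlo sketch (the function name and parameters are my own choices), simulating the chain that jumps from $j$ uniformly to $\{j+1,\dots,2j\}$, which is exactly the walk whose occupation probabilities are the $p(n)$:

```python
import math
import random

def time_in_interval(t, rng):
    """Count the steps the chain spends in [1, t]; it starts at X = 1 and
    jumps from X to a uniformly chosen integer in {X+1, ..., 2X}."""
    X, steps = 1, 0
    while X <= t:
        steps += 1
        X = rng.randint(X + 1, 2 * X)
    return steps

rng = random.Random(0)
t, trials = 10**9, 4000
mean = sum(time_in_interval(t, rng) for _ in range(trials)) / trials
# expected time in [1, t] is p(1) + ... + p(t) ~ log(t) / (2 log 2 - 1)
print(mean, math.log(t) / (2 * math.log(2) - 1))
```

The two printed numbers should agree up to an $O(1)$ term, since the argument only controls the leading $\log t$ behaviour.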

Maybe this is equivalent to the 2003 argument from the conference. If the goal is to get a proof from the generating function, it suggests that dividing by $(1-x)$ may be useful for smoothing the $p(n)$'s.


After gaining some insight from numerical calculations of $np(n)-c$ for large $n$, I can now describe the convergence in some detail. First, the results of the simulations strongly suggested the following.

  • for $n$ odd, $np(n)$ converges to $c$ from below at rate $O(1/n)$.
  • for $n$ a multiple of 4, $np(n)$ converges to $c$ from above at rate $O(1/n^2)$.
  • for $n$ a multiple of 2, but not 4, $np(n)$ converges to $c$ from below at rate $O(1/n^2)$.

This suggests the following asymptotic form for $p(n)$: $$ p(n)=\frac{c}{n}\left(1+\frac{a_1(n)}{n}+\frac{a_2(n)}{n^2}\right)+o\left(\frac{1}{n^3}\right) $$ where $a_1(n)$ has period 2, with $a_1(0)=0$, $a_1(1) < 0$, and $a_2(n)$ has period 4, with $a_2(0) > 0$ and $a_2(2) < 0$. In fact, we can expand $p(n)$ to arbitrary order [Edit: Actually, not quite true. See below]: $$ p(n)=c\sum_{r=0}^s\frac{a_r(n)}{n^{r+1}}+O(n^{-s-2}),\qquad(1) $$ where the term $a_r(n)$ is periodic in $n$, with period $2^r$. Here, I have normalized $a_0(n)$ to be equal to 1.

We can compute all of the coefficients in (1). As $p$ satisfies the recurrence relation
$$ p(n+1)=(1+1/n)p(n)-1_{\lbrace n\textrm{ even}\rbrace}\frac2np(n/2)-1_{\lbrace n=1\rbrace},\qquad(2) $$
we can simply plug (1) into this, expand out the terms $(n+1)^{-r}=\sum_s\binom{-r}{s}n^{-r-s}$ on the left-hand side, and compare coefficients of $1/n$:
$$ a_r(k+1)-a_r(k)=a_{r-1}(k)-\sum_{s=0}^{r-1}\binom{-s-1}{r-s}a_{s}(k+1)-1_{\lbrace k\textrm{ even}\rbrace}2^{r+1}a_{r-1}(k/2).\qquad(3) $$

Letting $\bar a_r$ be the average of $a_r(k)$ as $k$ varies, we can average (3) over $k$ to get a recurrence relation for $\bar a_r$. Alternatively, the function $f(n)=\sum_r\bar a_rn^{-r-1}$ must satisfy $f(n+1)=(1+1/n)f(n)-f(n/2)/n$, which is solved by $f(n)=1/(n+1)=\sum_{r\ge0}(-1)^rn^{-r-1}$, so we get $\bar a_r=(-1)^r$. Then, (3) can be applied iteratively to obtain $a_r(k+1)-a_r(k)$ in terms of $a_s(k)$ for $s < r$. Together with $\bar a_r$, this gives $a_r(k)$, and it can be seen that the period of $a_r(k)$ doubles at each step. Doing this gives $a_r\equiv(a_r(0),\ldots,a_r(2^r-1))$ as follows:
$$ \begin{align} & a_0=(1),\\ & a_1=(0,-2),\\ & a_2=(7,-3,-9,13)/2. \end{align} $$
These agree with the numerical simulation mentioned above.
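These values are easy to check numerically. Here is a minimal sketch (my own code, with an arbitrary truncation point) that iterates recurrence (2) in floating point and peels off the correction terms:

```python
import math

N = 1 << 14                        # arbitrary truncation point
c = 1 / (2 * math.log(2) - 1)

# iterate recurrence (2): p(n+1) = (1 + 1/n) p(n) - [n even] (2/n) p(n/2) - [n = 1]
p = [0.0] * (N + 1)
p[1] = 1.0
for n in range(1, N):
    p[n + 1] = (1 + 1 / n) * p[n]
    if n % 2 == 0:
        p[n + 1] -= (2 / n) * p[n // 2]
    if n == 1:
        p[n + 1] -= 1.0

for n in range(N - 7, N + 1):
    r1 = (n * p[n] / c - 1) * n    # -> a_1(n): 0 for even n, -2 for odd n
    a1 = 0.0 if n % 2 == 0 else -2.0
    r2 = (r1 - a1) * n             # -> a_2(n): (3.5, -1.5, -4.5, 6.5) for n mod 4
    print(n % 4, round(r1, 4), round(r2, 2))
```

The second column settles to $a_1$, and the third approaches $a_2$, up to the slowly decaying oscillating correction described in the next update.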


Update: I tried another numerical simulation to check these asymptotics, by successively subtracting out the leading order terms. You can see it converges beautifully to the levels $a_0$, $a_1$, $a_2$, but then...

[plot: convergence to the asymptotic levels $a_0$, $a_1$, $a_2$]

... it seems that after the $a_2n^{-3}$ part, there is an oscillating term! I wasn't expecting that, but it fits an asymptotic of the form $cn^{-r}\sin(a\log n+\theta)$; solving to leading order gives $r\approx3.54536$, $a\approx10.7539$.


Update 2: I was re-thinking this question a few days ago, and it suddenly occurred to me how you can not only prove the convergence using analytic methods, but also give a full asymptotic expansion to arbitrary order. The idea involves some very cool maths (if I may say so myself)! Apologies that this answer has grown and grown, but I think it's worth it. It is a very interesting question and I've certainly learned a lot by thinking about it. The idea is that, instead of using a power series generating function as in Qiaochu's answer, you use a Dirichlet series, which can be inverted with Perron's formula.

First, the expansion is as follows:
$$ p(n)=\sum_{\Re[r]+k\le \alpha}a_{r,k}(n)n^{-r-k}+o(n^{-\alpha}),\qquad(4) $$
for any real $\alpha$. The sum is over nonnegative integers $k$ and complex numbers $r$ with real part at least 1 satisfying $r+1=2^r$ (the leading term being $r=1$), and $a_{r,k}(n)$ is a periodic function of $n$, with period $2^k$. The reason for such exponents is that the difference equation (2) has the continuous-time limit $p^\prime(x)=p(x)/x-p(x/2)/x$, which has solutions $p(x)=x^{-r}$ for precisely such exponents. Splitting into real and imaginary parts $r=u+iv$, all solutions to $r+1=2^r$ lie on the curve $(1+u)^2+v^2=4^u$ and, other than the leading term $u=1$, $v=0$, there is precisely one complex solution with imaginary part $2m\pi\le v\log2\le2m\pi+\pi/2$ for each positive integer $m$; together with the complex conjugates, this lists all possible exponents $r$. Then, $a_{r,k}(n)$ is determined (as a multiple of $a_{r,0}$) for $k > 0$ by substituting this expansion back into the difference equation, as I did above. I arrived at this expansion after running the simulations plotted above (and T..'s mention of complex exponents of $n$ helped). Then, the Dirichlet series idea nailed it.
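The exponents can be located numerically. The following sketch applies Newton's method to $f(r)=2^r-r-1$; the seeds are my choice, the complex one taken near the values fitted from the plot above:

```python
import cmath

def root(seed, steps=60):
    """Newton's method for f(r) = 2**r - r - 1 (complex powers are fine in Python)."""
    ln2 = cmath.log(2)
    r = complex(seed)
    for _ in range(steps):
        r -= (2 ** r - r - 1) / (ln2 * 2 ** r - 1)
    return r

print(root(1.2))                  # leading real root: r = 1
r1 = root(3.5 + 10.7j)            # first complex root, seeded near the fit
print(r1, abs(2 ** r1 - r1 - 1))  # residual should be ~ 0
```

The first complex root comes out at $r\approx3.54536+10.75397i$, matching the fitted decay rate and frequency of the oscillating term above.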

Define the Dirichlet series with coefficients $p(n)/n$:
$$ L(s)=\sum_{n=1}^\infty p(n)n^{-1-s}, $$
which converges absolutely when the real part of $s$ is large enough (greater than 0, since $p$ is bounded by 1). It can be seen that $L(s)-1$ is of order $2^{-1-s}$ in the limit as the real part of $s$ goes to infinity. Multiplying (2) by $n^{-s}$, summing, and expanding $n^{-s}$ in terms of $(n+1)^{-s}$ on the LHS gives the functional equation
$$ (s-1+2^{-s})L(s)=s+\sum_{k=1}^\infty(-1)^k\binom{-s}{k+1}(L(s+k)-1).\qquad(5) $$
This extends $L$ to a meromorphic function on the whole complex plane, with simple poles precisely at the points $s=-r$, where $r$ has real part at least one and satisfies $r+1 = 2^r$, and then at $s=-r-k$ for nonnegative integers $k$. The pole with largest real component is at $s = -1$ and has residue
$$ {\rm Res}(L,-1)={\rm Res}\left(\frac{s}{s-1+2^{-s}},-1\right)=\frac{1}{2\log2-1}. $$

If we define $p^\prime(n)$ by the truncated expansion (4) for some suitably large $\alpha$, then it will satisfy the recurrence relation (2) up to an error term of size $O(n^{-\alpha-1})$, and its Dirichlet series will satisfy the functional equation (5) up to an error term which is an analytic function over $\Re[s] > -\alpha$ (being a Dirichlet series with coefficients $o(n^{-\alpha-1})$). It follows that the Dirichlet series of $p^\prime$ has simple poles in the same places as $L$ on the domain $\Re[s] > -\alpha$ and, by choosing $a_{r,0}$ appropriately, it will have the same residues. Then, the Dirichlet series $M(s)=\sum_n q(n)n^{-1-s}$ associated with $q(n)= p^\prime(n)-p(n)$ will be analytic on $\Re[s] > -\alpha$. We can use Perron's formula to show that $q(n)$ is of size $O(n^\beta)$ for any $\beta > -\alpha$ and, by making $\alpha$ as large as we like, this will prove the asymptotic expansion (4). Differentiated, Perron's formula gives
$$ dQ(x)/dx = \frac{1}{2\pi i}\int_{\beta-i\infty}^{\beta+i\infty}x^sM(s)\,ds, $$
where $Q(x)=\sum_{n\le x}q(n)$ and $\beta > -\alpha$. This expression is intended to be taken in the sense of distributions (i.e., multiply both sides by a smooth function with compact support in $(0,\infty)$ and integrate). If $\theta$ is a smooth function of compact support in $(0,\infty)$, then
$$ \begin{align} \sum_{n=1}^\infty q(n)\theta(n/x)/x&=\int_0^\infty\theta(y)\dot Q(xy)\,dy\\ &=\frac{1}{2\pi i}\int_{\beta-i\infty}^{\beta+i\infty}x^sM(s)\int\theta(y)y^s\,dy\,ds=O(x^\beta).\qquad(6) \end{align} $$
The final bound holds because, by repeated integration by parts, the integral of $\theta(y)y^s$ can be shown to go to zero faster than any power of $1/s$, so the integrand is indeed integrable and the factor $x^\beta$ can be pulled outside.

This is enough to show that $q(n)$ is itself $O(n^\beta)$. Trying to finish this answer off without too much further detail, the argument is as follows. If $q(n)n^{-\beta}$ were unbounded, then it would keep exceeding its previous maximum and, by the recurrence relation (2), it would take time at least $\Omega(n)$ to get back close to zero. So, if $\theta$ has support in $[1,1+\epsilon]$ for small enough $\epsilon$, the integral $\int\theta(y)\dot Q(ny)\,dy$ will be of order $q(n)$ at such times and, as this happens infinitely often, it would contradict (6). Phew! I knew that this could be done, but it took some work! Possibly not as simple or direct as you were asking for, but Dirichlet series are quite standard (more commonly in analytic number theory, in my experience).
However, it is maybe not really more difficult than the probabilistic method, and you do get a whole lot more. This approach should also work for other types of recurrence relations and differential equations.
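For the record, here is the residue computation spelled out: at $s=-1$ the binomial coefficients $\binom{-s}{k+1}=\binom{1}{k+1}$ vanish for all $k\ge1$, so the sum in (5) drops out and
$$ \operatorname{Res}(L,-1)=\frac{s}{\frac{d}{ds}\left(s-1+2^{-s}\right)}\Bigg|_{s=-1}=\frac{-1}{1-2\log2}=\frac{1}{2\log2-1}\approx2.5887. $$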

Finally, I added a much more detailed writeup on my blog, fleshing out some of the details which I skimmed over here. See Asymptotic Expansions of a Recurrence Relation.


Here is the standard generating function gruntwork. Let $Q(x) = \sum_{n \ge 1} \frac{p(n)}{n} x^n$. Then

$$Q'(x) = \sum_{n \ge 1} p(n) x^{n-1} = 1 + \sum_{n \ge 2} x^{n-1} \sum_{j = \lceil \frac{n}{2} \rceil}^{n-1} \frac{p(j)}{j},$$ where the constant term $1$ accounts for $p(1) = 1$, since the recursion only holds for $n \ge 2$.

Exchanging the order of summation gives

$$Q'(x) = 1 + \sum_{j=1}^{\infty} \frac{p(j)}{j} \sum_{n=j+1}^{2j} x^{n-1} = 1 + \sum_{j=1}^{\infty} \frac{p(j)}{j} \frac{x^{2j} - x^j}{x - 1}.$$

So it follows that

$$Q'(x) = 1 + \frac{Q(x^2) - Q(x)}{x - 1}$$

with initial condition $Q(0) = 0$ (the condition $Q'(0) = 1$, i.e. $p(1) = 1$, then comes out automatically).
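As a sanity check on the algebra (including the constant term), one can compare power-series coefficients on both sides of this equation; a minimal sketch, with an arbitrary truncation order:

```python
N = 60                      # compare coefficients of x^m for m < N (arbitrary)

# p(n) from the defining sum, with p(1) = 1
p = [0.0] * (2 * N + 2)
p[1] = 1.0
for n in range(2, 2 * N + 2):
    p[n] = sum(p[j] / j for j in range((n + 1) // 2, n))

q = [p[n] / n if n else 0.0 for n in range(2 * N + 2)]  # coefficients of Q(x)

# multiply through by (x - 1): check (x-1) Q'(x) = (x-1) + Q(x^2) - Q(x)
for m in range(N):
    lhs = p[m] - p[m + 1]                             # x^m in (x - 1) Q'(x)
    rhs = -q[m] + (q[m // 2] if m % 2 == 0 else 0.0)  # x^m in Q(x^2) - Q(x)
    rhs += {0: -1.0, 1: 1.0}.get(m, 0.0)              # x^m in (x - 1) * 1
    assert abs(lhs - rhs) < 1e-12, (m, lhs, rhs)
print("coefficients of x^0 ... x^%d all match" % (N - 1))
```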

Now, as I recall, there are theorems in Flajolet and Sedgewick's Analytic Combinatorics that allow us to deduce asymptotic behavior of the coefficients of $Q$ from these kinds of relations; I'm going to hunt one down now, and in the meantime others can see what they can do with this.


Here is an observation: it looks like $(2n) p(2n) - c$ converges to $0$ much faster than $(2n+1) p(2n+1)-c$ does. Here is a plot of the points $(\log n, n p(n) -c)$, with blue points for $n$ even and red points for $n$ odd.

[plot of $(\log n,\ np(n)-c)$: blue for $n$ even, red for $n$ odd]

It might be easiest to first prove the result for $n$ even, then use the fact that $(2n) p(2n) - (2n-1) p(2n-1) = p(2n) + p(2n-1)$ to get that the odd and even points must have the same limit.
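Here is a quick numerical illustration of the even/odd contrast, reusing recurrence (2) from George's answer (a rough sketch; the cutoff is arbitrary):

```python
import math

N = 10**5
c = 1 / (2 * math.log(2) - 1)

# recurrence (2): p(n+1) = (1 + 1/n) p(n) - [n even] (2/n) p(n/2) - [n = 1]
p = [0.0] * (N + 1)
p[1] = 1.0
for n in range(1, N):
    p[n + 1] = (1 + 1 / n) * p[n]
    if n % 2 == 0:
        p[n + 1] -= (2 / n) * p[n // 2]
    if n == 1:
        p[n + 1] -= 1.0

for n in range(N - 3, N + 1):
    print(n, "even" if n % 2 == 0 else "odd", n * p[n] - c)
# odd n: error of order 1/n (consistent with -2c/n); even n: order 1/n^2
```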