Asymptotic difference between a function and its binomial average

The origin of this question is the identity $$\sum_{k=0}^n \binom{n}{k} H_k = 2^n \left(H_n - \sum_{k=1}^n \frac{1}{k 2^k}\right),$$ where $H_n$ is the $n$th harmonic number.

Dividing by $2^n$, we have $$2^{-n} \sum_{k=0}^n \binom{n}{k} H_k = H_n - \sum_{k=1}^n \frac{1}{k 2^k}.$$

The sum on the left can now be interpreted as a weighted average of the harmonic numbers through $H_n$ -- where the weights, of course, are the binomial coefficients. Thus the difference between $H_n$ and its "binomial average" (I'm guessing there's no term for this) is $$H_n - 2^{-n} \sum_{k=0}^n \binom{n}{k} H_k = \sum_{k=1}^n \frac{1}{k 2^k}.$$

The sum on the right is known to converge to $\ln 2$ as $n \to \infty$. (Substitute $-\frac{1}{2}$ into the Maclaurin series for $\ln (1+x)$.)
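
Not part of the original question, but the identity and the limit are easy to sanity-check numerically. The Python sketch below (the helper name `H` is just illustrative) verifies the identity exactly and shows the difference approaching $\ln 2$:

```python
# Verify sum_k C(n,k) H_k = 2^n (H_n - sum_{k>=1} 1/(k 2^k)) exactly,
# and watch H_n minus its binomial average approach ln 2.
from fractions import Fraction
from math import comb, log

def H(n):
    """n-th harmonic number, as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in (5, 20, 60):
    lhs = sum(comb(n, k) * H(k) for k in range(n + 1))
    rhs = 2**n * (H(n) - sum(Fraction(1, k * 2**k) for k in range(1, n + 1)))
    gap = H(n) - lhs / 2**n                    # H_n minus its binomial average
    print(n, lhs == rhs, float(gap), log(2))   # gap should approach ln 2
```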

This leads me to my question:

Can we classify nonnegative functions $f(n)$ for which $$\lim_{n \to \infty} \left(f(n) - 2^{-n} \sum_{k=0}^n \binom{n}{k} f(k) \right)$$ is finite and nonzero?

It would seem that if $f$ increases sufficiently rapidly, then the limit would be $\infty$. This is the case with both $f(n) = a^n$ (for $a > 1$) and $f(n) = n$. If $f$ decreases or is constant, then the limit is zero. If $f$ has roughly logarithmic growth, then it seems the difference would behave as it does for $H_n$ and approach a finite, nonzero limit. But can this be proved? And what about other sublinear, increasing functions?
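
For what it's worth, here is a rough numerical sketch (illustrative Python, not a proof; `binomial_gap` is just a name I'm using here) of the difference for the kinds of functions mentioned above:

```python
# Compute D(n) = f(n) - 2^{-n} * sum_k C(n,k) f(k) for a few test functions.
from math import comb, log

def binomial_gap(f, n):
    """f(n) minus the binomial average of f(0), ..., f(n)."""
    avg = sum(comb(n, k) * f(k) for k in range(n + 1)) / 2**n
    return f(n) - avg

tests = {
    "f(n) = 1.5^n":    lambda k: 1.5**k,       # grows too fast: gap -> infinity
    "f(n) = n":        lambda k: k,            # gap is exactly n/2 -> infinity
    "f(n) = 1/(n+1)":  lambda k: 1 / (k + 1),  # decreasing: gap -> 0
    "f(n) = log(n+1)": lambda k: log(k + 1),   # roughly logarithmic: gap levels off
}
for name, f in tests.items():
    print(name, [round(binomial_gap(f, n), 4) for n in (10, 100, 500)])
```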


Solution 1:

The function $f(n)$ must be $\Theta (\log n)$. Update: As Didier Piau points out in the comments after I asked this question on MathOverflow, we can say something stronger: $\frac{f(n)}{\log_2 n} \to L$ as $n \to \infty$. See the update at the end of the argument.

Suppose, for some positive $L$ (the negative case is similar), $$\lim_{n \to \infty} \left(f(n) - 2^{-n} \sum_{k=0}^n \binom{n}{k} f(k) \right) = L.$$ Thus $$f(n) = L + r(n) + 2^{-n} \sum_{k=0}^n \binom{n}{k} f(k)$$ for some function $r(n)$ with $r(n) \to 0$ as $n \to \infty$. This gives us a recurrence relation for $f(n)$. Since $r(n) \to 0$, we have $L + r(n) > 0$ for all sufficiently large $n$, so there exist positive constants $C$ and $D$ such that $C < L + r(n) < D$ for all sufficiently large $n$. Since the initial values of $f$ eventually become negligible in determining $f(n)$ via the recurrence relation, there exist functions $g(n)$ and $h(n)$ such that $g(n) \leq f(n) \leq h(n)$ for all sufficiently large $n$, where $g(n)$ and $h(n)$ satisfy $$g(n) = C + 2^{-n} \sum_{k=0}^n \binom{n}{k} g(k),$$ $$h(n) = D + 2^{-n} \sum_{k=0}^n \binom{n}{k} h(k).$$

So the problem reduces to showing that $g(n)$ and $h(n)$ are $\Theta (\log n)$. The argument is the same for both.

There are some different ways to do this; my favorite is to interpret the $g(n)$ recurrence probabilistically. Suppose we start at time $g(0)$, we flip a set of $n$ coins simultaneously each round, and it takes $C$ time units to do one round of flips. When a coin turns up heads for the first time, we cease flipping it. Let $T(n)$ be the time at which the last coin to achieve its first head does so. To find $E[T(n)]$, condition on the number of coins that achieve heads in the first round of flips. This yields $$E[T(n)] = C + 2^{-n} \sum_{k=0}^n \binom{n}{k} E[T(n-k)] = C + 2^{-n} \sum_{k=0}^n \binom{n}{k} E[T(k)].$$ Thus $g(n) = E[T(n)]$.
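
(As an aside, this can be sanity-checked numerically. The sketch below is only illustrative: it solves the recurrence for $g(n)$ -- note that the $k = n$ term contains $g(n)$ itself, so we move it to the left-hand side -- and compares the result with a Monte Carlo estimate of $E[T(n)]$.)

```python
# Compare g(n) from the recurrence with a simulation of the coin-flipping process.
import random
from math import comb

def g_recurrence(N, C=1.0, g0=0.0):
    """Solve g(n) = C + 2^{-n} * sum_k binom(n,k) g(k) for n = 1..N."""
    g = [g0]
    for n in range(1, N + 1):
        partial = sum(comb(n, k) * g[k] for k in range(n)) / 2**n
        g.append((C + partial) / (1 - 2**-n))   # the k = n term moved to the left
    return g

def simulate_T(n, C=1.0, g0=0.0, trials=20000):
    """Monte Carlo estimate of E[T(n)]: rounds until every coin has shown a head."""
    total = 0.0
    for _ in range(trials):
        alive, rounds = n, 0
        while alive > 0:
            rounds += 1
            # each remaining coin shows tails (and keeps flipping) with probability 1/2
            alive = sum(1 for _ in range(alive) if random.random() < 0.5)
        total += g0 + C * rounds
    return total / trials

for n in (5, 10, 20):
    print(n, round(g_recurrence(n)[n], 3), round(simulate_T(n), 3))
```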

Another way to view $T(n)$ is that it is $g(0) + C M_n$, where $M_n = \max\{X_1, X_2, \ldots, X_n\}$ and the $X_i$'s are independent and identically distributed geometric $(1/2)$ random variables. (Each geometric random variable models the first time a head appears.) Thus $g(n) = g(0) + C E[M_n]$. It is known that $\frac{H_n}{\log 2} \leq E[M_n] \leq \frac{H_n}{\log 2} + 1$ and, more precisely, that $E[M_n]$ is logarithmically summable to $\frac{H_n}{\log 2} + \frac{1}{2}$. (See, for example, Bennett Eisenberg's paper "On the expectation of the maximum of IID geometric random variables," Statistics and Probability Letters 78 (2008) 135-143. See also this MO question, "What is the Expected Maximum out of a Sample (size $n$) from a Geometric Distribution?")
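
(The quoted bounds are also easy to check numerically via the tail-sum formula $E[M_n] = \sum_{j \ge 0} P(M_n > j) = \sum_{j \ge 0} \left(1 - (1 - 2^{-j})^n\right)$, since $P(X_i > j) = 2^{-j}$. A short illustrative Python sketch:)

```python
# Compare E[M_n] with the bounds H_n / log 2 and H_n / log 2 + 1.
from math import log

def E_max_geometric(n, terms=200):
    """E[M_n] via the tail sum; 200 terms is plenty for n up to about 10^6."""
    return sum(1 - (1 - 2**-j)**n for j in range(terms))

def harmonic(n):
    return sum(1 / k for k in range(1, n + 1))

for n in (10, 1000, 100000):
    lower = harmonic(n) / log(2)
    print(n, lower, E_max_geometric(n), lower + 1)  # expect lower <= E[M_n] <= lower + 1
```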

Thus $g(n) = \frac{C}{\log 2} \log n + O(1)$; by the same argument, $h(n) = \frac{D}{\log 2} \log n + O(1)$, and so $f(n) = \Theta (\log n)$.

Update: Given $\epsilon > 0$, if we take $C > 0$ such that $L - \epsilon \leq C < L$ and $D = L + \epsilon$, this argument shows that $$L - \epsilon + O\left(\frac{1}{\log n}\right) \leq \frac{f(n)}{\log_2 n} \leq L + \epsilon + O\left(\frac{1}{\log n}\right).$$

Thus, as $n \to \infty$, $\frac{f(n)}{\log_2 n} \to L.$

For some other ideas that pertain to this result, including what are effectively some alternative derivations, see Pradipta's recent MO question, "Coin Flipping and a Recurrence Relation". In fact, reading Pradipta's question and some of its answers gave me the ideas needed to construct this argument! So, thanks go to Pradipta, Didier Piau, Emil Jeřábek, and Louigi Addario-Berry.

Solution 2:

Here's an idea that might lead somewhere. Let $F(x) = \sum_{n \ge 0} \frac{f(n)}{n!} x^n$, let $g(n) = \frac{1}{2^n} \sum_{k=0}^{n} {n \choose n-k} f(k)$, and let $G(x) = \sum_{n \ge 0} \frac{g(n)}{n!} x^n$. Then it's not hard to see that $G(x) = e^{\frac{x}{2}} F \left( \frac{x}{2} \right)$, so we are looking for functions $F$ such that

$$F(x) - e^{\frac{x}{2}} F \left( \frac{x}{2} \right) = A(x)$$ for some $A(x) = \sum_{n \ge 0} \frac{a(n)}{n!} x^n$ whose coefficients $a(n)$ tend to a nonzero limit. Let $H(x) = e^{-x} F(x)$. Now we want $$H(x) - H \left( \frac{x}{2} \right) = e^{-x} A(x).$$

The left-hand side of this condition is now easier to work with, but the right-hand side is harder.
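
(As a sanity check on the relation $G(x) = e^{\frac{x}{2}} F\left(\frac{x}{2}\right)$, one can compare truncated series coefficients numerically. Here is an illustrative Python sketch using $f(n) = H_n$ from the motivating example; the variable names are mine.)

```python
# Verify that g(n)/n! matches the coefficients of e^{x/2} F(x/2), exactly, up to order N.
from fractions import Fraction
from math import comb, factorial

N = 12                                   # truncation order
f = [Fraction(0)]                        # f(n) = H_n, kept exact
for n in range(1, N):
    f.append(f[-1] + Fraction(1, n))

g = [sum(comb(n, k) * f[k] for k in range(n + 1)) / 2**n for n in range(N)]

# Ordinary power-series coefficients: a series is sum_n a_n x^n.
F_coeffs      = [f[n] / factorial(n) for n in range(N)]                # F(x)
exp_half      = [Fraction(1, 2**n * factorial(n)) for n in range(N)]   # e^{x/2}
F_half_coeffs = [F_coeffs[n] / 2**n for n in range(N)]                 # F(x/2)

# Cauchy product of e^{x/2} and F(x/2), truncated to order N
G_claimed = [sum(exp_half[j] * F_half_coeffs[n - j] for j in range(n + 1))
             for n in range(N)]
G_direct  = [g[n] / factorial(n) for n in range(N)]                    # G(x) directly
print(G_claimed == G_direct)             # expect True
```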