Why are asymptotically one half of the integer compositions gap-free?

Question summary

The number of gap-free compositions of $n$ can already for quite small $n$ be very well approximated by the total number of compositions of $n$ divided by $2$. This question seeks to understand why.

The details

A composition of an integer $n$ is a way of writing $n$ as the sum of a sequence of positive integers where order is important. This is different from partitions, where order is unimportant. It can easily be shown that the number of compositions of $n$ is $2^{n-1}$. See the Wikipedia article for a proof.

A lot has been published about how the number of compositions changes when you put restrictions on them. One really interesting such restriction is "gap-freeness", i.e. containing every integer which is between the smallest and the largest integer in the composition. For example, $2+3+2+4+5$ is a gap-free composition of $16$, but $2+4+2+1+5+2$ is not, since there is a gap between $2$ and $4$. Furthermore, a composition is called "complete" if it is gap-free and contains the number $1$, i.e. all numbers from $1$ up to some integer $m$ are used in the summation.

An example, the compositions of $n = 5$. g stands for gap-free and c for complete.

5           g
4+1
3+2         g
3+1+1
2+3         g
2+2+1       g, c
2+1+2       g, c
2+1+1+1     g, c
1+4
1+3+1
1+2+2       g, c
1+2+1+1     g, c
1+1+3
1+1+2+1     g, c
1+1+1+2     g, c
1+1+1+1+1   g, c

So, the number of compositions of $n=5$ is $2^{5-1} = 16$, the number of gap-free compositions is $11$ and the number of complete compositions is $8$.

Enumerating the number of compositions (#c), the number of gap-free compositions (#gc) and the number of complete compositions (#cc) for n from 1 to 10 gives the following table.

n       #c      #gc     #cc
1       1       1       1
2       2       2       1
3       4       4       3
4       8       6       4
5       16      11      8
6       32      21      18
7       64      39      33
8       128     71      65
9       256     141     127
10      512     276     264

Continuations of these series can be found at oeis.org/A107428 and oeis.org/A107429 respectively.

From this short sample, it can be conjectured that the number of complete compositions is asymptotically half of the total number of compositions. (This is also true for the gap-free compositions, since almost all of the gap-free compositions are complete, but it's not as obvious from this short sample.)

A plot of the deviation between the percentage of gap-free compositions from $\frac12$ for $50 \leqslant n \leqslant 4000$ is quite remarkable.

enter image description here $50 < n < 500$

enter image description here $500 < n < 4000$

Not only does it seem to converge quite rapidly, it also shows a strange oscillating behavior.

A couple of proofs of this fact have been published $[1,2]$. The first paper was written by Hitczenko and Knopfmacher $[1]$, where they used randomly generated compositions to obtain a probability that a composition was complete (a probability that approaches $1/2$ as $n \to \infty$).

The proof takes a probabilistic view of the compositions and says nothing (at least as far as I can understand) about why the ratio is exactly one half. However, the fact that asymptotically exactly half of the number of compositions are complete has lead me to think that there should be a way of thinking about this that is more intuitive or, perhaps, combinatorial in nature.

For example, the number of complete compositions of $n$ is asymptotically the same as the total number of compositions of $n-1$. A bijection can of course not be found, since it is only an asymptotic relation. But for large $n$, is there some way to make a qualitative (and asymptotic) relation between these two sets of compositions? Are there other "asymptotic bijections" that can be used to shed some light on this? Maybe something can be said about how the completeness or gap-freeness restriction relates to other restrictions? I am not looking for rigorous proof here, it's just that it seems as there should be some way to think about this more intuitively.

References
$[1]$ P. Hitczenko and A. Knopfmacher, Gap-free compositions and gap-free samples of geometric random variables, Discrete Math. 294 (2005) 225-239
$[2]$ R. Warlimont, Complete compositions of a natural number, Quaestiones Mathematicae Volume 29, Issue 2, 2006


Edit July 1, 2013

No progress has been made here, but I thought I'd just write down some facts that might give some inspiration and food for thought.

  • The number of complete partitions of $n$, $\# p_c(n)$, is the same as the number of partitions into distinct parts. The bijection can easily be seen using a Ferrers diagram. This is a very well-known series with the generating function $\prod_{k \ge 1} (1+x^k)$. (A000009 at oeis.org)

  • The number of complete compositions of $n$, $\# c_c(n)$, can be calculated by counting the number of permutations of all complete partitions. In more detail:

    • Generate a list of all complete partitions of $n$.

    • For each partition $P$ in the list, check the maximum part $m$. For each number $k = \{1,2,\dots,m\}$, check the number of occurences $\alpha_k$ of $k$ in $P$. The number of complete compositions generated by $P$ is $\frac{m!} {\alpha_1! \dots \alpha_m!}$, i.e. the multinomial $\binom{m}{\alpha_1,\dots,\alpha_m}$

    Can something be said about the asymptotic behavior of how the above operation maps $\# p_c(n)$ to $\# c_c(n)$? If so, $\# c_c(n) \sim f(n) \cdot \# p_c(n)$, where $f(n)$ is this asymptotic function.

  • $\# p_c(n)$ is asymptotic to $$\frac{e^{\pi(n-\frac{1}{24})^{1/2}}}{4 \cdot 3^{1/4}(n-\frac{1}{24})^{3/4}}$$ (according to oeis.org).

  • We $\begin{cases} \text{know that} \\ \text{would like to prove that } \\ \text{would like to understand why} \end{cases} \Biggr\}$ the number of complete compositions of $n$ $\# c_c(n) \sim 2^{n-2}$.


Edit July 9, 2013

Just a small note and a humble request. The number of compositions of $n$ starting with 1 is $2^{n-2}$. (Link)

Are there other enumerations that have (asymptotically) the same dependence of $n$? Thought it might give some ideas/insight.


Edit Feb 13, 2015

  • The generation of the number of gap-free and complete compositions can be done using recursive formulas published as Maple code on OEIS. Using memoization techniques, the sequences for $n$ up to $4000$ were generated.

  • The oscillating behavior can be quite well modeled by the following function

$$f(n)=\frac{An^B}{\log Cn}\sin\left(\frac{2 \pi}D \log(Dn+E)+F\right)$$ $$\begin{align} A &= 3.53292387 \cdot 10^{-4}\\ B &= -8.47099853 \cdot 10^{-1}\\ C &= 9.11885892 \cdot 10^{-3}\\ D &= 6.94057582 \cdot 10^{-1}\\ E &= 1.35920759 \cdot 10^1\\ F &= -1.15542736 \cdot 10^1\\ \end{align}$$


This is quite hand-wavy but I think that's acceptable considering you want intuition. There is $1$ key gap in the result, but maybe someone can come along and fill that in nicely.

First the notation:

$N(n)$ is the number of complete compositions of $n$

$N(n,a)$ is the number of complete compositions of $n$ with $a$ as the largest term

$N(n,a,i)$ is the number of complete compositions of $n$ with $a$ as the largest term and only a single term $i$ that has to be at the end of the sequence.

For each of those, let $\tilde{N}$ denote the set of sequences that fit the description.

For example: $N(5)=8$, $N(5,1)=1$, $N(5,2)=7$, $N(5,2,2)=1$, $N(5,2,1)=1$, $\tilde{N}(5,1)=\{1+1+1+1+1\}$, $\tilde{N}(5,2,2)=\{1+1+1+2\}$, $\tilde{N}(5,2,1)=\{2+2+1\}$

Now we introduce $2$ key relations. The first one:

$$N(n)=\sum_{a=1}^{n}N(n,a)$$

This one should make sense. The total number of complete compositions of $n$, is the number of complete compositions with a maximum term of $1$ + the number of complete compositions with a maximum term of $2$ + etc.

The second one:

$$N(n,a)=\sum_{i=1}^{a}N(n-i,a)+N(n,a,i)$$ To see this, take an arbitrary composition $c\in \tilde{N}(n,a)$. Write $c=c_1+c_2+...+c_m$, and consider the composition $c'=c_1+c_2+...+c_{m-1}$. Now there are $2$ possibilities:

  • $c'$ is a complete composition of $n-c_m$ with largest term $a$
  • $c'$ is an incomplete composition of $n-c_m$

If $c'$ is a complete composition of $n-c_m$ with largest term $a$, then $c'\in\tilde{N}(n-c_m,a)$. Otherwise, since $c$ was a complete composition of $n$ with largest term $a$ and $c'$ is not complete, $c_m$ must have been the only term with this value in $c$, so that removing it results in an incomplete composition. Therefore $c'\in \tilde{N}(n,a,c_m)$

You can do this reasoning the other way around as well to establish the claim.

Now we first have to agree that

$$\lim_{n\to \infty}\frac{\sum_{i=1}^{a}N(n,a,i)}{N(n,a)}=0$$

To see this, think about what compositions $\cup_{i=1}^{a}\tilde{N}(n,a,i)$ contains. This set contains all compositions that contain some term exactly once, where that term has to be the last term in the composition. It is 'intuitively clear' that this is an awfully specific subset of $\tilde{N}(n,a)$.

So that we can rewrite*:

$$N(n,a)\approx\sum_{i=1}^{a}N(n-i,a)$$

And now

$$ \begin{aligned} N(n+1) & =\sum_{a=1}^{n+1}N(n+1,a) \\ & \approx \sum_{a=1}^{n+1}\sum_{i=1}^{a}N(n-(i-1),a)\\ & = \sum_{a=1}^{n+1}\sum_{i=0}^{a-1}N(n-i,a)\\ & = \sum_{a=1}^{n}\sum_{i=0}^{a-1}N(n-i,a) + \sum_{i=0}^{n}N(n-i,n) \\ & = \sum_{a=1}^{n}[\sum_{i=1}^{a}[N(n-i,a)] + N(n,a) - N(n-a,a)]\\ & = \sum_{a=1}^{n}\sum_{i=1}^{a}N(n-i,a) + \sum_{a=1}^{n}N(n,a) - \sum_{a=1}^{n}N(n-a,a)\\ & \approx N(n) + N(n) - \sum_{a=1}^{n}N(n-a,a)\\ & \approx 2N(n) \end{aligned} $$

So now we established that $N(n)$ asymptotically will behave as $c2^n$ for some constant $c$. Now from the fact that the total number of compositions of $n$ equals $2^{n-1}$ we know that $c<\frac{1}{2}$. And from observation we know of course that $c=\frac{1}{4}$. However, I have not really been able to come up with an argument for why it is $\frac{1}{4}$. Maybe it makes sense to say that $c$ must be of the form $\frac{1}{2^m}$ because we do want $N(n)$ to be a whole number. So then all we have to do is it find a reasoning for why $N(n)>2^{n-3}$. So maybe someone who reads this will be able to complete this reasoning from here.

So this might seem like a useless result because we are still left with the question of why this constant is exactly $\frac{1}{4}$. However, with this reasoning we at least see why the ratio converges in the first place.

*A remarkable result of this approximation is that it turns out that $N(n,2)\approx\frac{1+\sqrt 5}{2}F_n$ where $F_n$ is the $n$th fibonacci number. And this approximation is very good (always only either $1$ or $2$ too large).