Evaluate $\int_{0}^1 \prod_{k=2}^n \lfloor kx \rfloor dx$

Let $n\ge2$ be an integer. Then $$\int_{0}^1 \prod_{k=2}^n \lfloor kx \rfloor dx=\text{ ?}, $$ where $\lfloor \cdot \rfloor$ is the floor function.


This is not an answer but a note on how to evaluate the integral for larger $n$.

Part I - the formula

For any integer $k$ such that $2 \le k \le n$ and $x \in [0,1)$, we have the following decomposition:

$$\lfloor k x \rfloor = \sum_{j=1}^{k-1} \theta\left(x - \frac{j}{k}\right) \quad\text{ where }\quad \theta(x) = \begin{cases}1,& x \ge 0\\0,& x < 0\end{cases} $$ Notice for any $x, y_1, y_2, \ldots, y_m \in \mathbb{R}$, we have $$\prod_{k=1}^m \theta(x - y_k) = \theta\left(x - \max_{1 \le k \le m}\{y_k\} \right)$$
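For example, for $k = 3$ the decomposition reads $\lfloor 3x \rfloor = \theta\left(x - \frac13\right) + \theta\left(x - \frac23\right)$: both terms vanish on $\left[0,\frac13\right)$, only the first equals $1$ on $\left[\frac13,\frac23\right)$, and both equal $1$ on $\left[\frac23,1\right)$, reproducing the values $0, 1, 2$ of $\lfloor 3x \rfloor$.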

This leads to

$$\begin{align} \int_0^1 \prod_{k=2}^n \lfloor kx\rfloor dx &= \sum_{j_2 = 1}^{1}\sum_{j_3 = 1}^{2}\cdots\sum_{j_n=1}^{n-1} \int_0^1 \prod_{k=2}^n \theta\left(x - \frac{j_k}{k}\right) dx\\ &= \sum_{j_2 = 1}^{1}\sum_{j_3 = 1}^{2}\cdots\sum_{j_n=1}^{n-1} \int_0^1 \theta\left(x - \max_{2\le k \le n}\left\{\frac{j_k}{k}\right\}\right) dx\\ &= \sum_{j_2 = 1}^{1}\sum_{j_3 = 1}^{2}\cdots\sum_{j_n=1}^{n-1} \left(1 - \max_{2 \le k \le n}\left\{\frac{j_k}{k}\right\}\right)\\ &= \sum_{j_2 = 1}^{1}\sum_{j_3 = 1}^{2}\cdots\sum_{j_n=1}^{n-1} \min_{2\le k \le n}\left\{\frac{j_k}{k}\right\} \end{align} $$ The last step follows from the substitution $j_k \to k - j_k$, which permutes each index range and turns $1 - \max\limits_{2\le k\le n}\left\{\frac{j_k}{k}\right\}$ into $\min\limits_{2\le k\le n}\left\{\frac{j_k}{k}\right\}$. There are $(n-1)!$ terms in total in the above sum. We will evaluate the sum as if it were a problem of computing an expectation value.
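As a quick check, for $n = 3$ the last sum has the two terms $\min\left\{\frac12,\frac13\right\} + \min\left\{\frac12,\frac23\right\} = \frac13 + \frac12 = \frac56$, while the integrand $\lfloor 2x\rfloor\lfloor 3x\rfloor$ equals $1$ on $\left[\frac12,\frac23\right)$, $2$ on $\left[\frac23,1\right)$ and $0$ elsewhere on $[0,1)$, so the integral is indeed $\frac16 + \frac23 = \frac56$.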

$$ \int_0^1 \prod_{k=2}^n \lfloor kx\rfloor dx = (n-1)! \;\mathbf{E}(X) = (n-1)!\;\mathbf{E}\left( \min_{2\le k \le n}\left\{X_k\right\}\right)\tag{*1}$$ where $X = \min\limits_{2 \le k \le n} \left\{ X_k \right\}$ and $X_k, k = 2,\ldots, n$ are independent uniform discrete random variables: $$X_k \in \left\{\; \frac1k, \frac2k, \ldots, \frac{k-1}{k} \;\right\} \text{ each with probability } \frac{1}{k-1}. $$
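For small $n$, one can verify $(*1)$ by brute force. Below is a minimal Python sketch (the function names are mine): it computes the integral exactly, using the fact that the integrand is constant between consecutive break points $\frac{j}{k}$, and compares it with the sum of minima over all $(n-1)!$ index tuples.

from fractions import Fraction
from itertools import product
from math import floor, prod

def integral_exact(n):
    # exact integral: the integrand is constant between consecutive break points j/k
    pts = sorted({Fraction(j, k) for k in range(2, n + 1) for j in range(k + 1)})
    total = Fraction(0)
    for a, b in zip(pts, pts[1:]):
        mid = (a + b) / 2
        total += (b - a) * prod(floor(k * mid) for k in range(2, n + 1))
    return total

def integral_by_min_sum(n):
    # (*1): sum of min{j_k / k} over all (n-1)! tuples (j_2, ..., j_n)
    return sum(min(Fraction(j, k) for j, k in zip(js, range(2, n + 1)))
               for js in product(*[range(1, k) for k in range(2, n + 1)]))

# e.g. integral_exact(5) and integral_by_min_sum(5) should agree exactly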

There are $1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2}$ such fractions $\frac{j}{k}$ in total. Some of these fractions share the same numerical value. For simplicity of presentation, we will treat fractions with the same numerical value as distinct numbers separated by an infinitesimal amount.

Let us sort them into ascending order. Since $X \le X_2 = \frac12$, only the fractions with value $\le \frac12$ matter; we will collect them and label them as

$$\frac{1}{n} = \frac{p_1}{q_1} \le \frac{p_2}{q_2} \le \cdots \le \frac{p_\ell}{q_\ell} = \frac{1}{2}.$$
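For instance, when $n = 4$ these are $$\frac14 \le \frac13 \le \frac24 \le \frac12,$$ so $\ell = 4$; the entries $\frac24$ and $\frac12$ share the same numerical value but are kept distinct, per the convention above.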

We can then evaluate $\mathbf{E}(X)$ by systematically breaking it down into conditional expectation values. More precisely, let $R_k$ and $S_k$, $k = 1, \ldots, \ell$, be the events:

$$R_k = \mathbf{Event}\left( X = \frac{p_k}{q_k} \right)\quad\text{ and }\quad S_k = \mathbf{Event}\left( X \ge \frac{p_k}{q_k}\right)$$

We have $$ \begin{align} \mathbf{E}(X) = \mathbf{E}(X \mid S_1) &= \mathbf{P}(R_1 \mid S_1) \frac{p_1}{q_1} + \mathbf{P}( S_2 \mid S_1 )\mathbf{E}( X \mid S_2 )\\ &= \frac{1}{q_1 - p_1} \frac{p_1}{q_1} + \left(1 - \frac{1}{q_1-p_1}\right)\mathbf{E}(X \mid S_2)\\ \mathbf{E}(X \mid S_2) &= \mathbf{P}(R_2 \mid S_2) \frac{p_2}{q_2} + \mathbf{P}( S_3 \mid S_2 )\mathbf{E}( X \mid S_3 )\\ &= \frac{1}{q_2 - p_2} \frac{p_2}{q_2} + \left(1 - \frac{1}{q_2-p_2}\right)\mathbf{E}(X \mid S_3)\\ &\;\vdots\\ \mathbf{E}(X \mid S_k) &= \mathbf{P}(R_k \mid S_k) \frac{p_k}{q_k} + \mathbf{P}( S_{k+1} \mid S_k )\mathbf{E}( X \mid S_{k+1} )\\ &= \frac{1}{q_k - p_k} \frac{p_k}{q_k} + \left(1 - \frac{1}{q_k-p_k}\right)\mathbf{E}(X \mid S_{k+1})\\ &\;\vdots\\ \mathbf{E}( X \mid S_\ell ) &= \frac12 \end{align} $$ Here $\mathbf{P}(R_k \mid S_k) = \frac{1}{q_k - p_k}$ because, conditioned on $S_k$, the variable $X_{q_k}$ is uniformly distributed over the $q_k - p_k$ values $\frac{p_k}{q_k}, \frac{p_k+1}{q_k}, \ldots, \frac{q_k-1}{q_k}$, and $R_k$ occurs exactly when $X_{q_k} = \frac{p_k}{q_k}$.

Combining these, we convert $\mathbf{E}(X)$, and hence the original integral, into a sum: $$ \bbox[8pt,border:1px solid blue]{ \mathbf{E}(X) = \sum_{k=1}^\ell \frac{1}{q_k-p_k}\frac{p_k}{q_k} \prod_{j=1}^{k-1}\left(1 - \frac{1}{q_j-p_j}\right) } \tag{*2}$$
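As a sanity check, for $n = 3$ the relevant fractions are $\frac{p_1}{q_1} = \frac13$ and $\frac{p_2}{q_2} = \frac12$, so $(*2)$ gives $$\mathbf{E}(X) = \frac{1}{3-1}\cdot\frac13 + \left(1 - \frac{1}{3-1}\right)\cdot\frac{1}{2-1}\cdot\frac12 = \frac16 + \frac14 = \frac{5}{12},$$ and $(*1)$ then yields $2!\cdot\frac{5}{12} = \frac56$, in agreement with the direct computation for $n = 3$ above.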

Part II - the algorithm/implementation

There are many ways to evaluate the sum in $(*2)$. All take $O(\ell\log\ell) = O(n^2\log n)$ steps to prepare the sorted list of fractions $\frac{p_k}{q_k}$ and another $O(n^2)$ steps to compute the sum. To reduce round-off error, it is better to evaluate the sum in reverse order of $k$. More precisely, set $a_{\ell+1} = 0$ and recursively compute $a_\ell, a_{\ell-1}, \ldots, a_1$ via the recurrence relation:

$$a_k = \left(1-\frac{1}{q_k-p_k}\right) a_{k+1} + \frac{1}{q_k-p_k}\frac{p_k}{q_k}.\tag{*3} $$ At the end, $a_1$ will be the sum we need.

For small to medium $n \sim 5\times 10^3$, we can simply compute the $\ell$ fractions $\frac{p_k}{q_k}$, store them using $O(n^2)$ storage, and everything works. On an ordinary PC, this storage consumption becomes the first bottleneck for $n \sim O(10^4)$.
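For concreteness, here is a minimal Python sketch of this straightforward $O(n^2)$-storage approach (the function name is mine): build all fractions $\frac{p}{q} \le \frac12$, sort them, and run the recurrence $(*3)$ from the largest fraction downward.

from fractions import Fraction

def expectation_via_sorted_list(n):
    # all fractions p/q <= 1/2 with 2 <= q <= n and 1 <= p <= floor(q/2)
    fracs = [(p, q) for q in range(2, n + 1) for p in range(1, q // 2 + 1)]
    fracs.sort(key=lambda t: Fraction(t[0], t[1]), reverse=True)  # largest first
    a = 0.0
    for p, q in fracs:
        f = 1.0 / (q - p)
        a = (1.0 - f) * a + f * (p / q)   # recurrence (*3)
    return a                              # this is a_1 = E(X)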

If one looks back at the recurrence relation/procedure $(*3)$, one really does not need to know all the $\frac{p_k}{q_k}$ at all times. Instead, what one needs is the largest fraction $\frac{p}{q}$ among those left to process. At any moment, there are at most $n$ candidates for the largest remaining fraction: for each denominator $q$, only its largest unprocessed numerator can be the next one.

We can use a heap to hold the $\le n$ potential candidates for the largest remaining fraction. This reduces the storage requirement from $O(n^2)$ to $O(n)$ and allows us to extend the computation to $n \sim 5 \times 10^4$.

In case anyone is interested, the following is a small Python implementation computing the sum in $(*2)$.

import heapq

def calculate_sum(N):
    # max-heap of candidate fractions, one per denominator q = 2..N,
    # realized with heapq via negated keys: each entry is (-p/q, p, q)
    heap = [(-(q // 2) / q, q // 2, q) for q in range(2, N + 1)]
    heapq.heapify(heap)
    total = 0.0
    while heap:
        # pop the largest remaining fraction p/q
        _, p, q = heapq.heappop(heap)
        f = 1.0 / (q - p)
        # recurrence (*3), processed from the largest fraction downward
        total = (1.0 - f) * total + f * (p / q)
        # the next candidate with denominator q is (p-1)/q
        if p > 1:
            heapq.heappush(heap, (-(p - 1) / q, p - 1, q))
    return total
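As a quick sanity check, `calculate_sum(3)` should return $\frac{5}{12} \approx 0.4167$, matching the $n = 3$ value worked out above; the quantity $\eta_n$ studied in the next part is simply the reciprocal of this return value.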

Part III - the numerical results

Following are some numerical results.

Instead of $\mathbf{E}(X)$, I find it more convenient to study its reciprocal $\displaystyle\;\eta_n = \frac{1}{\mathbf{E}(X)}\;$. It seems the leading dependence of $\eta_n$ on $n$ is linear.

In the following table, $\alpha_n$ is an estimate of the slope of $\eta_n$ vs. $n$, and $\beta_n$ is the corresponding constant offset. They are estimated using the formulas: $$\alpha_n = \frac{\eta_n - \eta_{n-100}}{100}\quad\text{ and }\quad \beta_n = \eta_n - \alpha_n n$$
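A minimal sketch of how these quantities can be computed, assuming the `calculate_sum` implementation above (the helper names are mine):

def eta(n):
    # eta_n = 1 / E(X)
    return 1.0 / calculate_sum(n)

def slope_offset(n):
    # finite-difference slope over a window of 100, and the implied offset,
    # matching the definitions of alpha_n and beta_n above
    alpha = (eta(n) - eta(n - 100)) / 100.0
    beta = eta(n) - alpha * n
    return alpha, beta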

$$\begin{array}{|r:rrr:r|} \hline n & \eta_n & \alpha_n & \beta_n & \delta_n^{(eye)}\\ \hline 1000 & 440.57849249 & 0.439642912524 & 0.935579966035 & 0.306940385013\\ 2000 & 880.22159103 & 0.439643171269 & 0.935248489314 & 0.306956239421\\ 3000 & 1319.86479184 & 0.439643216555 & 0.935142177807 & 0.306876766506\\ 4000 & 1759.50801827 & 0.439643232266 & 0.935089199737 & 0.306860089425\\ 5000 & 2199.15125496 & 0.439643239853 & 0.935055695099 & 0.307048190306\\ 6000 & 2638.79449669 & 0.439643242826 & 0.935039733871 & 0.306851355739\\ 7000 & 3078.43774142 & 0.439643245039 & 0.935026141386 & 0.307066132100\\ 8000 & 3518.08098802 & 0.439643247922 & 0.935004641647 & 0.307743460185\\ 9000 & 3957.72423577 & 0.439643249057 & 0.934994257881 & 0.308001451641\\ 10000 & 4397.36748444 & 0.439643249954 & 0.934984897053 & 0.308851123061\\ \hline \end{array}$$

As one can see, both $\alpha_n$ and $\beta_n$ seem to converge quickly.

A vanilla linear regression using the data-set $\{\; \eta_{100k} : k = 1,\ldots,100 \;\}$ gives us

$$\eta_n \approx \alpha^{(reg)} n + \beta^{(reg)} = 0.43964319256906 n + 0.93539254734969$$
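A minimal sketch of such a regression, reusing the `eta` helper above (computing all $100$ values of $\eta_{100k}$ takes a noticeable amount of time):

import numpy as np

ns = np.arange(100, 10001, 100)                 # n = 100, 200, ..., 10000
etas = np.array([eta(int(n)) for n in ns])      # eta_n = 1 / E(X)
alpha_reg, beta_reg = np.polyfit(ns, etas, 1)   # least-squares line eta ~ alpha n + beta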

Compared with the above estimates, we find that a vanilla/barebones regression gives us estimates of the slope $\alpha$ and offset $\beta$ accurate to about 6 and 3 decimal places respectively.

To obtain a better estimate, we vary the parameters $(\alpha,\beta)$ a little bit away from $(\alpha^{(reg)}, \beta^{(reg)})$ by hand. Instead of minimizing any sort of squared error, we attempt to pick an $(\alpha, \beta)$ that makes the following expression

$$\delta_n = n ( \eta_n - \alpha n - \beta )$$ qualitatively as simple as possible. Based on my own subjective judgement, the following formula $$\eta_n \approx \alpha^{(eye)} n + \beta^{(eye)} = 0.439643252 n + 0.93493355$$ produces a reasonably flat value of $\delta_n$.
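A minimal sketch of this eyeballing step, again reusing the `eta` helper above:

def delta(n, alpha, beta):
    # residual of the linear model, magnified by n
    return n * (eta(n) - alpha * n - beta)

# tabulate the residuals for a hand-picked (alpha, beta) and judge how flat they are
# for n in range(1000, 10001, 1000):
#     print(n, delta(n, 0.439643252, 0.93493355))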

During this process of subjective estimation, I did not consult the slopes and offsets computed in the above table. It is interesting to see that the $\alpha^{(eye)}$ and $\beta^{(eye)}$ I obtain match the corresponding values in the last rows of the above table. This suggests the values of $\alpha^{(eye)}$ and $\beta^{(eye)}$ are accurate to about $8$ and $4$ decimal places respectively.

The last column in the above table shows the values of $\delta_n$ computed using $\alpha^{(eye)}$ and $\beta^{(eye)}$. As one can see, the third and later decimal places of $\delta_n$ are sort of random. This indicates that, for this part of the calculation, we are hitting artifacts caused by round-off errors in standard IEEE double-precision numbers.

Despite this drawback, one finds that all these $\delta_n$ are very close to the number $0.31$. Based on this, it is reasonable to conjecture that the leading behavior of the integral is given by $$ \bbox[8pt,border:1px solid blue]{ \int_0^1 \prod_{k=2}^n\lfloor k x \rfloor dx \sim \frac{(n-1)!}{\alpha n + \beta + O\left(\frac{0.3}{n}\right)} \quad\text{ with }\quad \begin{cases} \alpha &\approx 0.43964325(2)\\ \\ \beta &\approx 0.9349(3355). \end{cases} } $$

At the end are two figures which hopefully illustrate the situation.

  1. The first figure is a plot of $\eta_n$ vs. $n$.
    As one can see, $\color{blue}{\eta_n}$ lies right on top of the straight line $\color{red}{\eta = \alpha^{(eye)} n + \beta^{(eye)}}$.

  2. The second figure is a plot of $\delta_n$ vs. $n$.

    • At $n \sim 10^3$, $\delta_n$ starts to stabilize at a value $\approx 0.307$.
    • At $n \sim 5 \times 10^3$, $\delta_n$ starts to pick up artifacts due to round-off error.

    Despite these artifacts, $\delta_n$ remains very close to $0.307$ over the whole data range studied.

[Figure: $\eta_n$ vs. $n$]

[Figure: $\delta_n$ vs. $n$]

Part IV - ???