Monotonic version of Weierstrass approximation theorem

This is a comment too long to fit in the usual format.

If you had asked for a bound in $\frac{1}{\sqrt{n}}$ rather than $\frac{1}{n}$, then there would be a nice answer using Bernstein polynomials (I find it especially nice as the proof consists of a chain of inequalities, each using a different part of mathematics). In this case everything behaves smoothly : one can compute an optimal bound for the error, and it is reached at rational values.

Recall that the $n-th$ Berstein approximation for $f$ is

$$ B_n(f)(x)=\sum_{k=0}^n f(\frac{k}{n})B_{n,k}(x), \ B_{n,k}(x)=\binom{n}{k}x^k(1-x)^{n-k} \tag{1} $$

As explained in the linked MO answer, an elegant probabilistic argument shows that $B_n(f)$ is nondecreasing if $f$ is. Let us put $M=\big|\big|\ f'\big|\big|_{\infty}$, so that $f$ is $M$-Lipschitz. One can then show the surprisingly simple chain of inequalities, each one “optimal” in its way : for any $x\in[\frac{i}{n},\frac{i+1}{n}]$, we have

$$ \big|B_n(f)(x)-f(x)\big| \leq 2M\binom{n-1}{i}x^{i+1} (1-x)^{n-i} \tag{2} $$

which is optimal because the inequality tends to equality when $f'$ tends to a staged function that equals $-M$ on $[0,x]$ and $+M$ on $[x,1]$. Next

$$ 2M\binom{n-1}{i}x^{i+1} (1-x)^{n-i} \leq 2M\binom{n-1}{i} \frac{(i+1)^{i+1} (n-i)^{n-i}}{(n+1)^{n+1}} \tag{3} $$

which is optimal because it becomes an equality for $x=\frac{i+1}{n+1}$ ; and

$$ 2M\binom{n-1}{i} \frac{(i+1)^{i+1} (n-i)^{n-i}}{(n+1)^{n+1}} \leq 2M\binom{n-1}{\lfloor\frac{n+1}{2}\rfloor} \frac{(\lfloor\frac{n+3}{2}\rfloor)^{\lfloor\frac{n+3}{2}\rfloor} (\lceil\frac{n-1}{2}\rceil)^{\lceil\frac{n-1}{2}\rceil}}{(n+1)^{n+1}} \tag{4} $$

which is optimal because it becomes an equality for $i=\lfloor\frac{n+1}{2}\rfloor$. Finally, Stirling's formula shows that the RHS above is asymptotically equivalent to $\frac{1}{\sqrt{\pi n}}$.


This is too long for a comment, too. Experimentally, another type of approach leads to better approximations. Assume that we want a polynomial approximation of degree $n$ of a $\mathcal{C}^1([0,1])$-function $f(x)$ with an $O(n^{-1})$ error term. Consider the function $g(x)$ whose graph is the convex envelope of the points $(0,f(0)),(1/n,f(1/n),\ldots,(1,f(1))$. Clearly: $$\| f(x)-g(x)\|_{\infty} \leq \frac{M}{n},$$ where $M$ is the Lipshitz constant of $f(x)$, i.e. $\max_{x\in[0,1]}|f'(x)|$. If $f(x)$ is a non-decreasing function, the stronger $$\| f(x)-g(x)\|_{\infty} \leq \frac{M}{2n}$$ holds. Let now $h_p(x)$ be a "step-identity" continuous function of the form $$ h_p(x)=\left\{\begin{array}{rl}0 &\mbox{if }x\leq p,\\(x-p) &\mbox{if }x\geq p.\end{array}\right.$$ If $t>0$, $k_{p,t}(x)=h_{p}(x)-h_{p+t}(x)$ is a continuous "step-identity-step" function whose derivative is zero for $x<p$ or $x>p+t$ and one for $x\in(p,p+t)$. Moreover, $g(x)$ can be written as a linear combination of $k_{p,t}$-functions, i.e.: $$ g(x) = \sum_{j=0}^{n-1} \frac{f\left(\frac{j+1}{n}\right)-f\left(\frac{j}{n}\right)}{\frac{1}{n}}\,k_{\frac{j}{n},\frac{1}{n}}(x), $$ so the problem can be substantially reformulated as: find polynomial approximations of $k_{0,\frac{1}{n}}(x)$ that are non-decreasing and have an $\|\cdot\|_{\infty}$-error term bounded by constant times the reciprocal of the degree. Up to integration, this is the same as finding a positive and "good" polynomial approximation of the characteristic function of the $[0,1/n]$ interval. Bernstein polynomials almost do this, since they are positive, but with the wrong error term.

Another reformulation is to find a tight polynomial upper bound for the absolute value function over the $[-1,1]$-interval. A classical try is to consider $q_n(y)$ as the truncated Taylor series of $\sqrt{1-y}$ w.r.t. $y=0$ and take $p_n(x)=q_n(1-x^2)$. This gives $q_n(x)\geq|x|$ as wanted, but the error term is $q_n(0)=O\left(\frac{1}{\sqrt{n}}\right)$, too big. Another try is to consider the Fourier-Chebyshev approximation of $|x|$, i.e. $$ p_{2n}(x) = \frac{2}{\pi}-\frac{4}{\pi}\sum_{k=1}^{n}\frac{(-1)^k T_{2k}(x)}{4k^2-1}. $$ $p_{2n}(x)=|x|$ holds in $2n+2$ points, so $p_{2n}(x)$ in not an upper bound neither a lower bound for $|x|$, but $|p_{2n}(x)-|x||$ is a fast decreasing function: $$|p_{2n}(x)-|x||\leq \min\left(\frac{2}{(2n+1)\pi},\frac{1}{2\pi n^2 |x|}\right) $$ and the error term is the right one: $\|p_{2n}(x)-|x|\|_{\infty}=p_{2n}(0)=\frac{2}{(2n+1)\pi}=O\left(\frac{1}{n}\right)$. If we approximate $k_{0,\frac{1}{n}}$ with $p_{2n}(x)-p_{2n}(x-1/n)$, then $g(x)$ with these approximated $k$s, we end with a polynomial approximation $r_{2n}(x)$ of $f(x)$ that satisfies: $$ \|f(x)-r_{2n}(x)\|_{\infty}\leq \frac{M}{n}+Mn\int_{-1}^{1}\min\left(\frac{2}{(2n+1)\pi},\frac{1}{2\pi n^2 |x|}\right)dx=O\left(\frac{\log n}{n}\right),$$ that is far better than the Bernstein approximation, but it is still not what we want.

Can we exploit the oscillations of the Chebyshev approximations in order to remove the logarithmic factor in the RHS? *UPDATE:* I believe not, since we have substantially built a trigonometric polynomial approximation through the convolution with respect to the Dirichlet kernel, whose $L^1$-norm is not bounded, but $\Theta(\log n)$ (this fact gives birth to the Gibbs phenomenon, for istance); then a reformulation is to find a "good" trigonometric kernel, for which "good" means non-negative (in order to preserve monotonicity) and almost orthogonal to continuous, piecewise-linear functions. In the literature I found that the Jackson kernel, that is the square of the Fejer kernel, satisfies all the prescribed properties, but I am still not satisfied, since the Fourier coefficients of the Jackson kernel are pretty complicated.

In order to get a monotonic polynomial approximation of a non-decreasing $\mathcal{C}^1([0,1])$-function $f(x)$, we can choose a positive $K$ such that $\sqrt{f'(x)+K}$ is regular enough to ensure the existence of a $O(n^{-1})$-polynomial approximation, then replicate fedja's argument, but the first step still seems out-of-reach, actually.