Differences among Cauchy, Lagrange, and Schlömilch remainder in Taylor's formula: why is generalization useful?

I would like to know what really are the main differences (in terms of "usefulness") among Cauchy, Lagrange, and Schlömilch's forms of the remainder in Taylor's formula.

Could you provide examples of situations where one form "works better" than another?

Also, what are the actual benefits of the new generalizations proposed for example in the following articles?

  1. BLUMENTHAL, L. M., Concerning the Remainder Term in Taylor's Formula. Amer. Math. Monthly 33, pp. 424-426, 1926.

  2. BEESACK, P. R., A General Form of the Remainder in Taylor's Theorem. Amer. Math. Monthly 73, pp. 64-67, 1966.


As both Cauchy's and Lagrange's versions of Taylor's Theorem are encapsulated in this theorem, let me restate the Schlömilch–Roche version of Taylor's Theorem:

Let $f$ be an $n$-times differentiable function on $[a,b]$ which is also $n+1$ times differentiable on $(a,b)$. Then for any two distinct points $x_0$ and $x$ in $[a,b]$ and every $p>0$ there is a $\xi$ strictly between $x_0$ and $x$ such that $$f(x)=\sum_{k=0}^n\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k +f^{(n+1)}(\xi)(x-\xi)^{n+1-p}\frac{(x-x_0)^p}{n!\,p}\,.$$
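For orientation, the two classical forms are just the endpoints of the parameter $p$ (a routine specialization, stated here for convenience): taking $p=n+1$ collapses the $(x-\xi)$ factor and recovers Lagrange's remainder, while $p=1$ recovers Cauchy's,

$$R_n=\frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1}\qquad\text{and}\qquad R_n=\frac{f^{(n+1)}(\xi)}{n!}(x-\xi)^{n}(x-x_0)\,,$$

respectively.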

Every version of Taylor's Theorem says that the Taylor polynomial of some degree centered at some point can be used to approximate a given function on some (more than likely tiny) neighborhood. Thus the most important statement, in every version of Taylor's Theorem, is how we go about expressing the remainder. Taylor himself didn't actually incorporate an error term. It wasn't until Lagrange, and later Cauchy, came along that Taylor's Theorem was made rigorous. Thus Roche's version above can naively be appreciated as a theorem which interpolates the first rigorous expressions of the remainder.
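The existence claim for $\xi$ is easy to check numerically for a concrete function. Here is a minimal sketch for $f=\exp$ with $x_0=0$; the choices $x=1$, $n=3$, $p=2$ are illustrative assumptions, not part of the theorem:

```python
import math

def find_xi(x, n, p, tol=1e-12):
    """Bisect for xi in (0, x) satisfying Roche's remainder identity for exp."""
    taylor = sum(x**k / math.factorial(k) for k in range(n + 1))
    true_rem = math.exp(x) - taylor
    def g(xi):  # remainder predicted by the formula, minus the true remainder
        return (math.exp(xi) * (x - xi)**(n + 1 - p) * x**p
                / (math.factorial(n) * p) - true_rem)
    # g(0) > 0 and g(x) < 0 here, so bisection locates a root strictly inside (0, x)
    lo, hi = 0.0, x
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

xi = find_xi(1.0, 3, 2.0)
print(xi)
```

The returned $\xi$ lands strictly between $0$ and $1$ and makes the remainder identity hold to machine precision.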

Interpolation, in general, is a recurring useful idea throughout mathematics. In this way, Roche's Theorem can be viewed as generalizing Lagrange's and Cauchy's versions in the same way that Young's Inequality generalizes $|ab|\leq (a^2+b^2)/2$, the same way that Hölder's Inequality generalizes Cauchy's Inequality, and the same way that the Generalized Power-Mean Inequality generalizes the AM–GM Inequality. None of these generalizations would be viewed as useless with hindsight. However, without a concrete problem given first before their presentation, their motivation might seem lacking. But enough experience with concrete problems should instill a belief that interpolation should be sought as its own end. Applications will more than likely follow.

A general pattern that happens with interpolating results is that they allow us to deal with singularities with more precision. In particular, they often can be used to shift enough weight away from a singularity to prove that a certain expression is finite.

Let me pull out the error term in Roche's version to zoom in. $$\underbrace{f^{(n+1)}(\xi)}_{\text{Possibly no control}}\cdot\underbrace{(x-\xi)^{n+1-p}}_{\text{Some control}}\cdot\underbrace{\frac{(x-x_0)^p}{n!\,p}}_{\text{Controlled}}$$ The beauty of Roche's version is that it reflects a pattern common to many interpolating generalizations: wherever you gain more control on one term, you give up some control on the others. The problem with Mean-Value-Theorem-like results is the lack of control on where the $\xi$ may be. It could be arbitrarily close to $x_0$ or $x$. If either of these points happens to be close to a singularity of $f$, this could create difficulty in showing that the Taylor series converges.

Roche's Theorem lets you play a betting game. If you suspect $x$ is close to a singularity and that $\xi$ is going to be close to $x$ (so that $f^{(n+1)}(\xi)$ is potentially large but $(x-\xi)$ is small), Roche's Theorem lets you shift weight away from the $(x-x_0)$ term so that you can try to get the first two terms in the product to battle each other into something finite. (I can't see why you would let $p$ go to zero, however, as the $1/p$ factor blows up.)

However, if you have fairly reasonable expectations that $x$ is not close to a singularity (so that $f^{(n+1)}(\xi)$ is tame), Roche's Theorem lets you shift more weight over to the $(x-x_0)$ term, which is still controlled by $1/n!$, so that the $(x-\xi)$ term doesn't become unnecessarily large. (I really can't see a reason to let $p$ go above $n+1$, however, as the $(x-\xi)^{n+1-p}$ factor would then blow up as $\xi\to x$.)

Let me demonstrate with an example. There are two elementary Taylor series which are notorious for being difficult to handle with Lagrange's remainder, namely $$(1+x)^r=\sum_{k=0}^\infty\binom{r}{k}x^k\qquad\text{and}\qquad \ln(1+x)=\sum_{k=1}^\infty(-1)^{k+1}\frac{x^k}{k}$$ where $r\in\mathbb{Q}$ (these two examples are not altogether unrelated). The first is not so bad when $r>0$ (in fact Lagrange's should work, though I haven't checked). But something is different when $r<0$. Let me define $$f(x)=\frac{1}{(1-x)^r}$$ where $r>0$ (the extra negative sign creates some simplicity). Using Roche's Theorem and some algebraic trickery, we can calculate for $|x|<1$ and any $p>0$ that $$f(x)=\sum_{k=0}^n\frac{r^{\bar{k}}}{k!}x^k +\frac{r^{\overline{n+1}}}{n!\,p}\,\frac{\left(1-\frac{\xi}{x}\right)^{n+1-p}}{(1-\xi)^{r+n+1}}\,x^{n+1}$$

where $\xi$ is between $x$ and $0$ and $r^{\overline{k}}$ denotes the rising factorial. The term $x^{n+1}r^{\overline{n+1}}/n!$ can be shown to go to zero with the help of the Gamma function (it is a bit of a challenge with purely elementary approaches). Let's focus on the rest of the remainder term. Set $\theta=\xi/x$. We then have
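As a quick sanity check of the expansion itself, the partial sums of $\sum_k \frac{r^{\bar k}}{k!}x^k$ do converge to $(1-x)^{-r}$ on $|x|<1$, including for negative $x$, where Lagrange's form is delicate. The particular values of $r$ and $x$ below are arbitrary choices:

```python
import math

def rising(r, k):
    """Rising factorial r^(k-bar) = r (r+1) ... (r+k-1); empty product is 1."""
    return math.prod(r + j for j in range(k))

def partial_sum(r, x, n):
    """Degree-n Taylor polynomial of (1-x)^(-r) centered at 0."""
    return sum(rising(r, k) / math.factorial(k) * x**k for k in range(n + 1))

r, x = 0.5, -0.7
print(partial_sum(r, x, 60), (1 - x)**(-r))
```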

$$\frac{(1-\theta)^{n+1-p}}{(1-\theta x)^{r+n+1}}=\left(\frac{1-\theta}{1-\theta x}\right)^{n+1-p}\frac{1}{(1-\theta x)^{r+p}}$$

We can actually show that for $|x|<1$ and for $0<\theta<1$ that we have $$0\leq\frac{1-\theta}{1-\theta x}\leq 1\qquad \text{and}\qquad\frac{1}{1-\theta x}\leq\frac{1}{\min(1, 1-x)}\,.$$

Thus, as long as $0<p<n+1$, we can show that the error term in the expression above tends to zero and the Taylor expansion holds. Lagrange's form is precisely the excluded endpoint $p=n+1$, so in a very embarrassing sense, Lagrange's error term fails by just a hair.
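You can watch this failure happen numerically. The sketch below takes the worst case over $\theta\in[0,1)$ of the remainder's magnitude for $r=1$ and $x=0.8$ (both arbitrary illustrative choices; any $x>1/2$ shows the effect): the Lagrange-endpoint bound ($p=n+1$) explodes with $n$, while the Cauchy bound ($p=1$) shrinks to zero.

```python
import math

def worst_bound(x, r, n, p, grid=10001):
    """Sup over theta in [0,1) (on a grid) of the Schlomilch remainder magnitude
    (r^(rising n+1) / (n! p)) * (1-theta)^(n+1-p) / (1-theta*x)^(r+n+1) * x^(n+1)."""
    rising = math.prod(r + j for j in range(n + 1))  # r^(rising n+1)
    sup = max((1 - t)**(n + 1 - p) / (1 - t * x)**(r + n + 1)
              for t in (i / grid for i in range(grid)))
    return rising / (math.factorial(n) * p) * sup * x**(n + 1)

x, r = 0.8, 1
for n in (5, 10, 20):
    # Lagrange endpoint p = n+1 grows; Cauchy p = 1 decays
    print(n, worst_bound(x, r, n, n + 1), worst_bound(x, r, n, 1))
```

The point is exactly the betting game above: at $p=n+1$ nothing damps the $1/(1-\theta x)^{r+n+1}$ blow-up near $\theta=1$, while any $p<n+1$ leaves a $(1-\theta)$ factor to fight it.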

I'm sure there are examples where the $p$ in Roche's Theorem is uniquely suited to proving that the Taylor series converges, probably by looking at different types of singularities. For example, the collection of functions $$(1-x)^\alpha \ln^\beta(1-x)$$ probably holds the type of ugly expansions that Roche and ingenuity can attack. But I'm done.