How to prove this estimate $m\{t \in[a, b]:|f(t)| \leq \varepsilon\} \leq 4\left(q ! \frac{\varepsilon}{2 \beta}\right)^{\frac{1}{q}}$
- (a) Assume that the function $f:[a, b] \rightarrow \mathbb{R}$ with $a<b$ is differentiable and satisfies $\left|f^{\prime}(t)\right| \geq \beta$ for all $t \in[a, b]$ for some $\beta>0$. Prove the following estimate $$ m\{t \in[a, b]:|f(t)| \leq \varepsilon\} \leq \frac{2 \varepsilon}{\beta} \quad \text { for } \varepsilon>0 $$ where $m\{B\}$ denotes the Lebesgue measure of the set $B$.
- (b) Assume that the function $f:[a, b] \rightarrow \mathbb{R}$ with $a<b$ is $q$-times continuously differentiable and satisfies $\left|f^{(q)}(t)\right| \geq \beta$ for all $t \in[a, b]$ for some positive integer $q$ and $\beta>0$. Prove the following estimate $$ m\{t \in[a, b]:|f(t)| \leq \varepsilon\} \leq 4\left(q ! \frac{\varepsilon}{2 \beta}\right)^{\frac{1}{q}} \quad \text { for } \varepsilon>0 $$ where $m\{B\}$ denotes the Lebesgue measure of the set $B$.
I have proved $(a)$. Note that we may assume $f^{\prime}(t)\geq \beta$ and $a= 0$, so that $f(t)\geq \beta t+f(0)$ $(*)$. It is easy to show that $(a)$ holds for any linear function with slope $\beta$, and by $(*)$ one can then deduce $(a)$ for every differentiable $f$ with $\left|f^{\prime}(t)\right| \geq \beta$.
For (b), the method used in (a) no longer works, since we cannot obtain an inequality like $(*)$. I also tried to prove it by induction, but failed.
Could anyone give me a hint or a correct solution? Thanks in advance!
Solution 1:
Introduction
I cannot believe that the second question was given as any kind of homework. It is quite an outstanding question, and I don't think I have got the best constants either.
The topic under discussion is called "sublevel set estimation". Why? Well, the set $\{x : |f(x)| \leq \epsilon\}$ is the set of $x$ coordinates of the part of the graph that lies between the lines $y=-\epsilon$ and $y=\epsilon$. If $f$ is non-negative, then we're looking at just the part of the graph lying below the line $y=\epsilon$: hence the word "sublevel".
We are estimating the measure of a sublevel set in this question. The point, however, is that this is a research topic, which is precisely why I'm going to be providing a detailed answer, rather than one with hints.
Throughout this answer, $E = \{x : |f(x)| \leq \epsilon\}$ for $\epsilon>0$ fixed but arbitrary, and $f^k$ denotes the $k$th derivative of a function $f$.
We will prove a result extremely similar to the one in the question, though the constant at the end comes out a little different. I can confirm that my working is correct, though.
We will assume, throughout, that $f^k \geq \beta$. Indeed, $f^k$ has the intermediate value property, therefore $f^k$ either stays above $\beta$ or stays below $-\beta$ on all of $[a,b]$, and a switch from $f$ to $-f$ shows it is enough to consider one case.
Intuition
So, let's review the first-derivative case first; it's quite easy. Let $x,y \in E$. We note that $|f(x)-f(y)| \leq 2\epsilon$ from the triangle inequality. However, we also have: $$ |f(x)-f(y)| = \left|\int_x^y f'(t)dt\right| \geq \beta |x-y| $$
whence it follows that $|x-y| \leq \frac{2\epsilon}{\beta}$ for all $x,y \in E$ and the desired inequality follows readily by taking $x=\inf E,y = \sup E$ and using monotonicity of the Lebesgue measure.
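As a quick numerical sanity check of this bound (my own illustration, not part of the proof; the function and the parameters are arbitrary choices), one can discretize $[a,b]$ and compare the measured size of $E$ against $2\epsilon/\beta$:

```python
import numpy as np

# Part (a) check: f(t) = t^3 + t - 0.5 on [0, 1] has f'(t) = 3t^2 + 1 >= 1,
# so beta = 1 works. Estimate m{t : |f(t)| <= eps} on a fine grid.
a, b, beta, eps = 0.0, 1.0, 1.0, 0.05
t = np.linspace(a, b, 1_000_001)
f = t**3 + t - 0.5

measure = np.mean(np.abs(f) <= eps) * (b - a)  # grid estimate of the sublevel measure
print(measure, "<=", 2 * eps / beta)           # roughly 0.065 <= 0.1
```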
Now, what is the trouble with this approach in the second-derivative case? There, if we decide to use Taylor, we get: $$ f(x)-f(y) = f'(y)(x-y) + \frac{f''(\xi)}{2}(x-y)^2 $$
It feels like you can do something with this, except that you cannot get rid of the arbitrariness of $f'(y)$ here, and that kills the argument. If you try to proceed, you will not be able to shake off the contribution of the $f'$ term, even with the most naive of approaches, basically because $f'$ can be anything.
Ingenious observation
It is now clear what needs to be done. Let's stick to the question for the second derivative. The point is this: passing from the second derivative to the function values cannot be done via Taylor's formula, because that involves the arbitrary-looking first derivative.
Therefore, the key insight for this problem is to find a way to link the second derivative to the original function, without involving the first derivative at all!
In more generality, the idea is to link the function values to the values of some higher derivative, so that once you use the bound on that derivative, you can easily translate it into something about the function values!
Can we do this though?
Inventive Rolle
What is a result that we know does something like this? It's the generalized Rolle's theorem, which states (roughly) that if $f$ is $k$ times differentiable on $[c,d]$ and there are $k+1$ distinct points in $[c,d]$ where $f$ vanishes, then there is a point where $f^k$ actually vanishes.
That relates some function values to a derivative value. Can we do better, though? The answer is yes: we can actually derive a very generalized version of Rolle's theorem that gives some derivative value as an exact expression in given function values.
The theorem is quite involved, and relates to Lagrange interpolation. I quote it directly from the best paper I could find in the literature on sublevel set estimation, by Carbery, Christ and Wright. I have made some changes to suit it to our discussion.
Let $k \geq 1$ and $a \leq a_1 < \ldots < a_{k+1} \leq b$ be distinct points in $[a,b]$. Suppose that $f$ is $k$ times continuously differentiable on $(a,b)$ and $k-1$ times continuously differentiable on $[a,b]$. Then, there is a $\xi \in (a,b)$ and explicit constants $c_m$ not depending upon $f$, such that: $$ f^{k}(\xi) = \sum_{m=1}^{k+1} c_m f(a_m) $$
We will prove this using a "polynomial fitting" argument, which is where interpolation comes in.
Indefatigable proof
We start by trying to reduce this result to the Generalized Rolle's theorem (GRT). For that, we need a function that vanishes at $k+1$ points. What better than $a_1,\ldots,a_{k+1}$ for those points?
What we will do is the following:

- Find (and prove the existence of) a polynomial $p$ of degree at most $k$ such that $p(a_i) = f(a_i)$ for all $i=1,\ldots,k+1$.

- Apply the GRT to $f-p$, and pull the conclusion back to $f$.
Finding the polynomial $p$ is a linear-algebraic exercise and is the meat of Lagrange interpolation. Indeed, suppose that $p(x) = b_kx^k + \ldots + b_0$. Then, the following matrix equation must hold: $$ V_{k+1}(a_1,\ldots,a_{k+1}) [b_0,\ldots,b_k]^T = [f(a_1),\ldots,f(a_{k+1})]^T $$
where $V_{k+1}$ is the Vandermonde matrix, defined by $[V_{k+1}]_{i,j} = a_i^{j-1}$. It is well known that, the $a_i$ being distinct, $V_{k+1}$ is invertible, and therefore: $$ [b_0,\ldots,b_k]^T = V_{k+1}(a_1,\ldots,a_{k+1})^{-1} [f(a_1),\ldots,f(a_{k+1})]^T $$
What we want from this is actually the value of $b_{k}$, because we plan to apply the GRT, and when we take the $k$th derivative of $p$ we're left with $k!b_k$. We can get that using the usual cofactor rule, spotting that the relevant minor is again a Vandermonde determinant, but without the $k$th powers and without $a_j$ appearing. Hence, $$ [V_{k+1}(a_1,\ldots,a_{k+1})^{-1}]_{k+1,j} = (-1)^{k+1+j} \frac{\det V_{k}(a_1,\ldots,a_{j-1},a_{j+1},\ldots,a_{k+1})}{\det V_{k+1} (a_1,\ldots,a_{k+1})} = (-1)^{k+1+j} \prod_{i \neq j} |a_i - a_j|^{-1} $$
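Since the cofactor computation is easy to get wrong, here is a small numerical cross-check (my own addition, not from the paper) that the last row of $V_{k+1}^{-1}$ really matches the product formula above, for randomly chosen nodes:

```python
import numpy as np

# Check that row k+1 of the inverse Vandermonde matches the cofactor/product formula.
k = 4
rng = np.random.default_rng(0)
a = np.sort(rng.random(k + 1))        # distinct nodes a_1 < ... < a_{k+1}
V = np.vander(a, increasing=True)     # [V]_{i,j} = a_i^{j-1}
last_row = np.linalg.inv(V)[-1]

# (-1)^{k+1+j} * prod_{i != j} |a_i - a_j|^{-1}, with j running over 1..k+1 (1-based)
formula = np.array([
    (-1) ** (k + 1 + j) / np.prod([abs(a[i] - a[j - 1])
                                   for i in range(k + 1) if i != j - 1])
    for j in range(1, k + 2)
])
print(np.allclose(last_row, formula))  # True
```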
Thus, we get : $$ b_{k} = \sum_{i=1}^{k+1} (-1)^{k+i+1} f(a_i)\prod_{j \neq i} |a_j - a_i|^{-1} $$
and the polynomial $p$ has been constructed. Now, $(f-p)(a_i)=0$ for all $i$, so using GRT we get a $\xi$ where $(f-p)^k(\xi) = 0$. But then, $(f-p)^{k} = f^k - k!b_k$. Thus, it follows that $$ f^k(\xi) = \sum_{i=1}^{k+1} \underbrace{k!(-1)^{k+i+1}\prod_{j \neq i} |a_j - a_i|^{-1}}_{c_i} f(a_i) $$
as desired. $\blacksquare$
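To see the formula in action, here is an illustration of my own: if $f$ is itself a polynomial of degree $k$, then $f^k$ is identically $k!\,b_k$, so $\sum_i c_i f(a_i)$ must reproduce $k!$ times the leading coefficient exactly, whatever the nodes:

```python
import numpy as np
from math import factorial

# For f a degree-k polynomial, f^(k) is constant, so the identity
# f^(k)(xi) = sum_i c_i f(a_i) can be checked exactly.
k = 3
f = np.polynomial.Polynomial([4.0, 0.5, -1.0, 2.0])  # 4 + 0.5x - x^2 + 2x^3

rng = np.random.default_rng(1)
a = np.sort(rng.random(k + 1))                       # any distinct nodes work
c = np.array([
    factorial(k) * (-1) ** (k + i) / np.prod([abs(a[j] - a[i])
                                              for j in range(k + 1) if j != i])
    for i in range(k + 1)                            # 0-based i, so the sign is (-1)^{k+i}
])
print(np.isclose(c @ f(a), factorial(k) * 2.0))      # True: both sides equal 12.0
```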
Intricate finish, but still more to go
I will now show how the proof ends, but we'll leave a nice part for later. So how does the proof end? Well, we'd have $f^k(\xi) = \sum_{i=1}^{k+1} c_if(a_i)$ for suitable points $a_i \in E$, the corresponding constants $c_i$, and some $\xi$. By the derivative bound, $\beta \leq \sum_{i=1}^{k+1} c_if(a_i)$.
We will now find a relation of the form $|E|^k \leq \frac{k!C^k}{|c_i|}$ for every $i$, where $C$ is a constant independent of all parameters. This will be done by a very good choice of the $a_i$. Once we do that, we get: $$ \beta \leq \sum_{i=1}^{k+1} |c_i|\,|f(a_i)| \leq \sum_{i=1}^{k+1} |f(a_i)|\frac{k!C^k}{|E|^k} \leq (k+1)! \frac{C^k}{|E|^k} \max_{i} |f(a_i)| \leq (k+1)! \frac{C^k}{|E|^k} \epsilon $$
and hence $$ |E| \leq C\sqrt[k]{\frac{(k+1)!\epsilon}{\beta}} $$
which is the bound desired.
Inequality
We will now prove the claim in the previous section. For this, we make two observations:

- Consider the quantity $\prod_{j \neq i} |a_j - a_i|$. This is maximized when the $a_i$ are as spread out from each other as possible, so, when we can, we should take them uniformly spaced.

- $E$ need not be an interval. However, since $f$ is continuous, $E$ is closed, and we can always choose points of $E$ that are at least as spread out as uniformly spaced points in an interval of length $|E|$: taking $a_i \in E$ with $m(E \cap [a,a_i]) = \frac{i}{k}|E|$ for $i=0,\ldots,k$ (one can check such points lie in $E$, because $E$ is closed) gives $|a_i - a_j| \geq \frac{|i-j|}{k}\,|E|$. Therefore, it suffices to carry out the computation for an interval of length $|E|$ with uniformly spaced points.
Once we make these two observations, we may take $E = [c,d]$ and consider the $k+1$ "uniformly distributed" points $a_i = \frac{k-i}{k}c + \frac{i}{k}d$ for $i=0,\ldots,k$, for which $\prod_{j \neq i} |a_j - a_i| = \left(\frac{d-c}{k}\right)^{k} i!\,(k-i)!$. With these points, a convexity argument (the binomial coefficient $\binom{k}{i} = \frac{k!}{i!(k-i)!}$ is maximized at the middle) shows that this product is minimized, and hence $|c_i|$ is maximized, at $i = \frac{k\pm 1}{2}$ for $k$ odd, and at $i = \frac k2$ for $k$ even.
With these, the minimal normalized products $\prod_{j\neq i}|a_j - a_i|/(d-c)^k$, for $k$ odd and even respectively, become: $$ \frac{k+1}{2k} \prod_{i=1}^{\frac{k-1}{2}} \frac{i^2}{k^2} \quad ;\quad \prod_{i=1}^{\frac{k}{2}} \frac{i^2}{k^2} $$
The last (but not least) miraculous observation at this point is that both these quantities have good lower bounds. This follows from Stirling's approximation in the form $n! \geq \sqrt{2\pi n}\,(n/e)^n$: for the second one, for example, we have $$\frac{(k/2)!^2}{k^k}\geq \frac{2 \pi e^{-k}k^{k+1}}{2^{k+1}k^{k}} = k\pi e^{-k}2^{-k} \geq (2e)^{-k}.$$ The other one can be tackled similarly.
Hence, we can take $C = 2e$.
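As a sanity check of my own for this step: the normalized quantity in question is $\min_i i!\,(k-i)!/k^k$, and the claim is that it stays above $(2e)^{-k}$:

```python
from math import factorial, e

# min_i i!(k-i)!/k^k is the smallest normalized node product; compare with (2e)^{-k}.
for k in range(1, 21):
    min_prod = min(factorial(i) * factorial(k - i) for i in range(k + 1)) / k**k
    assert min_prod >= (2 * e) ** (-k), k
print("bound (2e)^{-k} verified for k = 1..20")
```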
The final statement would then read :
If $|f^{k}| \geq \beta$ on $[a,b]$ then for any $\epsilon>0$, $$ |\{x : |f(x)| \leq \epsilon\}| \leq (2e)\sqrt[k]{(k+1)!\frac{\epsilon}{\beta}} $$
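To close, an end-to-end numerical illustration of this statement (again my own, with an arbitrarily chosen $f$ and $k=2$):

```python
import numpy as np
from math import factorial, e

# k = 2 check: f(t) = t^2 - t on [0, 1] has f''(t) = 2, so beta = 2.
k, beta, eps = 2, 2.0, 0.01
t = np.linspace(0.0, 1.0, 1_000_001)
f = t**2 - t

measure = np.mean(np.abs(f) <= eps) * 1.0                   # grid estimate of |E|
bound = 2 * e * (factorial(k + 1) * eps / beta) ** (1 / k)  # (2e)((k+1)! eps/beta)^{1/k}
print(measure, "<=", bound)                                 # roughly 0.02 <= 0.94
```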
I think it will be difficult to improve upon these constants, unless the situation is refined even more. Having said that, my main aim was to show that this question is way more complicated than it looks initially.