Polyak-Lojasiewicz (PL) inequality proof

I'm have been looking for a while now for a proof to "Polyak-Lojasiewicz (PL) inequality", which states that:

if: $\forall x\in Dom(f):mI\succcurlyeq\nabla^{2}f(x)$ then:

$f(x)-f(x^{*})\geq\frac{1}{2m}\left|\left|\nabla f(x)\right|\right|_{2}^{2}$

where $ \succcurlyeq\ $is the positive semi-definite notation.

this is the only article that I have encountered who uses this inequality: http://liberzon.csl.illinois.edu/teaching/Polyak-Lojasiewicz.pdf

I have tried to prove it for my own in the following link, but lately I found a mistake and therefore I'm trying to prove it in other way:

proving:$f(x)-f(x^{*})\geq\frac{1}{2m}\left|\left|\nabla f(x)\right|\right|_{2}^{2}$

Solution 1:

Proving your Polyak-Lojasiewicz (PL) inequality is not an easy task, one of the easiest ways comes from assuming that the function is a Morse-Bott function of order 2, a proof can be found in this beautiful article. Proving the converse, i.e., every function that satisfies your Polyak-Lojasiewicz (PL) inequality is in fact a Morse-Bott function of order 2 it's even harder. It requires Resolution of singularities, a powerful tool in Algebraic Geometry.

The Polyak-Lojasiewicz (PL) inequality, and not what you have written, is not wrong, since it is a hypothesis. On the other hand, in general, the inequality is not true. For example, the functions $f(x)=x^{k}$ for $k$ even is convex, and it only satisfies $$\dfrac{1}{k}|f'(x)|\geq |f(x)|^{\dfrac{k-1}{k}}.$$ Observe that the exponents can be taken arbitrarily close to $1$, which means that the result starts getting worse and worse as it happens.

Generally, only the Lojasiewicz gradient inequality holds around a point, which is a weaker result, but yet strong. And proving it requires Algebraic Geometry as well. Again, the ideas of the proof can be found here.

I will prove Polyak-Lojasiewicz (PL) inequality for strongly convex functions $f:\mathbb{R}^{n} \rightarrow \mathbb{R}$ for completeness and my own future reference. Looking to the equivalent definitions of strongly convex functions, they say that a function is strongly convex whenever $$ \begin{align} \left( \nabla f(\boldsymbol x) - \nabla f(\boldsymbol y) \right)^{T}(\boldsymbol x - \boldsymbol y ) \geq & m \|\boldsymbol x- \boldsymbol y\|^{2},\\ \end{align} $$ for all $\boldsymbol x, \boldsymbol y \in \mathbb{R}^n.$ From what follows we need to assume the Lipschitz continuity of the gradient, this means that there exist an $L>0$ such that $$ f(\boldsymbol x)\leq f(\boldsymbol y) + \nabla f (\boldsymbol y)^{T} (\boldsymbol x - \boldsymbol y)+ \dfrac{L}{2} \|\boldsymbol x - \boldsymbol y\|^2,$$ for all $\boldsymbol x, \boldsymbol y \in \mathbb{R}^n.$

Finally, for the only minimizer $\boldsymbol y \in \mathbb{R}^{n}$, we have that $\nabla f (\boldsymbol y) =\boldsymbol 0$ and for all $\boldsymbol x \in \mathbb{R}^n$ $$ \begin{align} \left\| \nabla f(\boldsymbol x) \right\| = & \left\| \nabla f(\boldsymbol x) - \nabla f(\boldsymbol y) \right\|\\ \geq & m \|\boldsymbol x- \boldsymbol y\| \\ = & \dfrac{\sqrt{2} m}{\sqrt{L}} \sqrt{ \dfrac{L}{2} \|\boldsymbol x- \boldsymbol y\|^{2}} \\ = & \dfrac{\sqrt{2} m}{\sqrt{L}} \sqrt{ \dfrac{L}{2} \|\boldsymbol x- \boldsymbol y\|^{2} + \nabla f (\boldsymbol y)^{T} (\boldsymbol x- \boldsymbol y) + f(\boldsymbol y) - f(\boldsymbol y) } \\ \geq & \dfrac{\sqrt{2} m}{\sqrt{L}} \sqrt{ f(\boldsymbol x) - f(\boldsymbol y)}, \\ \end{align} $$ which is your desired result for the article.

As can be noticed, the Polyak-Lojasiewicz (PL) inequality is weaker than asking strong convexity of the function, and, by the Corollary 2, is equivalent to asking for the function being a Morse-Bott of order 2. That's why your article directly ask it instead of asking the strong convexity of the function.

In general, observe that all the arguments here could be seamlessly generalized considering that we are working functions that are locally strongly convex, which gives the intuition why asking for the inequality for convex functions it not much.

Polyak-Lojasiewicz (PL) inequality proof

Solution 1:

Related

Recent Posts