Jensen's inequality for strictly convex functions and the case of equality

Definition 1. A function $f:(a,b)\to \mathbb{R}$ defined on an open interval $(a,b)\subset \mathbb{R}$ is convex if the inequality $$f(\alpha_1x_1+\alpha_2x_2)\leq \alpha_1f(x_1)+\alpha_2f(x_2)$$ holds for any points $x_1,x_2\in (a,b)$ and any numbers $\alpha_1\geq 0,\ \alpha_2\geq 0$ such that $\alpha_1+\alpha_2=1$. If this inequality is strict whenever $x_1\neq x_2$ and $\alpha_1\alpha_2\neq 0$, the function is said to be strictly convex on $(a,b)$.

Proposition 7. (Jensen's inequality). If $f:(a,b)\to \mathbb{R}$ is a convex function, $x_1,\dots,x_n$ are points of $(a,b)$, and $\alpha_1,\dots,\alpha_n$ are nonnegative numbers such that $\alpha_1+\dots+\alpha_n=1$, then $$f(\alpha_1 x_1+\dots+\alpha_n x_n)\leq \alpha_1 f(x_1)+\dots+\alpha_nf(x_n). \qquad (*)$$
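
As a quick sanity check of Proposition 7, here is a minimal numerical sketch. The choice $f(x)=x^2$ on $(-10,10)$ and the random points and weights are my own illustration, not part of the quoted text.

```python
import random

# Numerical sanity check of Jensen's inequality (*), assuming the strictly
# convex function f(x) = x**2 on (a, b) = (-10, 10); the points and weights
# below are random illustrative choices.

def f(x):
    return x ** 2

def jensen_gap(xs, alphas):
    """RHS minus LHS of (*); nonnegative for a convex f."""
    lhs = f(sum(a * x for a, x in zip(alphas, xs)))
    rhs = sum(a * f(x) for a, x in zip(alphas, xs))
    return rhs - lhs

random.seed(0)
for _ in range(5):
    n = random.randint(2, 6)
    xs = [random.uniform(-10, 10) for _ in range(n)]
    w = [random.random() for _ in range(n)]
    alphas = [wi / sum(w) for wi in w]   # normalize so the weights sum to 1
    print(f"gap = {jensen_gap(xs, alphas):.6f}")   # always >= 0
```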

We remark that, as the proof shows, a strict Jensen's inequality corresponds to strict convexity, that is, if the numbers $\alpha_1,\dots,\alpha_n$ are nonzero, then equality holds in $(*)$ if and only if $x_1=\dots=x_n$.

This last remark seems very confusing to me. I know the proof of Jensen's inequality, which is based on induction, and I can see that if $f$ is strictly convex then we obtain strict inequality in Jensen. But, as far as I understand, the author is claiming that the following result holds, right?

If $f$ is strictly convex on $(a,b)$, $x_1,\dots,x_n\in(a,b)$, and $\alpha_i>0$ with $\sum \limits_i \alpha_i=1$, then $f(\alpha_1 x_1+\dots+\alpha_n x_n)= \alpha_1 f(x_1)+\dots+\alpha_nf(x_n)$ if and only if $x_1=\dots=x_n$.

If so, how does one prove it, namely the $\Rightarrow$ part?

EDIT: I believe the author means that the following two results hold.

Theorem 1. Suppose that $f:(a,b)\to \mathbb{R}$ is strictly convex on $(a,b)$. If $x_1,\dots,x_n\in (a,b)$ with $x_i\neq x_j$ for $i\neq j$, and $\lambda_1,\dots,\lambda_n>0$ with $\lambda_1+\dots+\lambda_n=1$, then $$f(\lambda_1 x_1+\dots+\lambda_n x_n)<\lambda_1 f(x_1)+\dots+\lambda_n f(x_n).$$ The proof is by induction and is almost the same as for the ordinary Jensen's inequality.

Theorem 2. Suppose that $f:(a,b)\to \mathbb{R}$ is strictly convex on $(a,b)$. If $x_1,\dots,x_n\in (a,b)$ and $\lambda_1,\dots,\lambda_n>0$ with $\lambda_1+\dots+\lambda_n=1$, then $$f(\lambda_1 x_1+\dots+\lambda_n x_n)=\lambda_1 f(x_1)+\dots+\lambda_n f(x_n) \ \text{if and only if} \ x_1=\dots=x_n.$$
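
Numerically, both directions of Theorem 2 are easy to observe. The sketch below uses the strictly convex $f(x)=e^x$; the particular points and weights are arbitrary choices for illustration.

```python
import math

# Illustration of Theorem 2 with the strictly convex f(x) = exp(x);
# the weights and points below are arbitrary illustrative choices.

def gap(xs, lambdas):
    """RHS minus LHS of Jensen's inequality for f(x) = exp(x)."""
    lhs = math.exp(sum(l * x for l, x in zip(lambdas, xs)))
    rhs = sum(l * math.exp(x) for l, x in zip(lambdas, xs))
    return rhs - lhs

lambdas = [0.2, 0.3, 0.5]

# Equal points: the gap vanishes (the easy "if" direction).
print(gap([1.0, 1.0, 1.0], lambdas))   # 0.0 up to rounding

# Not all points equal: the gap is strictly positive, so equality can
# only occur when x_1 = ... = x_n (the "only if" direction).
print(gap([0.0, 1.0, 2.0], lambdas))   # > 0
```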

I only have trouble with the $\Rightarrow$ part.


Solution 1:

Sorry, my comment was a little confusing. I think this just boils down to the definition of strict convexity.


Let $n=2$. The definition of convexity gives $$f(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 f(x_1) + \lambda_2 f(x_2)$$ when $\lambda_1,\lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$. The definition of strict convexity says that this inequality is strict whenever $\lambda_1, \lambda_2 > 0$ and $x_1 \ne x_2$. So the only way equality can hold is if $\lambda_1 = 0$, or $\lambda_2 = 0$, or $x_1 = x_2$. Since $\lambda_1,\lambda_2>0$ by assumption, this forces $x_1=x_2$, which is the $\implies$ claim for $n=2$ in Theorem 2.


For arbitrary $n$ (note that $\lambda_n<1$, since the remaining weights are positive), we have \begin{align} &f(\lambda_1 x_1 + \cdots + \lambda_n x_n) \\ &\le (1-\lambda_n)f\left(\frac{1}{1-\lambda_n}(\lambda_1 x_1 + \cdots + \lambda_{n-1} x_{n-1})\right) + \lambda_n f(x_n) \\ &\le \lambda_1 f(x_1) + \cdots + \lambda_{n-1}f(x_{n-1}) + \lambda_n f(x_n),\end{align} where the first step is the definition of convexity applied directly with the weights $1-\lambda_n$ and $\lambda_n$, and the second step is Jensen's inequality for the $n-1$ points $x_1,\dots,x_{n-1}$ with weights $\lambda_i/(1-\lambda_n)$, i.e., the inductive hypothesis as in Proposition 7.

When proving $\implies$, the assumption is that the two ends of this chain are equal, which forces both of the above inequalities to be equalities.

By the inductive hypothesis (the $\implies$ direction of Theorem 2 for $n-1$ points, with weights $\lambda_i/(1-\lambda_n)$), the second inequality being an equality implies $x_1 = \cdots = x_{n-1}$.

By the definition of strict convexity (the $n=2$ case above), the first inequality being an equality implies $x_n = \frac{1}{1-\lambda_n}(\lambda_1 x_1 + \cdots + \lambda_{n-1} x_{n-1})$. But since $x_1 = \cdots = x_{n-1}$ and $\lambda_1+\cdots+\lambda_{n-1}=1-\lambda_n$, the right-hand side is just $x_1$, so $x_n=x_1$ and hence $x_1 = \cdots = x_{n-1} = x_n$.
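
To see the two-step decomposition in action, here is a small numerical sketch; it uses $f(x)=x^2$ as a stand-in for a strictly convex function, and the points and weights are illustrative choices only.

```python
# Numerical sketch of the two-step chain above, assuming f(x) = x**2 as the
# strictly convex function; the data below are illustrative choices only.

def f(x):
    return x ** 2

def decomposition(xs, lambdas):
    """Return (lhs, middle, rhs): the left-hand side, the intermediate bound
    obtained by splitting off the last point, and the full right-hand side."""
    lam_n, x_n = lambdas[-1], xs[-1]
    head = sum(l * x for l, x in zip(lambdas[:-1], xs[:-1])) / (1 - lam_n)
    lhs = f(sum(l * x for l, x in zip(lambdas, xs)))
    middle = (1 - lam_n) * f(head) + lam_n * f(x_n)
    rhs = sum(l * f(x) for l, x in zip(lambdas, xs))
    return lhs, middle, rhs

# Distinct points: both inequalities in the chain are strict.
print(decomposition([1.0, 2.0, 4.0], [0.5, 0.3, 0.2]))   # about (3.61, 4.7125, 4.9)

# Equal points: the chain collapses and lhs == middle == rhs.
print(decomposition([3.0, 3.0, 3.0], [0.5, 0.3, 0.2]))   # (9.0, 9.0, 9.0)
```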