On the Wikipedia page for the Kantorovich inequality, it is claimed that

... the Kantorovich inequality is a particular case of the Cauchy–Schwarz inequality...

Here, "Kantorovich inequality" refers to $$ (x^\top A \, x) \, (x^\top A^{-1} \, x) \le \frac{(m+M)^2}{4 \, m \, M} \, \|x\|^4 $$ for a symmetric, positive definite matrix $A$, where $m$ and $M$ denote the smallest and largest eigenvalues of $A$.

I was wondering what is meant by this claim on Wikipedia. Is it really true that the Kantorovich inequality is a particular case of CSI (in the sense of: "can be easily derived from")?

The closest assertion I was able to find is from this paper: If we plug in $x = \sqrt{A} \, y$, we arrive at $$ \| A \, y\|^2 \, \|y\|^2 \le \frac{(m+M)^2}{4 \, m \, M} \, (y^\top A \, y)^2 $$ which is a reverse of the special case $$ y^\top A \, y \le \|A \, y\| \, \|y\| $$ of CSI.
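For readers who want to see the two-sided bound numerically, here is a minimal sketch in plain Python. The $2\times 2$ matrix and the random sampling are illustrative assumptions, not part of the question; the script only checks that the ratio $(x^\top A x)(x^\top A^{-1} x)/\|x\|^4$ stays between $1$ (the CSI side) and $\frac{(m+M)^2}{4mM}$ (the Kantorovich side).

```python
import math
import random

def quad_form(M, x):
    """Return x^T M x for a 2x2 matrix M (nested lists) and a 2-vector x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

def kantorovich_ratio(A, x):
    """Return (x^T A x)(x^T A^{-1} x) / ||x||^4 for a symmetric 2x2 matrix A."""
    (p, q), (_, r) = A
    det = p * r - q * q
    A_inv = [[r / det, -q / det], [-q / det, p / det]]  # explicit 2x2 inverse
    norm_sq = x[0] ** 2 + x[1] ** 2
    return quad_form(A, x) * quad_form(A_inv, x) / norm_sq ** 2

# Illustrative matrix (an assumption): symmetric and positive definite.
A = [[2.0, 1.0], [1.0, 3.0]]

# Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] ** 2
disc = math.sqrt(trace ** 2 - 4 * det)
m, M = (trace - disc) / 2, (trace + disc) / 2
bound = (m + M) ** 2 / (4 * m * M)

rng = random.Random(0)
ratios = [kantorovich_ratio(A, [rng.gauss(0, 1), rng.gauss(0, 1)])
          for _ in range(10_000)]
print(f"smallest ratio: {min(ratios):.6f}")
print(f"largest ratio:  {max(ratios):.6f}  (Kantorovich bound: {bound:.6f})")
```

A random search like this is only a sanity check, of course; it says nothing about why the upper bound holds.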


A recent related post asks for conditions for equality in the Kantorovich inequality. I think it is more appropriate to give a solution in this (older) post, addressing both the question here and the issue of equality.

The approach here is a probabilistic generalization of the result of the OP.

Theorem: Suppose $X$ is a random variable on $(\Omega,\mathscr{F},\mathbb{P})$ taking values in an interval $[a,b]$ with $0<a<b<\infty$. Then, $$\begin{align} 1\leq \mathbb{E}[X]\mathbb{E}[X^{-1}]\leq \frac{(a+b)^2}{4ab} \tag{0}\label{kan}\end{align}$$ Equality on the left-hand side holds iff $X$ is constant $\mathbb{P}$-a.s.; equality on the right-hand side holds iff $\mathbb{P}[X=a]=\mathbb{P}[X=b]=\frac12$.

Proof: Since $\phi(x)=\frac{1}{x}$ is convex on $(0,\infty)$, the left-side inequality follows from Jensen's inequality. Because $\phi$ is strictly convex, equality holds iff $X$ is constant $\mathbb{P}$-a.s.

The right-side inequality follows from the Cauchy-Schwarz inequality. First notice that for any pair of square integrable random variables $X,Y$ $$\Big|\mathbb{E}\big[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])\big]\Big|\leq\big(\mathbb{E}\big[(X-\mathbb{E}[X])^2\big]\big)^{1/2}\big(\mathbb{E}\big[(Y-\mathbb{E}[Y])^2\big]\big)^{1/2}$$ Equality holds iff either one of the random variables is constant $\mathbb{P}$-a.s., or there is $c\neq0$ such that $X-\mathbb{E}[X]=c\,(Y-\mathbb{E}[Y])$ $\mathbb{P}$-a.s. This is a well-known result in Probability: the absolute value of the covariance of two random variables is at most the product of their standard deviations.

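Taking $Y=X^{-1}$, the covariance is $\mathbb{E}\big[(X-\mathbb{E}[X])(X^{-1}-\mathbb{E}[X^{-1}])\big]=1-\mathbb{E}[X]\mathbb{E}[X^{-1}]$, and bounding its absolute value gives $$\mathbb{E}[X]\mathbb{E}[X^{-1}]\leq 1+\Big(\mathbb{E}\big[\big(X-\mathbb{E}[X]\big)^2\big]\Big)^{1/2}\Big(\mathbb{E}\big[\big(X^{-1}-\mathbb{E}[X^{-1}]\big)^2\big]\Big)^{1/2}$$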

Since $0<a\leq X\leq b$, we also have that $0<\frac1b\leq\frac{1}{X}\leq \frac1a$. Recall that the mean $\mathbb{E}[Z]$ of any square integrable random variable $Z$ satisfies $\mathbb{E}\big[(Z-\mathbb{E}[Z])^2\big]=\inf_{c\in\mathbb{R}}\mathbb{E}\big[(Z-c)^2\big]$. In particular, choosing $c$ to be the midpoint of the range of $X$ and of $X^{-1}$ respectively, \begin{align} \mathbb{E}\big[\big(X-\mathbb{E}[X]\big)^2\big]&\leq \mathbb{E}\big[\big(X-\frac{a+b}{2}\big)^2\big]\leq\Big(\frac{b-a}{2}\Big)^2\\ \mathbb{E}\big[\big(X^{-1}-\mathbb{E}[X^{-1}]\big)^2\big]&\leq \mathbb{E}\big[\big(X^{-1}-\frac{a^{-1}+b^{-1}}{2}\big)^2\big]\leq\Big(\frac{b-a}{2ab}\Big)^2 \end{align} Putting things together, we obtain $$ \mathbb{E}[X]\mathbb{E}[X^{-1}] \leq 1+\frac{b-a}{2}\frac{b-a}{2ab}=\frac{(a+b)^2}{4ab} $$
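The two-sided bound in the Theorem can be spot-checked on random finitely supported distributions. The sketch below is only an illustration under assumed values ($a=1$, $b=4$, five support points, rational probabilities) and uses exact `Fraction` arithmetic so that no floating-point tolerance is needed; the helper name `mean_product` is made up for this example.

```python
from fractions import Fraction
import random

def mean_product(values, probs):
    """Return E[X] * E[1/X] for a discrete X with the given values and probabilities."""
    mean = sum(p * v for v, p in zip(values, probs))
    mean_inv = sum(p / v for v, p in zip(values, probs))
    return mean * mean_inv

# Illustrative endpoints (an assumption): 0 < a < b.
a, b = Fraction(1), Fraction(4)
upper = (a + b) ** 2 / (4 * a * b)

rng = random.Random(1)
products = []
for _ in range(1000):
    # Random support points in [a, b] and random rational probabilities.
    values = [a + (b - a) * Fraction(rng.randint(0, 100), 100) for _ in range(5)]
    weights = [rng.randint(1, 50) for _ in range(5)]
    total = sum(weights)
    probs = [Fraction(w, total) for w in weights]
    products.append(mean_product(values, probs))

print("lower bound 1 respected:", min(products) >= 1)
print("upper bound respected:  ", max(products) <= upper)
```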

The conditions for equality in the theorem follow by considering the cases where equality occurs in Jensen's inequality and in the Cauchy-Schwarz inequality. I leave that for anybody interested in verifying the conditions.
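As a partial check (only the "if" direction, and not a substitute for the full verification), one can confirm directly that the two-point distribution $\mathbb{P}[X=a]=\mathbb{P}[X=b]=\frac12$ attains the upper bound:

$$ \mathbb{E}[X]\,\mathbb{E}[X^{-1}] = \frac{a+b}{2}\cdot\frac12\Big(\frac1a+\frac1b\Big) = \frac{a+b}{2}\cdot\frac{a+b}{2ab} = \frac{(a+b)^2}{4ab}. $$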


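The inequality in the OP is a particular case of the Theorem outlined above. First notice that since $A$ is symmetric and positive definite, it is orthogonally diagonalizable, so we may assume, without loss of generality, that $A$ is a diagonal matrix with positive entries $m=\lambda_1\leq\ldots\leq \lambda_n=M$ (the case $m=M$ is trivial). Furthermore, by homogeneity it is enough to consider $x$ with $\|x\|^2_2=\sum^n_{k=1}|x_k|^2=1$. Define $\Omega=\{1,\ldots,n\}$, let $\mathscr{F}$ be the collection of all subsets of $\Omega$, and set $\mathbb{P}[\{k\}]=|x_k|^2$ and $X(k)=\lambda_k$. Then $\mathbb{E}[X]=\sum^n_{k=1}\lambda_k|x_k|^2=x^\top A\,x$ and $\mathbb{E}[X^{-1}]=\sum^n_{k=1}\lambda_k^{-1}|x_k|^2=x^\top A^{-1}x$, so the Theorem with $a=m$ and $b=M$ yields the Kantorovich inequality.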
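The reduction can be illustrated on a small non-diagonal example. In the sketch below, the eigenvalues, the rotation angle, and the unit vector are arbitrary assumptions; the matrix is built as $Q\,\mathrm{diag}(\lambda)\,Q^\top$, and the coordinates $y=Q^\top x$ in the eigenbasis play the role of $x$ in the diagonal case, so that $\mathbb{P}[\{k\}]=|y_k|^2$.

```python
import math

lam = [1.0, 4.0]                      # eigenvalues m = 1, M = 4 (illustrative)
theta = 0.7                           # arbitrary rotation angle
c, s = math.cos(theta), math.sin(theta)
Q = [[c, -s], [s, c]]                 # orthogonal matrix; columns are eigenvectors

def build(diag):
    """Return Q diag(diag) Q^T as a nested list."""
    return [[sum(Q[i][k] * diag[k] * Q[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def quad_form(M, x):
    """Return x^T M x for a 2x2 matrix M and a 2-vector x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

A = build(lam)
A_inv = build([1 / l for l in lam])   # same eigenvectors, inverted eigenvalues

x = [math.cos(1.9), math.sin(1.9)]    # a unit vector
y = [sum(Q[i][k] * x[i] for i in range(2)) for k in range(2)]  # y = Q^T x
probs = [yk ** 2 for yk in y]         # P[{k}] = |y_k|^2

mean = sum(p * l for p, l in zip(probs, lam))       # E[X]
mean_inv = sum(p / l for p, l in zip(probs, lam))   # E[1/X]
bound = (lam[0] + lam[1]) ** 2 / (4 * lam[0] * lam[1])

print("E[X]   vs x^T A x:     ", mean, quad_form(A, x))
print("E[1/X] vs x^T A^-1 x:  ", mean_inv, quad_form(A_inv, x))
print("product vs bound:      ", mean * mean_inv, bound)
```

The two printed pairs should agree up to rounding, which is exactly the identification of the quadratic forms with $\mathbb{E}[X]$ and $\mathbb{E}[X^{-1}]$ used above.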


A survey with other interesting extensions can be found in Bühler, W. Two proofs of the Kantorovich Inequality and generalizations, Revista Colombiana de Matemáticas, Vol. 21 (1987), pp.147-154.