Proving Holder's inequality for Schatten norms

Sticking to the finite dimensional case, Holder's inequality for Schatten norms is given by

$$\left\|AB\right\|_{S^1}\leq\left\|A\right\|_{S^p}\left\|B\right\|_{S^q}$$

for $A,B$ $n\times n$ matrices, $p,q\in[1,\infty]$, and $\frac{1}{p}+\frac{1}{q}=1$.

So using Young's inequality, the expression I have in mind is the following \begin{align} \frac{\left\|AB\right\|_{S^1}}{\left\|A\right\|_{S^p}\left\|B\right\|_{S^q}}=\frac{1}{\left\|A\right\|_{S^p}\left\|B\right\|_{S^q}}\sum_{i=1}^n|\sigma_i(AB)|&\overset{?}{\leq}\frac{1}{\left\|A\right\|_{S^p}\left\|B\right\|_{S^q}}\sum_{i=1}^n|\sigma_i(A)||\sigma_{\pi(i)}(B)|\\ &\overset{\text{YI}}{\leq}\frac{1}{p\left\|A\right\|_{S^p}^p}\sum_{i=1}^n|\sigma_i(A)|^p + \frac{1}{q\left\|A\right\|_{S^q}^q}\sum_{i=1}^n|\sigma_{\pi(i)}(B)|^q\\ &=\frac{1}{p} + \frac{1}{q}\\ &=1 \end{align}

Focusing in on the inequality under question, $$\sum_{i=1}^n|\sigma_i(AB)|\overset{?}{\leq}\sum_{i=1}^n|\sigma_i(A)||\sigma_{\pi(i)}(B)|$$ we see that proving the Schatten version of Holder's inequality boils down to proving that there exists a permutation $\pi$ of the indices $\{1,...,n\}$ such that the above inequality holds. Of course maybe this isn't true, but it's the hurdle I ran into when trying to adapt the standard proof of Holder's inequality to the Schatten case.

Also I don't strictly need all the absolute values since singular values are always non-negative, but originally I was considering only Hermitian matrices, so I decided to include them.

Edit:

After doing some numerical tests it looks like permuting the indices isn't necessary. Thus to win the bounty I'm either looking for a proof that $$\sum_{i=1}^n\sigma_i(AB)\leq\sum_{i=1}^n\sigma_i(A)\sigma_i(B)$$ or if you have some other proof of the Schatten Holder's inequality different than the one I've tried to adapt above, then that's fine too.


We let $p=rank(AB)$, since $p \leq min(rank(A),rank(B))$, it suffices to prove that $$\sum_{i=1}^p\sigma_i(AB) \leq \sum_{i=1}^p\sigma_i(A)\sigma_i(B).$$

Consider the SVD decomposition of AB, $$AB=\left[\begin{array}{cc}U & \widetilde{U} \end{array}\right]\left[\begin{array}{cc} \Sigma & 0 \\ 0 & 0 \end{array}\right]\left[\begin{array}{c} V^T \\ \widetilde{V}^T \end{array}\right]=U\Sigma V^T$$ where $\Sigma=diag(\sigma_1(AB),\ldots,\sigma_p(AB)).$ For any matrix $P$, we let $P_k$ denotes the submatrix consisting of the first $k$ columns of $P$. We fix a $k \in \left\{ 1,\ldots,p\right\},$ $$U_k^T(AB)V_k=diag(\sigma_1(AB),\ldots,\sigma_k(AB)).$$ Next, we consider the SVD of $U_k^TA,$ $$U_k^TA=R \left[\begin{array}{cc}S & 0 \end{array}\right]\left[\begin{array}{c} W^T\\ \widetilde{W}^T\end{array} \right]=RSW^T.$$ We have $$U_k^TAA^TU_k=RS^2R^T$$ and hence we can see that $$\det(RS^2R^T)\leq\prod_{i=1}^k \sigma_i(A)^2$$ and as a result, $$\det(RSR^T) \leq \prod_{i=1}^k \sigma_i(A).$$

\begin{align*} \prod_{i=1}^k \sigma_i(AB) &=\det(U_k^TABV_k)\\ &= \det(RSW^TBV_k) \\ &= \det(RSR^T)det(RW^TBV_k)\\ &\leq \prod_{i=1}^k \sigma_i(A)\sigma_i(B). \end{align*}

Taking logarithm on both sides, we have $\forall k \in \left\{1, \ldots, p\right\},$

$$\sum_{i=1}^k \log(\sigma_i(AB)) \leq \sum_{i=1}^k \log(\sigma_i(A)\sigma_i(B))$$

I will use a trick called majorization to remove the logarithms. I am going to construct 2 vectors $a,b \in \mathbb{R}^{p+1}$ such that $a \succ b.$ The vectors satisfy the following conditions: \begin{align*} \text{1. }& a_1 \geq \ldots \geq a_{p+1},\\ \text{2. }& b_1 \geq \ldots \geq b_{p+1}, \\ \text{3. }& \sum_{i=1}^k b_i \leq \sum_{i=1}^k a_i, \forall k \in \left\{1,\ldots,p\right\}, \\ \text{4. }& \sum_{i=1}^{p+1} b_i = \sum_{i=1}^{p+1} a_i. \end{align*}

Under the above conditions, $\sum_{i=1}^{p+1} \exp(b_i) \leq \sum_{i=1}^{p+1} \exp(a_i)$ since the exponential function is convex.

We let

$$b=\left(\log(\sigma_1(AB),\ldots,\log(\sigma_p(AB)),\min(a_p,b_p)\right),$$

$$a=\left(\log(\sigma_1(A)\sigma_1(B),\ldots,\log(\sigma_p(A)\sigma_p(B)),\sum_{i=1}^{p+1}b_i-\sum_{i=1}^p a_i\right).$$

To verify the $\textbf{Condition 1}$, I need to show that $a_p \geq \sum_{i=1}^{p+1}b_i-\sum_{i=1}^p a_i$ which is equivalent to $$a_p-\sum_{i=1}^{p+1}b_i+\sum_{i=1}^p a_i=(a_p-b_{p+1})+\left(\sum_{i=1}^pa_i-\sum_{i=1}^p b_i\right)\geq 0.$$

$b_{p+1} \leq a_p$ due to the definition of $b_{p+1}$ and we have proven earlier that $\left(\sum_{i=1}^pa_i-\sum_{i=1}^p b_i\right) \geq 0$.

To verify the $\textbf{Condition 2}$, observe that $b_p \geq b_{p+1}$ due to definition of $b_{p+1}$ again.

$\textbf{Condition 3}$ was proven earlier.

To check $\textbf{Condition 4}$,

$$\sum_{i=1}^{p+1}a_i=\sum_{i=1}^{p}a_i+a_{p+1}=\sum_{i=1}^{p+1}b_i.$$

As a result, we have

$$\sum_{i=1}^{p+1} \exp(b_i) \leq \sum_{i=1}^{p+1} \exp(a_i)$$

which is equivalent to $$\sum_{i=1}^{p} \exp(b_i) \leq \sum_{i=1}^{p} \exp(a_i)+\exp(a_{p+1})-\exp(b_{p+1}).$$ Thus, we have \begin{align*} \sum_{i=1}^{p} \sigma_i(AB) &\leq \sum_{i=1}^{p} \sigma_i(A)\sigma_i(B)+\exp(a_{p+1})-\exp(b_{p+1})\\ &=\sum_{i=1}^{p} \sigma_i(A)\sigma_i(B)+\exp\left(\sum_{i=1}^p(b_i-a_i)+b_{p+1}\right)-\exp(b_{p+1})\\ &=\sum_{i=1}^{p} \sigma_i(A)\sigma_i(B)+\exp(b_{p+1})\left(\exp \left(\sum_{i=1}^p(b_i-a_i)\right)-1)\right)\\ &\leq \sum_{i=1}^{p} \sigma_i(A)\sigma_i(B)+\exp(b_{p+1})\left(\exp \left(0\right)-1)\right)\\ &=\sum_{i=1}^{p} \sigma_i(A)\sigma_i(B) \end{align*}

Hence we are done.

A bonus that I learn from answering the question is if the following conditions hold:

\begin{align*} \text{A. }& a_1 \geq \ldots \geq a_{p},\\ \text{B. }& b_1 \geq \ldots \geq b_{p}, \\ \text{C. }& \sum_{i=1}^k b_i \leq \sum_{i=1}^k a_i, \forall k \in \left\{1,\ldots,p\right\}, \\ \end{align*}

Then we have $$\sum_{i=1}^k \exp(b_i) \leq \sum_{i=1}^k \exp(a_i), \forall k \in \left\{1,\ldots,p\right\}.$$


An alternative proof which is based on more standard technology is as follows. First observe that for any $A$ and $B$, there exists a unitary matrix $U$ such that $$ ||AB||_1=|\mathrm{Tr}((AU)^\dagger B)|. $$ Indeed consider the singular value decomposition $$AB=\sum_i \sigma_i(AB)a_i(AB)b_i(AB)^\dagger,$$ where the singular values $\sigma_i(AB)$ are positive and listed in decreasing order and $a_i$ and $b_i$ are orthonormal sets of vectors. We may then take $U$ to be any unitary such that $U^\dagger a_i(AB)=b_i(AB)$ for all $i$. Therefore, since $||AU||_p=||A||_p$ for any unitary $U$, the result will be proved if we can show that $$ |\mathrm{Tr}(A^\dagger B)|\leq||A||_p||B||_q $$ for all $A$ and $B$ (this inequality is also referred to as Holder's inequality for Schatten norms). The proof of this inequality follows first by applying von Neumann's trace inequality, which says that $$|\mathrm{Tr}(A^\dagger B)|\leq\sum_i\sigma_i(A)\sigma_i(B),$$ with the singular values again listed in decreasing order, and then the classical Holder's inequality for $L_p$ spaces, which says that for two complex vectors $n_i$ and $m_i$ we have $$\sum_i|n_i||m_i|\leq (\sum_i |n_i|^p)^{1/p}(\sum_i|m_i|^q)^{1/q}.$$ The classical Holder inequality is proven using Young's inequality $$ab\leq a^p/p+b^q/q,$$ which holds for all $a,b\geq 0$ and $p\geq 1$, $1/p+1/q=1$. The proof of von Neumann's inequality is more involved, one relatively accessible proof by Mirsky based on doubly stochastic matrices (which has some relation to the proof given in the other answer here) is given in: https://link.springer.com/article/10.1007/BF01647331.