Maximizing the trace in an elegant way

Suppose $\Lambda = {\rm diag}(\lambda_1,\cdots,\lambda_n)$, $\Sigma = {\rm diag}(\sigma_1,\cdots,\sigma_n)$, and we have $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$ and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$. I want to prove that $${\rm max}[ {\rm Tr}(O^{\rm T}\Lambda O \Sigma)] = \sum_{i=1}^n \lambda_i \sigma_i,$$ where the maximum is taken over all $O$ with $OO^{\rm T} = I$. One approach is to maximize the function $$f(o_{ij}) = \sum_{i,j} \lambda_i (o_{ij})^2 \sigma_j$$ under the constraint $\sum_{k} o_{ik}o_{jk} = \delta_{ij}$ by the method of Lagrange multipliers. I can prove the statement in this way, but the whole proof is lengthy and lacks the flavor of linear algebra. Since the statement "must be true" intuitively, I want to know if there is an elegant way to prove it. A possible approach may be mathematical induction, but I failed to make it work. Any help is appreciated.
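(Not part of the proof, but as a quick numerical sanity check of the claimed identity, one can sample many random orthogonal matrices and confirm that the objective never exceeds $\sum_i \lambda_i\sigma_i$, while $O=I$ attains it. This is only an illustrative sketch with arbitrary test data.)

```python
# Numerical sanity check (not a proof): sample random orthogonal O and
# compare Tr(O^T Λ O Σ) against the claimed maximum Σ_i λ_i σ_i.
import numpy as np

rng = np.random.default_rng(0)
n = 4
lam = np.sort(rng.random(n))[::-1]   # λ_1 ≥ ... ≥ λ_n ≥ 0
sig = np.sort(rng.random(n))[::-1]   # σ_1 ≥ ... ≥ σ_n ≥ 0
Lam, Sig = np.diag(lam), np.diag(sig)

best = -np.inf
for _ in range(2000):
    # QR factorisation of a Gaussian matrix yields a random orthogonal O
    O, _ = np.linalg.qr(rng.standard_normal((n, n)))
    best = max(best, np.trace(O.T @ Lam @ O @ Sig))

bound = float(lam @ sig)              # Σ_i λ_i σ_i
print(best <= bound + 1e-9)           # → True: samples never exceed the bound
print(np.isclose(np.trace(Lam @ Sig), bound))  # → True: O = I attains it
```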


(Rewritten from the last part of my answer to Trace minimization with constraints .)

Since the objective function is continuous and the set of all real orthogonal matrices is compact, the maximum value of the objective is a continuous function of the entries of $\Sigma$. Therefore, by passing to the limit, we may assume without loss of generality that $\Sigma$ has distinct diagonal entries.

The matrix exponential of every skew-symmetric matrix is a real orthogonal matrix of determinant one. If $O$ is a global maximiser in your problem, its objective function value must be greater than or equal to the objective function value evaluated at $Oe^{tK}$ (which is real orthogonal) for every real number $t$ and every skew-symmetric $K$. So, if we define $f(t)=\operatorname{tr}(e^{-tK}O^T\Lambda Oe^{tK}\Sigma)$, we must have $f'(0)=0$. Using the cyclic property $\operatorname{tr}(XY)=\operatorname{tr}(YX)$, we get $$ 0=f'(0)=\operatorname{tr}\left((\Sigma O^T\Lambda O-O^T\Lambda O\Sigma)\, K\right). $$ Putting $K^T=\Sigma O^T\Lambda O-O^T\Lambda O\Sigma$ (which is indeed skew-symmetric, because $O^T\Lambda O$ is symmetric), we get $\operatorname{tr}(K^TK)=0$. Hence $K=0$, meaning that $O^T\Lambda O$ commutes with $\Sigma$. Any matrix that commutes with a diagonal matrix with distinct diagonal entries must have all off-diagonal entries equal to zero. Therefore $O^T\Lambda O$ is a diagonal matrix.
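(As a side check of the first-order condition above, one can verify numerically that $f'(0)=\operatorname{tr}\left((\Sigma A-A\Sigma)K\right)$ with $A=O^T\Lambda O$. The sketch below replaces $e^{tK}$ by the Cayley transform $(I-tK/2)^{-1}(I+tK/2)$, which is also an orthogonal path through $I$ with derivative $K$ at $t=0$, to stay within plain numpy; the data are arbitrary.)

```python
# Check (sketch): finite difference of f(t) = tr(Q(t)^T A Q(t) Σ) at t = 0
# matches tr((ΣA − AΣ)K), where Q(t) is an orthogonal path with Q'(0) = K.
import numpy as np

rng = np.random.default_rng(1)
n = 4
Lam = np.diag(np.sort(rng.random(n))[::-1])
Sig = np.diag(np.sort(rng.random(n))[::-1])
O, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = O.T @ Lam @ O                    # symmetric

M = rng.standard_normal((n, n))
K = M - M.T                          # skew-symmetric direction
I = np.eye(n)

def Q(t):
    # Cayley transform: orthogonal for all t, Q(0) = I, Q'(0) = K
    return np.linalg.solve(I - t * K / 2, I + t * K / 2)

def f(t):
    Qt = Q(t)
    return np.trace(Qt.T @ A @ Qt @ Sig)

h = 1e-5
fd = (f(h) - f(-h)) / (2 * h)                  # central difference ≈ f'(0)
analytic = np.trace((Sig @ A - A @ Sig) @ K)   # from the cyclic trace identity
print(np.isclose(fd, analytic, atol=1e-6))     # → True
```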

Hence $\Lambda_1:=O^T\Lambda O$ is a diagonal permutation of $\Lambda$, because $\Lambda_1$ and $\Lambda$ are diagonal matrices with the same eigenvalues. Now the problem boils down to finding the diagonal permutation $\Lambda_1$ of $\Lambda$ that maximises $\operatorname{tr}(\Lambda_1\Sigma)$. Since both diagonals are sorted in decreasing order, the rearrangement inequality shows that the maximum occurs when $\Lambda_1=\Lambda$, which is attained at $O=I$.
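(The final rearrangement step can be brute-forced for small $n$: with both sequences sorted in decreasing order, $\sum_i \lambda_i\sigma_{\pi(i)}$ is maximised by the identity permutation. A small sketch with hypothetical values:)

```python
# Sketch of the rearrangement step: over all permutations π of the σ's,
# Σ_i λ_i σ_{π(i)} is maximised by the identity when both lists are decreasing.
import numpy as np
from itertools import permutations

lam = np.array([5.0, 3.0, 2.0, 1.0])   # decreasing λ's (arbitrary test values)
sig = np.array([4.0, 2.5, 1.0, 0.5])   # decreasing σ's

best_perm = max(permutations(range(len(lam))),
                key=lambda p: float(lam @ sig[list(p)]))
print(best_perm)         # → (0, 1, 2, 3), the identity permutation
print(float(lam @ sig))  # → 30.0, the maximal value Σ_i λ_i σ_i
```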


This is not an answer.

My contribution here is to show that the problem is equivalent to minimising a certain difference (in the sense of the Frobenius norm), still over all orthogonal matrices, in a seemingly more general context. I would have liked this equivalence to lead to the result, but I have not succeeded.

$$\begin{eqnarray}\tag{*}M&=&\min_{O \in O(n)}\|O^{\rm T}\Lambda O - \Sigma \|_F^2\\&=&\min_{O \in O(n)}{\rm Tr}[(O^{\rm T}\Lambda O - \Sigma)^T(O^{\rm T}\Lambda O - \Sigma)]\\&=&\min_{O \in O(n)}{\rm Tr}[\Lambda^2+\Sigma^2-(O^{\rm T}\Lambda O \Sigma)-(\Sigma O^{\rm T}\Lambda O)]\\&=&\min_{O \in O(n)}\left\{{\rm Tr}[\Lambda^2+\Sigma^2]-{\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]-{\rm Tr}[(\Sigma O^{\rm T}\Lambda O)]\right\}\\\tag{1}&=&\min_{O \in O(n)}\left\{{\rm Tr}[\Lambda^2+\Sigma^2]-2 \ {\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]\right\}\end{eqnarray}$$

Explanation: ${\rm Tr}[(\Sigma O^{\rm T}\Lambda O)]= {\rm Tr}[(O^{\rm T}\Lambda O \Sigma)]$ because of the following cyclic property of the trace operator:

$$ {\rm Tr}[ABCD]= {\rm Tr}[BCDA]$$

Thus, because of the minus sign in (1) and the fact that ${\rm Tr}[\Lambda^2+\Sigma^2]$ is a fixed quantity, the minimum in (1) is attained by an orthogonal matrix $O$ such that

$${\rm Tr}(O^{\rm T}\Lambda O \Sigma) \ \ \text{is maximal}$$

Thus the equivalence between (*) and our initial problem is established.
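(The algebra behind (1) is easy to confirm numerically: for any orthogonal $O$, $\|O^{\rm T}\Lambda O-\Sigma\|_F^2 = {\rm Tr}[\Lambda^2+\Sigma^2]-2\,{\rm Tr}[O^{\rm T}\Lambda O\Sigma]$. A minimal sketch with random data:)

```python
# Verify the identity ‖O^T Λ O − Σ‖_F² = Tr[Λ² + Σ²] − 2 Tr[O^T Λ O Σ]
# for a random orthogonal O (this is the computation behind (1)).
import numpy as np

rng = np.random.default_rng(2)
n = 5
Lam = np.diag(np.sort(rng.random(n))[::-1])
Sig = np.diag(np.sort(rng.random(n))[::-1])
O, _ = np.linalg.qr(rng.standard_normal((n, n)))

lhs = np.linalg.norm(O.T @ Lam @ O - Sig, 'fro') ** 2
rhs = np.trace(Lam @ Lam + Sig @ Sig) - 2 * np.trace(O.T @ Lam @ O @ Sig)
print(np.isclose(lhs, rhs))  # → True
```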