Trace minimization with constraints
For positive semi-definite matrices $A,B$, how can I find an $X$ that minimizes $\text{Trace}(AX^TBX)$ under either one of these constraints:
a) Sum of squares of Euclidean distances between pairs of rows in $X$ is a constant $\nu$.
or
b) $X$ is orthogonal.
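(For what it's worth, constraint (a) is itself a trace constraint: writing $x_1,\ldots,x_n$ for the rows of $X$, a standard identity gives $$\sum_{i<j}\|x_i-x_j\|^2=\mathrm{tr}\!\left(X^T(nI-\mathbf{1}\mathbf{1}^T)X\right)=\nu,$$ where $\mathbf{1}$ is the all-ones vector.)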
This is not a homework problem, though the style of writing might suggest so. All the matrices have real entries; $A$ and $B$ are square while $X$ is rectangular. Thanks.
This is what I have:
Write $B=F^{T}F$ (such an $F$ exists because $B$ is positive semidefinite) and define $Y=FX$. The problem above then becomes \begin{align} \min_{Y}~ \text{trace}(AY^{T}Y) \end{align}
But now I want an $X^*$ that minimizes the original problem. This is what is confusing me!
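As a quick sanity check of this reduction (a minimal numerical sketch assuming NumPy/SciPy; $F$ is taken here to be the symmetric square root of $B$, though any $F$ with $F^TF=B$ would do):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
n, m = 5, 3                                # X is n x m

# random positive semidefinite A (m x m) and B (n x n)
A = rng.standard_normal((m, m)); A = A @ A.T
B = rng.standard_normal((n, n)); B = B @ B.T
X = rng.standard_normal((n, m))

F = np.real(sqrtm(B))                      # symmetric square root, so F.T @ F = B
Y = F @ X

# the two objectives agree, as claimed
print(np.trace(A @ X.T @ B @ X))
print(np.trace(A @ Y.T @ Y))
```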
Solution 1:
If you want to solve the minimization problem individually in each of the two constrained cases, then (b) can be reduced to a long-solved problem. [Edit: by "(b) $X$ is orthogonal", I suppose you mean $X$ has orthonormal columns. If the columns of $X$ are merely required to be orthogonal instead of orthonormal, then the minimum trace is clearly $0$, which is attained when $X=0$.]
Suppose $A$ is $m\times m$ and $B$ is $n\times n$ (hence $X$ is $n\times m$). We may assume that $A,B$ and $X$ have the same size, because if $X$ is "wide" ($n<m$), we can transform the problem to the form $\min\mathrm{tr}(AQ^T\tilde{B}Q)$ subject to $QQ^T=I$, where $A,\tilde{B}$ and $Q$ have the same size: \begin{align} AX^TBX &= A\underbrace{[X^T\ Y^T]}_{Q^T} \underbrace{\begin{bmatrix}B\\&0_{(m-n)\times(m-n)}\end{bmatrix}}_{\tilde{B}} \underbrace{\begin{bmatrix}X\\ Y\end{bmatrix}}_{Q} = AQ^T\tilde{B}Q. \end{align} The solution $X$ to the original problem will then be a submatrix of the solution $Q$ to the transformed problem, or more specifically, $$X=(I_n,\,0_{n\times(m-n)})Q.$$ A similar transformation can be performed if $X$ is "tall", i.e. when the size of $B$ is larger than the size of $A$.
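To make the padding step concrete, here is a small numerical sketch (an illustration assuming NumPy/SciPy, not part of the argument): it completes a wide $X$ with orthonormal rows to an orthogonal $Q$ and checks that the objective is unchanged.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
n, m = 3, 5                                    # X is n x m, "wide" (n < m)
A = rng.standard_normal((m, m)); A = A @ A.T   # m x m positive semidefinite
B = rng.standard_normal((n, n)); B = B @ B.T   # n x n positive semidefinite

# an X with orthonormal rows, from the QR factorization of a random matrix
X = np.linalg.qr(rng.standard_normal((m, n)))[0].T

# complete X to an orthogonal Q: the rows of Y form an orthonormal
# basis of the orthogonal complement of the row space of X
Y = null_space(X).T
Q = np.vstack([X, Y])                          # m x m, Q @ Q.T = I

B_tilde = np.zeros((m, m))
B_tilde[:n, :n] = B                            # B padded with a zero block

print(np.allclose(Q @ Q.T, np.eye(m)))         # True
print(np.trace(A @ X.T @ B @ X))               # original objective
print(np.trace(A @ Q.T @ B_tilde @ Q))         # transformed objective: same value
```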
Next, as $A$ and $B$ are positive semidefinite, we may orthogonally diagonalize them to their eigenvalue matrices. So, let $A=U\Lambda U^T$ and $B=V\Sigma V^T$, where the eigenvalues (i.e. the diagonal entries) of $\Lambda$ are arranged in ascending order and those of $\Sigma$ are arranged in descending order: \begin{align} \Lambda&=\mathrm{diag}(\lambda_1,\ldots,\lambda_n):=\mathrm{diag}(\lambda^\uparrow_1(A),\ldots,\lambda^\uparrow_n(A)),\\ \Sigma&=\mathrm{diag}(\sigma_1,\ldots,\sigma_n):=\mathrm{diag}(\lambda^\downarrow_1(B),\ldots,\lambda^\downarrow_n(B)). \end{align} Putting $\tilde{Q}=V^TQU$, we get $$\min\mathrm{tr}(AQ^TBQ)=\min\mathrm{tr}(\Lambda\tilde{Q}^T\Sigma\tilde{Q}).$$ In other words, by absorbing $U$ and $V$ into $Q$, we may further assume that $A$ and $B$ are nonnegative diagonal matrices.
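A quick numerical check of this change of variables (a sketch assuming NumPy; by the reduction above, all matrices are square of a common size):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)); A = A @ A.T
B = rng.standard_normal((n, n)); B = B @ B.T
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]   # a random orthogonal Q

lam, U = np.linalg.eigh(A)                         # eigh sorts eigenvalues ascending
sig, V = np.linalg.eigh(B)
sig, V = sig[::-1], V[:, ::-1]                     # reorder B's eigenpairs descending

Q_tilde = V.T @ Q @ U
print(np.trace(A @ Q.T @ B @ Q))
print(np.trace(np.diag(lam) @ Q_tilde.T @ np.diag(sig) @ Q_tilde))   # same value
```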
We have transformed our problem into a nicer form; it is now time to solve it. There are two popular approaches. One looks more elegant, and the other has wider applicability.
The more elegant approach makes use of Birkhoff's Theorem. First, let $\tilde{Q}=(q_{ij})$. Then $\tilde{Q}\circ \tilde{Q}=(q_{ij}^2)$ is a doubly stochastic matrix. Therefore \begin{align} \min_{\tilde{Q}\tilde{Q}^T=I} \mathrm{tr}(\Lambda \tilde{Q}^T\Sigma\tilde{Q}) &=\min_{\tilde{Q}\tilde{Q}^T=I} \sum_{i,j}\sigma_i\lambda_jq_{ij}^2\\ &\ge\min_S \sum_{i,j}\sigma_i\lambda_js_{ij};\ S=(s_{ij}) \textrm{ is doubly stochastic}\,\ldots(\ast). \end{align} Observe that $\sum_{i,j}\sigma_i\lambda_js_{ij}$ is a linear function in $S$ and the set of all doubly stochastic matrices is compact and convex. Hence $\min\limits_S \sum_{i,j}\sigma_i\lambda_js_{ij}$ must occur at an extreme point of this convex set. However, by Birkhoff's Theorem, the extreme points of this convex set are exactly the permutation matrices. And a permutation matrix, of course, is real orthogonal. Therefore equality holds in $(\ast)$ above and $\min\limits_{\tilde{Q}\tilde{Q}^T=I} \mathrm{tr}(\Lambda \tilde{Q}^T\Sigma\tilde{Q})$ is the minimum of $\mathrm{tr}(\Lambda P^T\Sigma P)$ over all permutation matrices $P$. As the diagonal entries of $\Lambda$ and $\Sigma$ are nonnegative and arranged in opposite orders, the rearrangement inequality shows that the global minimum occurs at $\tilde{Q}=P=I$ (equivalently $Q=VU^T$), with minimum value $\sum_i\lambda_i\sigma_i=\sum_i\lambda^\uparrow_i(A)\,\lambda^\downarrow_i(B)$; this translates to $X=(I_n,\,0_{n\times(m-n)})VU^T$ when $X$ is wide.
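This conclusion is easy to test numerically; the sketch below (assuming NumPy) checks that $Q=VU^T$ attains the value $\sum_i\lambda^\uparrow_i(A)\,\lambda^\downarrow_i(B)$ and that random orthogonal matrices never do better:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)); A = A @ A.T
B = rng.standard_normal((n, n)); B = B @ B.T

lam, U = np.linalg.eigh(A)              # ascending eigenvalues of A
sig, V = np.linalg.eigh(B)
sig, V = sig[::-1], V[:, ::-1]          # descending eigenvalues of B

f = lambda Q: np.trace(A @ Q.T @ B @ Q)
Q_star = V @ U.T

print(np.isclose(f(Q_star), lam @ sig))            # True: the predicted minimum
trials = [f(np.linalg.qr(rng.standard_normal((n, n)))[0]) for _ in range(1000)]
print(min(trials) >= f(Q_star) - 1e-9)             # True: no random Q does better
```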
The seemingly less elegant approach makes use of calculus. The major steps are as follows:
- Setting the first derivative of $\mathrm{tr}(\Lambda \tilde{Q}^T \Sigma \tilde{Q})$ (with respect to $\tilde{Q}$, subject to $\tilde{Q}^T\tilde{Q}=I$) to zero gives $\mathrm{tr}(-\Lambda \tilde{Q}^TK\Sigma \tilde{Q} + \Lambda \tilde{Q}^T \Sigma K\tilde{Q})=0$ for every skew-symmetric matrix $K$. Hence $M=-\Sigma \tilde{Q}\Lambda \tilde{Q}^T + \tilde{Q}\Lambda \tilde{Q}^T \Sigma$ is a symmetric matrix.
- By definition, however, $M$ is skew-symmetric. Hence $M=0$, i.e. $\tilde{Q}\Lambda \tilde{Q}^T\Sigma$ is symmetric.
- Now, suppose the $2n$ diagonal entries in $\Lambda$ and $\Sigma$ are all nonzero and distinct. Since $\tilde{Q}\Lambda \tilde{Q}^T\Sigma$ is symmetric, $\tilde{Q}$ must be a permutation matrix. Hence we obtain the same result as in the previous approach. When the entries of $\Lambda$ and $\Sigma$ are not all distinct, we can consider two sequences of nonnegative diagonal matrices $\{\Lambda_k\}$ and $\{\Sigma_k\}$ such that $\Lambda_k\rightarrow\Lambda,\ \Sigma_k\rightarrow\Sigma$ as $k\rightarrow\infty$, and for each $k$ the $2n$ diagonal entries of $\Lambda_k$ and $\Sigma_k$ are nonzero and distinct. So, we may arrive at our desired conclusion by a continuity argument.
This second approach requires more work (because we have no nice theorem to depend on), but I think the technique involved (matrix calculus) is applicable to more problems and hence more useful.
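As a small illustration of these stationarity conditions (a sketch assuming NumPy, with tiny hand-picked $\Lambda$ and $\Sigma$): every permutation matrix makes $\tilde{Q}\Lambda\tilde{Q}^T\Sigma$ symmetric, i.e. is a stationary point, but only the identity attains the global minimum.

```python
import numpy as np
from itertools import permutations

lam = np.array([1.0, 2.0, 3.0])        # Lambda: ascending diagonal
sig = np.array([6.0, 5.0, 4.0])        # Sigma: descending diagonal
L, S = np.diag(lam), np.diag(sig)

def is_stationary(Q):
    # stationarity from the derivation above: Q L Q^T S must be symmetric
    M = Q @ L @ Q.T @ S
    return np.allclose(M, M.T)

vals = {}
for p in permutations(range(3)):
    P = np.eye(3)[list(p)]             # permutation matrix
    assert is_stationary(P)            # every permutation is a stationary point
    vals[p] = np.trace(L @ P.T @ S @ P)

print(min(vals, key=vals.get))         # (0, 1, 2): the identity is the minimizer
```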
Solution 2:
As I read the question, some papers I had recently studied came to mind as a quick recommendation for an orthogonal solution to the problem. The papers "Dynamical systems that sort lists, diagonalize matrices, and solve linear programming problems" and "Least squares matching problems" present a dynamical-systems approach that finds a solution over the orthogonal matrices. However, in Brockett's papers the matrices $A$ and $B$ must have distinct eigenvalues (which affects the number of local minima), and $X$ is a square matrix. Still, they may be inspiring.
Solution 3:
Although the question already has a detailed answer, I add this recent paper as an additional reference, as it precisely addresses the question asked: "On Generalizing Trace Minimization", Liang et al. 2021, https://arxiv.org/abs/2104.00257.