Solve least-squares minimization from overdetermined system with orthonormal constraint

I would like to find the rectangular matrix $X \in \mathbb{R}^{n \times k}$ that solves the following minimization problem:

$$ \mathop{\text{minimize }}_{X \in \mathbb{R}^{n \times k}} \left\| A X - B \right\|_F^2 \quad \text{ subject to } X^T X = I_k $$

where $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times k}$ are given. This appears to be a form of the orthogonal Procrustes problem, but I'm getting tripped up because in my case $X$ is not square, with $n \gg k$ and $m > n$.

Optimistically, I'm looking for a solution that would involve a singular value decomposition of only a small $k \times k$ matrix, but I'm not seeing it. I'm especially interested in the case when $$A = \left(\begin{array}{c} D_1 \\ D_2\end{array}\right) \in \mathbb{R}^{2n \times n}$$ and $D_1,D_2$ are full-rank diagonal matrices. This is to say that a solution involving $D_1^{-1}$ and $D_2^{-1}$ would be acceptable. The closest I've come (using a thin SVD of $Y$) is:

$$ Y = (A^T A)^{-1} A^T B, \qquad Y = U \Sigma V^T, \qquad X = U V^T $$

Clearly $X^T X = I_k$, but

  1. I haven't convinced myself that this is the minimizer,
  2. this involves inverting a potentially huge $n \times n$ matrix (perhaps unavoidable, and not so bad in the stacked-diagonal case above, where $(A^TA)^{-1} = (D_1^2 + D_2^2)^{-1}$), and
  3. this involves an SVD of a large rectangular $n \times k$ matrix.

Is this correct and as good as it gets? Or, is there a more efficient solution?
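If it helps, here is the construction above in plain NumPy (all sizes and data are made up), specialized to the stacked-diagonal case so that applying $(A^TA)^{-1} = (D_1^2+D_2^2)^{-1}$ is cheap. It at least confirms the semi-orthogonality claim:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2                                # made-up small sizes

# stacked-diagonal case: A = [D1; D2] with full-rank diagonal blocks
d1 = rng.uniform(0.5, 2.0, n)
d2 = rng.uniform(0.5, 2.0, n)
A = np.vstack([np.diag(d1), np.diag(d2)])  # (2n) x n
B = rng.standard_normal((2 * n, k))

# (A^T A)^{-1} = (D1^2 + D2^2)^{-1} is diagonal, hence cheap to apply
Y = np.diag(1.0 / (d1**2 + d2**2)) @ (A.T @ B)

# thin SVD of the n x k matrix Y, then X = U V^T
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
X = U @ Vt

print(np.allclose(X.T @ X, np.eye(k)))     # True: X is semi-orthogonal
```

Whether this $X$ is actually the constrained minimizer is exactly what I can't convince myself of.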


Your proposed solution is not correct. Let's consider the simplest case: $m=n$, $k=1$, and $A$ is invertible. Then our problem is $$\min_{x\in\mathbb R^n} \|Ax-b\|^2\quad\text{s.t.}\quad \|x\|^2=1.$$ The set $\{x:\|x\|^2=1\}$ is the unit sphere, so the transformed set $\{Ax:\|x\|^2=1\}$ is an ellipsoid, and we want to find the point $Ax$ on this ellipsoid closest to $b\in\mathbb R^n$.

Your proposed solution reduces to $y = A^{-1}b$ and $x = y/\|y\|$. Then $Ax = b/\|A^{-1}b\|$, that is, your proposed closest point is obtained by simply scaling $b$ to lie on the ellipsoid. It should be clear that in general this is not the closest point to $b$.

Sorry, I don't have a good answer for how to find the correct solution.


We have the following optimization problem in tall matrix $\mathrm X \in \mathbb R^{n \times k}$

$$\begin{array}{ll} \text{minimize} & \| \mathrm A \mathrm X - \mathrm B \|_{\text{F}}^2\\ \text{subject to} & \mathrm X^\top \mathrm X = \mathrm I_k\end{array}$$

where tall matrices $\mathrm A \in \mathbb R^{m \times n}$ and $\mathrm B \in \mathbb R^{m \times k}$ are given. Let the Lagrangian be

$$\mathcal L (\mathrm X, \Lambda) := \frac 12 \| \mathrm A \mathrm X - \mathrm B \|_{\text{F}}^2 + \frac 12 \langle \Lambda , \mathrm X^\top \mathrm X - \mathrm I_k \rangle$$

Taking the partial derivatives and finding where they vanish, we obtain two matrix equations

$$\begin{array}{rl} \mathrm A^\top \mathrm A \, \mathrm X + \mathrm X \left(\dfrac{\Lambda + \Lambda^\top}{2}\right) &= \mathrm A^\top \mathrm B\\ \mathrm X^\top \mathrm X &= \mathrm I_k \end{array}$$
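As a sanity check, the left-hand side of the first equation minus its right-hand side should be exactly $\nabla_{\mathrm X} \mathcal L$. A finite-difference sketch (NumPy, with arbitrary made-up sizes and an arbitrary multiplier $\Lambda$) confirms this:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 6, 4, 2                       # made-up small sizes
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, k))
X = rng.standard_normal((n, k))
Lam = rng.standard_normal((k, k))       # arbitrary multiplier (need not be symmetric)

def lagrangian(X):
    # <Lam, M> is the Frobenius inner product: the sum of entrywise products
    return (0.5 * np.linalg.norm(A @ X - B, "fro") ** 2
            + 0.5 * np.sum(Lam * (X.T @ X - np.eye(k))))

# claimed gradient: A^T A X + X (Lam + Lam^T)/2 - A^T B
G = A.T @ A @ X + X @ (Lam + Lam.T) / 2 - A.T @ B

# central finite differences, entry by entry
eps = 1e-6
G_fd = np.zeros_like(X)
for i in range(n):
    for j in range(k):
        E = np.zeros_like(X)
        E[i, j] = eps
        G_fd[i, j] = (lagrangian(X + E) - lagrangian(X - E)) / (2 * eps)

print(np.max(np.abs(G - G_fd)))         # tiny: the stated gradient matches
```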

Left-multiplying both sides of the first matrix equation by $\mathrm X^\top$ and using $\mathrm X^\top \mathrm X = \mathrm I_k$, we obtain

$$\mathrm X^\top \mathrm A^\top \mathrm A \, \mathrm X + \left(\dfrac{\Lambda + \Lambda^\top}{2}\right) = \mathrm X^\top \mathrm A^\top \mathrm B$$

Using an argument similar to the one used by Peter Schönemann in his 1966 paper on the orthogonal Procrustes problem, note that the left-hand side of the matrix equation above is the sum of two symmetric matrices. Thus, the right-hand side must also be symmetric, which produces the following linear matrix equation

$$\mathrm X^\top \mathrm A^\top \mathrm B = \mathrm B^\top \mathrm A \, \mathrm X$$

If $\rm X$ were square and orthogonal, then we could use Schönemann's approach to solve the linear matrix equation above. Unfortunately, $\rm X$ is tall and only semi-orthogonal. The symmetry condition gives $k^2$ scalar linear equations (only $k(k-1)/2$ of them independent, since the two sides are transposes of each other) in the $nk$ unknowns. Hence, we have at least $nk - k^2 = k(n-k)$ degrees of freedom.

To summarize, we have $k^2$ linear equations and $k^2$ quadratic equations in the $n k$ entries of $\rm X$

$$\begin{array}{rl} \mathrm X^\top \mathrm A^\top \mathrm B - \mathrm B^\top \mathrm A \, \mathrm X &= \mathrm O_k\\ \mathrm X^\top \mathrm X &= \mathrm I_k\end{array}$$

Unfortunately, it is not obvious to me how to solve these equations.
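That said, the equations can at least be verified empirically. A naive projected-gradient heuristic on $\{\mathrm X : \mathrm X^\top \mathrm X = \mathrm I_k\}$, retracting via the polar factor of a thin SVD (the sizes, step size, and iteration budget below are ad hoc choices, and nothing here guarantees a global minimum), converges to a point at which both residuals vanish numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 8, 5, 2                          # made-up small sizes
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, k))

def polar(Z):
    """Nearest semi-orthogonal matrix to Z: the factor U V^T of its thin SVD."""
    U, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ Vt

eta = 0.5 / np.linalg.norm(A, ord=2) ** 2  # conservative step size
X = polar(rng.standard_normal((n, k)))     # random feasible starting point
for _ in range(100_000):
    X_new = polar(X - eta * A.T @ (A @ X - B))  # gradient step + retraction
    if np.linalg.norm(X_new - X) < 1e-12:
        X = X_new
        break
    X = X_new

# at a fixed point, both equations above hold numerically
M = X.T @ A.T @ B
print(np.linalg.norm(M - M.T))                 # ~0: the linear equation
print(np.linalg.norm(X.T @ X - np.eye(k)))     # ~0: the quadratic equation
```

A fixed point of this iteration satisfies $X - \eta \mathrm A^\top(\mathrm A X - \mathrm B) = X S$ with $S$ symmetric (from the polar decomposition), which is precisely the stationarity system above with $S$ playing the role of $\mathrm I_k - \eta\,(\Lambda+\Lambda^\top)/2$; but this only finds a local solution, not a closed form.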