Distance of a matrix from the orthogonal group

Let $\lVert A\rVert = \left(\sum_{i,j=1}^n |a_{ij}|^2\right)^{\frac{1}{2}} = \sqrt{\operatorname{tr}(A^\top A)}$ be the Frobenius norm on $n \times n$ matrices.

Fix $A \in GL_n(\mathbb{R})$.

1) Is there a formula for $\operatorname{dist}(A,O(n))$?

where $\operatorname{dist}^2(A,O(n)) = \min_{X \in O(n)} \|A - X\|^2$

($O(n)$ is the orthogonal group, i.e. the matrices satisfying $X^TX = I$. The minimum exists since $O(n)$ is compact.)

Note: In general we don't always have a unique minimizer (i.e. there can be more than one orthogonal matrix in $O(n)$ closest to $A$), at least if we allow non-invertible matrices $A$.

For example, $A=0$ is at the same distance from every element of $O(n)$, since the Frobenius norm of any orthogonal matrix is $\sqrt n$.
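This fact is easy to illustrate numerically; here is a minimal NumPy sketch (the seed and dimension are arbitrary):

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal Q
print(np.linalg.norm(Q, "fro"), np.sqrt(n))       # both equal sqrt(5) ~ 2.236
```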

Question: Can we prove the minimizer is unique (if $A \in GL_n(\mathbb{R})$)? If so, is the function $GL_n(\mathbb{R}) \to \mathbb{R}^{n^2}$, $A \mapsto X(A)$, where $X(A)$ is the minimizer, smooth? Can we give an explicit formula for it?


I have tried using Lagrange multipliers, but so far with no success. (I couldn't determine whether the gradients of all the $n^2$ constraints are always linearly independent; note that since $X^tX$ is symmetric, the constraint $X^tX = I$ amounts to only $n(n+1)/2$ independent scalar equations.)

Remark:
\begin{align}
\|A - X\|^2 &= \mathrm{tr} \left( (A-X)^t (A-X) \right) \\
&= \mathrm{tr} \left( (A^t - X^t) (A-X) \right) \\
&= \mathrm{tr} (A^tA - A^tX - X^tA + X^tX) \\
&= \mathrm{tr} (A^tA) - \mathrm{tr}(A^tX) - \mathrm{tr}\left((A^tX)^t\right) + \mathrm{tr}(I) \\
&= n + \mathrm{tr}(A^tA) - 2\,\mathrm{tr}(A^tX)
\end{align}

so minimizing $\|A - X\|$ is equivalent to maximizing $\mathrm{tr}(A^tX)$. (In particular, the objective function to optimize is linear in $X$, not quadratic.)
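As a sanity check on this identity, here is a minimal NumPy sketch (seed and dimension arbitrary) comparing both sides for a random $A$ and a random orthogonal $X$:

```python
import numpy as np

n = 4
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))                   # generic A
X, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal X

lhs = np.linalg.norm(A - X, "fro") ** 2
rhs = n + np.trace(A.T @ A) - 2 * np.trace(A.T @ X)
print(np.isclose(lhs, rhs))  # True
```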


Solution 1:

I have a guess as to the minimizer. Perhaps there's a neat proof that this happens to be the right answer.

Note that $A$ has a singular value decomposition $$ A = U\Sigma V^T $$ where $\Sigma$ is in this case a square, diagonal matrix whose diagonal elements are the (strictly positive) singular values of $A$, and $U,V$ are orthogonal matrices.

My suspicion is that $$ \operatorname{dist}(\Sigma,O(n)) = \|\Sigma - I\| $$ where $I$ is the identity matrix. By the orthogonal invariance of the Frobenius norm, we can equivalently state that $$ \operatorname{dist}(A,O(n)) = \|A - UV^T\| = \sqrt{\sum_{j=1}^n [s_j(A)-1]^2} $$ If $UV^T$ does turn out to be a minimizer in general, there is no reason (at first glance) to believe that this minimizer is unique.
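One can stress-test this guess numerically before proving it. The sketch below (NumPy; seed and sizes arbitrary, and random sampling proves nothing, it merely fails to find a counterexample) checks that $\|A - UV^T\|$ matches $\sqrt{\sum_j [s_j(A)-1]^2}$ and that no randomly sampled orthogonal matrix does better:

```python
import numpy as np

n = 4
rng = np.random.default_rng(2)
A = rng.standard_normal((n, n))

U, s, Vt = np.linalg.svd(A)                  # A = U @ np.diag(s) @ Vt
d_guess = np.linalg.norm(A - U @ Vt, "fro")  # distance to the guessed minimizer
print(np.isclose(d_guess, np.sqrt(np.sum((s - 1) ** 2))))  # True

best_random = min(
    np.linalg.norm(A - np.linalg.qr(rng.standard_normal((n, n)))[0], "fro")
    for _ in range(10_000)
)
print(d_guess <= best_random)  # True: no sampled orthogonal X beats UV^t
```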

There is a condition that I suspect is true and that would guarantee a unique minimizer. In particular, I think that $SO(n)$ is the boundary of its own convex hull. If that is the case, then the minimizer of the distance from any $X$ outside of this convex hull is unique.

Note that uniqueness does not hold for elements on the interior of the convex hull. In particular, every element of $O(n)$ is at the minimum distance from $X = 0$. I suspect it is this condition (rather than invertibility) that "causes" uniqueness to fail.


Edit: see levap's answer, which looks correct to me. It seems that a unique minimizer indeed exists if and only if $A$ is invertible.

Solution 2:

Let us prove Omnomnomnom's suspicion. Fix $A \in M_n(\mathbb{R})$. Following the OP's observation, we want to maximize the linear functional $\varphi \colon M_n(\mathbb{R}) \rightarrow \mathbb{R}$ given by $\varphi(X) = \mathrm{tr}(A^tX)$ subject to the constraint $X^t X = I$. First, let us assume that $A$ is a diagonal matrix with non-negative entries on the diagonal. Then, given an orthogonal $X$ we have

$$ \varphi(X) = \mathrm{tr}(A^tX) = \sum_{i = 1}^n a_{ii} x_{ii} \leq \sum_{i=1}^n a_{ii} = \mathrm{tr}(A^t) = \varphi(I) $$

(because the columns of an orthogonal matrix are unit vectors, so $-1 \leq x_{ij} \leq 1$ for all $1 \leq i, j \leq n$). Thus, the maximizer of $\varphi$, and hence the distance minimizer, in this case is just $X = I$. Note that if $a_{ii} > 0$ for all $1 \leq i \leq n$, then the proof shows that $X = I$ is the unique maximizer, while if some $a_{ii} = 0$ then there can be infinitely many maximizers.
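As a quick empirical illustration of the diagonal case (a hedged sketch; the names and seed are arbitrary), one can sample random orthogonal matrices and check that $\varphi(X) \leq \varphi(I)$:

```python
import numpy as np

n = 4
rng = np.random.default_rng(3)
A = np.diag(rng.uniform(0.1, 2.0, size=n))  # diagonal A, positive entries

vals = [
    np.trace(A.T @ np.linalg.qr(rng.standard_normal((n, n)))[0])
    for _ in range(10_000)
]
print(max(vals) <= np.trace(A))  # True: tr(A^t X) never exceeds tr(A) = phi(I)
```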

Now, given an arbitrary $A$, use a singular value decomposition to write $A = U \Sigma V^t$. Then $A^t = V \Sigma U^t$ (since $\Sigma^t = \Sigma$), and given an orthogonal matrix $X$ we have

$$ \varphi(X) = \mathrm{tr}(V \Sigma U^t X) = \mathrm{tr}(V^t (V \Sigma U^t X) V) = \mathrm{tr}(\Sigma U^t X V) = \mathrm{tr}(\Sigma W(X)) $$

where $W(X) = U^t X V$ is also an orthogonal matrix. As $X$ runs over all orthogonal matrices, so does $W(X)$, so we have reduced the problem to the previous case: any $X$ with $W(X) = U^t X V = I$ is a maximizer. Solving for $X$ gives $X = UV^t$. If $A$ is invertible, then $\Sigma$ has strictly positive diagonal entries, so by the previous case the maximizer is in fact unique. This may look surprising, since it implies that while the SVD is highly non-unique in general, the product $UV^t$ must be unique, at least when $A$ is invertible. It would be interesting to check this directly.
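This is easy to probe numerically. Below is a hedged NumPy sketch (arbitrary seed): we build a second SVD of the same $A$ by flipping the signs of matching columns of $U$ and rows of $V^t$, and check that the product $UV^t$ is unchanged:

```python
import numpy as np

n = 4
rng = np.random.default_rng(4)
A = rng.standard_normal((n, n))        # invertible with probability 1

U, s, Vt = np.linalg.svd(A)
S = np.diag([1.0, -1.0, 1.0, -1.0])    # any diagonal +-1 matrix works
U2, Vt2 = U @ S, S @ Vt                # another SVD: S @ diag(s) @ S = diag(s)
print(np.allclose(U2 @ np.diag(s) @ Vt2, A))  # True: still a valid SVD of A
print(np.allclose(U2 @ Vt2, U @ Vt))          # True: same product UV^t
```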

Regarding your question about a formula: you can also write the polar decomposition $A = OP$, where $O$ is orthogonal and $P$ is symmetric positive semidefinite. Orthogonally diagonalizing $P$ as $P = UDU^t$ yields an SVD $A = (OU)DU^t$, showing that $O$ is a minimizer for the distance. If $A$ is invertible, then $O$ is uniquely determined and is given by $O = A(\sqrt{A^t A})^{-1}$, where $\sqrt{A^tA}$ is the unique positive semidefinite square root of $A^t A$. However, as $\sqrt{A^t A}$ doesn't really have an "explicit" formula, I'm not sure how much this helps.
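For concreteness, here is a short sketch (NumPy plus SciPy's `scipy.linalg.sqrtm` for the matrix square root; seed arbitrary) that evaluates $O = A(\sqrt{A^tA})^{-1}$ and compares it with the SVD-based minimizer $UV^t$:

```python
import numpy as np
from scipy.linalg import sqrtm   # matrix square root

n = 4
rng = np.random.default_rng(5)
A = rng.standard_normal((n, n))  # invertible with probability 1

O = A @ np.linalg.inv(sqrtm(A.T @ A))  # O = A (A^t A)^{-1/2}
U, _, Vt = np.linalg.svd(A)
print(np.allclose(O, U @ Vt))    # True: both give the same minimizer
```

In floating point, computing $UV^t$ directly from the SVD is the more stable route; the `sqrtm`-based line is shown only to mirror the algebra above.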