Rank, nuclear norm and Frobenius norm of a matrix.

The nuclear norm, denoted $\|\cdot\|_*$ is a good surrogate for the $rank$ when minimizing problems like $$\label{pb1}\tag{1} \min_X rank(X) : AX = B $$ Here, we're trying to find an matrix X with low rank such that $AX=B$. If I recall correctly, when the singular values of $X$ are bounded above by 1, you can replace $rank$ by $\|\cdot\|_*$ in the problem \eqref{pb1}.

What about the Frobenius norm ? Can the Frobenius norm be a good surrogate to the nuclear norm ? Under which assumptions ?

Since we have $\|X\|_* = \min_{X=UV^\top} \|U\|_F\|V\|_F$, in particular we have $\|XX^\top\|_* = \|X\|_F^2$.

Then, because $\|X\|_1\le 1$, we also have $\|XX^\top\|_1\le 1$.

So, this is pretty direct, no ? Frobenius norm is also a good surrogate ?


Solution 1:

In general the Frobenius and nuclear norm are not a good proxy for rank. The norms depend on the singular values $||X||_* = \sum_i \sigma_i(X)$ and $||X||_F^2 = \sum_i \sigma_i^2(X)$... so if the singular values of your matrix are all too small or too large, then you have a bad approximation to the rank.

However, you are right that when the singular values are clustered somewhere near one both norms can give a good approximation of the rank. Lets look at a bound...

If the $r$ nonzero singular values $\sigma_1,...,\sigma_r$ of a rank $r$ matrix $X$ are bounded by $|1-\sigma_i| \leq \epsilon/r$ for a error tolerance $0\leq\epsilon<r$, then $|r - ||X||_*| \leq \epsilon$. This bound tells us that as the your error in the approximation of the rank grows, the singular values can correspondingly grow linearly.

For the Frobenius norm: If the $r$ nonzero singular values $\sigma_1,...,\sigma_r$ of a rank $r$ matrix $X$ are bounded by $\sqrt{1-\epsilon/r}\leq\sigma_i\leq\sqrt{1+\epsilon/r}$, then $|r - ||X||_*| \leq \epsilon$. In this case if we increase our approximation error tolerance, our singular values can only grow according to this square root function.

In other words, these bounds show that regardless of norm choice you don't have a lot of wiggle room in your singular values if you want your norm to well approximate rank. However, for the nuclear norm you have more wiggle room than Frobenius.

A good rule of thumb is that the nuclear norm can achieve the same approximation to rank as the Frobenius rule with about twice as much freedom in singular values(interval twice as large), and that the singular values should always be near one.

In your minimization problem, it is hard to know how to constrain $X$ to ensure well approximation of the rank. Perhaps normalizing the spectral radius of $A$ and $B$ could help, or constraining $X$ with the spectral norm $||Ax||/||x||$.