Closed Form Solution of $ \arg \min_{x} {\left\| x - y \right\|}_{2}^{2} + \lambda {\left\|x \right\|}_{2} $ - Tikhonov Regularized Least Squares

The problem is given by:

$$ \arg \min_{x} \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} + \lambda {\left\|x \right\|}_{2} $$

where $y$ and $x$ are vectors and $\|\cdot\|_2$ is the Euclidean norm. In the paper Convex Sparse Matrix Factorizations, they say the closed-form solution is $x=\max\{y-\lambda \frac{y}{\|y\|_2}, 0\}$. I don't know why $x$ needs to be non-negative. I think it may come from $\|x\|_2=\sqrt{x^Tx}$, but I cannot derive it. Please help.

The statement appears in the second line of the last paragraph on page 5 of the paper.


That's not what the referenced paper says. It gives an expression which is equivalent to the proximal operator of the $\ell_2$ norm:

$$ \DeclareMathOperator*{\argmin}{arg\,min} \argmin_x \frac{1}{2}\|x-y\|^2 + \lambda\|x\| = \max(\|y\|-\lambda,0)\frac{y}{\|y\|} $$ Note the vector $y$ is not inside the maximum.
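For concreteness, here is a minimal sketch of that formula in Python (NumPy assumed); the function name `prox_l2_norm` is just for illustration:

```python
import numpy as np

def prox_l2_norm(y, lam):
    """Prox of lam * ||.||_2 at y: block soft-thresholding."""
    norm_y = np.linalg.norm(y)
    if norm_y <= lam:
        return np.zeros_like(y)      # the whole vector is shrunk to zero
    return (1.0 - lam / norm_y) * y  # = max(||y|| - lam, 0) * y / ||y||
```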

I'll sketch a proof. We can decompose $x$ as the sum of two components, one parallel to $y$ and one orthogonal to $y$. That is, let $ x = t \frac{y}{ \| y\| } + z $ where $y^T z=0$. Then the objective reduces to:

$$\frac{1}{2}\|x-y\|^2 + \lambda\|x\| = \frac{1}{2}\|z\|^2 + \frac{1}{2}(t-\|y\|)^2 + \lambda \sqrt{t^2 + \|z\|^2}$$ Clearly the expression is minimized when $z=0$, so the problem reduces to a 1-dimensional problem: $$ \min_t \frac{1}{2}(t-\|y\|)^2 + \lambda |t| $$ Then it's a basic exercise in calculus to show that the objective is minimized when $t=\max(\|y\|-\lambda,0)$.
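If you'd rather skip the calculus, here is a quick numerical sketch of the 1-D claim using `scipy.optimize.minimize_scalar`; the specific values of $\|y\|$ and $\lambda$ are arbitrary test values:

```python
from scipy.optimize import minimize_scalar

norm_y, lam = 3.0, 1.2                               # arbitrary test values
obj = lambda t: 0.5 * (t - norm_y) ** 2 + lam * abs(t)

t_num = minimize_scalar(obj).x                       # numerical minimizer
t_closed = max(norm_y - lam, 0.0)                    # claimed closed form
print(t_num, t_closed)                               # both should be ~1.8
```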


One could see that the Support Function of the Unit Ball of $ {\ell}_{2} $ is given by:

$$ {\sigma}_{C} \left( x \right) = {\left\| x \right\|}_{2}, \; C = {B}_{{\left\| \cdot \right\|}_{2}} \left[0, 1\right] $$

The Fenchel Dual Function (convex conjugate) of $ {\sigma}_{C} \left( x \right) $ is given by the Indicator Function:

$$ {\sigma}_{C}^{\ast} \left( x \right) = {\delta}_{C} \left( x \right) $$

Now, using the Moreau Decomposition (someone needs to create a Wikipedia page for that) $ x = \operatorname{Prox}_{\lambda f \left( \cdot \right)} \left( x \right) + \lambda \operatorname{Prox}_{ \frac{{f}^{\ast} \left( \cdot \right)}{\lambda} } \left( \frac{x}{\lambda} \right) $, one could see that:

$$ \operatorname{Prox}_{\lambda {\left\| \cdot \right\|}_{2}} \left( x \right) = \operatorname{Prox}_{\lambda {\sigma}_{C} \left( \cdot \right)} \left( x \right) = x - \lambda \operatorname{Prox}_{ \frac{{\delta}_{C} \left( \cdot \right)}{\lambda} } \left( \frac{x}{\lambda} \right) $$

It is known that $ \operatorname{Prox}_{ {\delta}_{C} \left( \cdot \right) } \left( x \right) = \operatorname{Proj}_{C} \left( x \right) $, namely the Orthogonal Projection onto the set.

In the case above, of $ C = {B}_{{\left\| \cdot \right\|}_{2}} \left[0, 1\right] $ it is given by:

$$ \operatorname{Proj}_{C} \left( x \right) = \frac{x}{\max \left( \left\| x \right\|, 1 \right)} $$
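In code this projection is a one-liner (a NumPy sketch; `proj_unit_ball` is a name introduced here for illustration):

```python
import numpy as np

def proj_unit_ball(x):
    """Orthogonal projection onto the closed Euclidean unit ball."""
    return x / max(np.linalg.norm(x), 1.0)
```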

Which yields:

$$ \begin{align} \operatorname{Prox}_{\lambda {\left\| \cdot \right\|}_{2}} \left( x \right) & = \operatorname{Prox}_{\lambda {\sigma}_{C} \left( \cdot \right)} \left( x \right) = x - \lambda \operatorname{Prox}_{ \frac{{\delta}_{C} \left( \cdot \right)}{\lambda} } \left( \frac{x}{\lambda} \right) \\ & = x - \lambda \operatorname{Prox}_{ {\delta}_{C} \left( \cdot \right) } \left( \frac{x}{\lambda} \right) \\ & = x - \lambda \operatorname{Proj}_{C} \left( \frac{x}{\lambda} \right) \\ & = x - \lambda \frac{x / \lambda}{ \max \left( {\left\| \frac{x}{\lambda} \right\|}_{2} , 1 \right) } = x \left( 1 - \frac{\lambda}{\max \left( {\left\| x \right\|}_{2} , \lambda \right)} \right) \end{align} $$
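As a sanity check, the Moreau decomposition route above and the block soft-thresholding formula from the other answer should coincide. A small numerical sketch (random nonzero test vector, arbitrary $\lambda$):

```python
import numpy as np

np.random.seed(0)
x, lam = np.random.randn(5), 0.7      # x is nonzero with probability 1

# Via Moreau decomposition: x - lam * Proj_C(x / lam)
prox_moreau = x - lam * (x / lam) / max(np.linalg.norm(x / lam), 1.0)

# Direct block soft-thresholding: max(||x|| - lam, 0) * x / ||x||
prox_direct = max(np.linalg.norm(x) - lam, 0.0) * x / np.linalg.norm(x)

print(np.allclose(prox_moreau, prox_direct))  # expect True
```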