$ {L}_{1} $ Regularized Unconstrained Optimization Problem

There is a neat geometric way to see why this should be so, but I've forgotten some of the details. Some papers of Daubechies, Donoho, etc. contain these details. Unfortunately, I've forgotten these refs too. So, I'll give you the somewhat lazy solution (you probably already figured out that I'm a very lazy person), based on proximal operators and the Moreau identity...

For a convex function $f: \mathbb R^n \rightarrow (-\infty, +\infty]$, define its proximal operator

$$\text{prox}_{\lambda f}(a) := \underset{x \in \mathbb R^n}{\text{argmin }} \frac{1}{2}\|x-a\|_2^2 + \lambda f(x),$$

where $\lambda > 0$ is a varying parameter. Think of this as generalizing the notion of projection onto a convex set, where the indicator function is replaced with a more general $f$. Your problem amounts to computing the proximal operator of the $\ell_1$-norm.
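
To see the projection interpretation concretely: if $f = i_C$ is the indicator function of a nonempty closed convex set $C$ (zero on $C$, $+\infty$ outside), the penalty term merely constrains $x$ to $C$, and

$$\text{prox}_{\lambda i_C}(a) = \underset{x \in C}{\text{argmin }} \frac{1}{2}\|x-a\|_2^2 = P_C(a).$$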

Define the (Legendre) convex conjugate of $f$ by

$$f^*(a) := \sup_{x \in \mathbb R^n}\langle a, x\rangle - f(x).$$

Now, if $f := \|\cdot\|_1$, and we define the cube $C^{(n)} := \{z \in \mathbb R^n \mid \|z\|_\infty \le \lambda\} = C^{(1)} \times \ldots \times C^{(1)}$, then $f^*$ is the indicator function of the unit $\ell_\infty$-ball $B_\infty$ (a direct computation from the definition of the conjugate), and therefore so is $\frac{1}{\lambda} f^*$. Hence

$$\lambda\, \text{prox}_{\frac{1}{\lambda} f^*}\left(\frac{a}{\lambda}\right) = \lambda\, P_{B_\infty}\left(\frac{a}{\lambda}\right) = P_{C^{(n)}}(a) = \left(P_{C^{(1)}}(a_1),\ldots,P_{C^{(1)}}(a_n)\right), \qquad P_{C^{(1)}}(a_i) = \text{sgn}(a_i)\min(|a_i|, \lambda),$$

the Euclidean projection of $a$ onto the cube $C^{(n)}$. Thus, by the Moreau identity $\text{prox}_{\lambda f}(a) + \lambda\, \text{prox}_{\frac{1}{\lambda} f^*}\left(\frac{a}{\lambda}\right) = a$, we get

$$\text{prox}_{\lambda f}(a) = a - P_{C^{(n)}}(a) = \left(a_1 - \text{sgn}(a_1)\min(|a_1|, \lambda), \ldots, a_n - \text{sgn}(a_n)\min(|a_n|, \lambda)\right) = (S_\lambda(a_1),\ldots, S_\lambda(a_n)),$$

where $S_\lambda: t \mapsto \text{sgn}(t)(|t| - \lambda)_+$ is the soft-thresholder.

N.B.: $\text{sgn}(0) := 0$. Also, note that $t = |t|\text{sgn}(t)$, for all $t \in \mathbb R$.
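
If you want to see the identity numerically, here is a minimal sketch (NumPy assumed; the function names are just illustrative) that evaluates both sides of the last display: the closed-form soft-thresholder versus $a - P_{C^{(n)}}(a)$, where projecting onto the cube is coordinatewise clipping to $[-\lambda, \lambda]$.

```python
import numpy as np

def soft_threshold(a, lam):
    # S_lam(a_i) = sgn(a_i) * max(|a_i| - lam, 0), applied coordinatewise
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def prox_l1_via_moreau(a, lam):
    # prox_{lam ||.||_1}(a) = a - P_C(a), where C = {z : ||z||_inf <= lam};
    # projecting onto the cube C is coordinatewise clipping to [-lam, lam]
    return a - np.clip(a, -lam, lam)

rng = np.random.default_rng(0)
a = rng.normal(size=5)
lam = 0.5
print(soft_threshold(a, lam))
print(prox_l1_via_moreau(a, lam))  # same vector as the line above
```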

Hope this helps. I can provide finer details if needed.


Since there is only a single variable in your reduced problem, you can find the best choice of $x_i$ just using techniques from precalculus. (Consider separately the cases $x_i \geq 0$ and $x_i \leq 0$, and visualize the graph of a quadratic function in each case.)
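
If you want a quick numerical check of that one-variable analysis, here is a rough sketch (the names `a` and `lam` are placeholders for the data value and the regularization weight in your reduced problem): minimize the scalar objective over a fine grid and compare with the soft-thresholding formula derived elsewhere in this thread.

```python
import numpy as np

a, lam = 1.3, 0.5   # placeholder data value and regularization weight
grid = np.linspace(a - 3.0, a + 3.0, 200001)            # fine one-dimensional grid
objective = 0.5 * (grid - a) ** 2 + lam * np.abs(grid)  # scalar objective on the grid
x_grid = grid[np.argmin(objective)]                     # brute-force minimizer
x_closed = np.sign(a) * max(abs(a) - lam, 0.0)          # soft-thresholding formula
print(x_grid, x_closed)  # agree up to the grid spacing
```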

Here's a different viewpoint (summarized very briefly). Let $f(x) = \|x\|$, where $\| \cdot \|$ is any norm. Note that the convex conjugate of $f$ is the indicator function of the dual norm unit ball $B$. It follows from the Moreau decomposition that $$\text{prox}_{tf}(x) = x - P_{tB}(x),$$ where $P_{tB}(x)$ is the projection of $x$ onto $tB$.

In the special case where $f(x) = \| x \|_1$, the dual norm is the $\ell_\infty$-norm, so $tB = \{ z \mid -t \leq z_i \leq t \text{ for all }i\}$. In this case projecting onto $tB$ is particularly simple, and you can see (when you work it out) that your formula for $\text{prox}_{tf}(x)$ follows.
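
Working it out coordinatewise: projecting onto $tB$ is just clipping each entry to $[-t, t]$, so

$$P_{tB}(x)_i = \min\left( \max\left( x_i, -t \right), t \right), \qquad \text{prox}_{tf}(x)_i = x_i - P_{tB}(x)_i = \operatorname{sign}(x_i) \max\left( |x_i| - t, 0 \right),$$

which is exactly the soft-thresholding formula.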


The Optimization Problem given by the Prox Operator:

$$ \operatorname{Prox}_{\lambda {\left\| \cdot \right\|}_{1}} \left( x \right) = \arg \min_{u} \left\{ \frac{1}{2} {\left\| u - x \right\|}^{2} + \lambda {\left\| u \right\|}_{1} \right\} $$

This problem is separable in the coordinates of $ u $ (and $ x $), hence one can solve the scalar problem for each component $ i $:

$$ \arg \min_{ {u}_{i} } \left\{ \frac{1}{2} {\left( {u}_{i} - {x}_{i} \right)}^{2} + \lambda \left| {u}_{i} \right| \right\} $$

Now, you can proceed using the First Order Optimality Condition and the Sub Gradient of the $ \operatorname{abs} \left( \cdot \right) $ function, or you can employ a simple trick.

The trick is to understand that $ {u}_{i} $ can be either positive, zero or negative.

Assuming $ {u}_{i} > 0 $, the derivative is given by $ {u}_{i} - {x}_{i} + \lambda $, which vanishes at $ {u}_{i} = {x}_{i} - \lambda $; this is consistent with the assumption $ {u}_{i} > 0 $ only when $ {x}_{i} > \lambda $.
The same procedure for the case $ {u}_{i} < 0 $ yields $ {u}_{i} = {x}_{i} + \lambda $, which is valid for $ {x}_{i} < -\lambda $.
For values of $ {x}_{i} $ in between, the minimizer is $ {u}_{i} = 0 $: at zero the Sub Gradient of $ \left| {u}_{i} \right| $ can be chosen freely in the range $ \left[ -1, 1 \right] $, so the optimality condition $ 0 \in {u}_{i} - {x}_{i} + \lambda \partial \left| {u}_{i} \right| $ is satisfied by $ {u}_{i} = 0 $ whenever $ \left| {x}_{i} \right| \leq \lambda $.

In summary:

$$ \operatorname{Prox}_{\lambda {\left\| \cdot \right\|}_{1}} \left( x \right)_{i} = \operatorname{sign} \left( {x}_{i} \right) \max \left( \left| {x}_{i} \right| - \lambda, 0 \right) $$
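
As a quick sanity check of the case analysis above, here is a small sketch (NumPy assumed; `prox_l1` is just an illustrative name) that applies the summary formula and verifies the first-order condition $ 0 \in {u}_{i} - {x}_{i} + \lambda \partial \left| {u}_{i} \right| $ in each of the three cases.

```python
import numpy as np

def prox_l1(x, lam):
    # sign(x_i) * max(|x_i| - lam, 0), the summary formula above
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([2.3, -0.4, 0.9, -1.7, 0.0])
lam = 1.0
u = prox_l1(x, lam)
for ui, xi in zip(u, x):
    if ui != 0:
        # smooth case: subgradient of |u_i| is sign(u_i), so the residual must vanish
        assert abs(ui - xi + lam * np.sign(ui)) < 1e-12
    else:
        # u_i = 0: need some s in [-1, 1] with 0 = -x_i + lam * s, i.e. |x_i| <= lam
        assert abs(xi) <= lam
print(u)  # the soft-thresholded vector
```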