Derivation of Soft Thresholding Operator / Proximal Operator of $ {L}_{1} $ Norm

Solution 1:

I wrote a more detailed derivation of the soft-thresholding operator, following the source you mention and a few others. I hope it helps.

Soft thresholding is just the proximal mapping of the $l_1$-norm. Let $f(x) = \lambda\|x\|_1$; then the proximal mapping of $f$ is defined as

\begin{equation} \operatorname{prox}_f(x) = \operatorname{argmin}_z\{\frac{1}{2}\|x-z\|^2_2 + \lambda\|z\|_1 \} \end{equation}

The optimality condition for the previous problem is

\begin{equation} 0 \in \nabla(\frac{1}{2}\|x-z\|^2_2) + \partial(\lambda\|z\|_1) \Leftrightarrow 0 \in z-x + \lambda\partial\|z\|_1 \end{equation}

The $l_1$-norm is separable, so we can consider each of its components separately. Let us first examine the case where $z_i \neq 0$. Then $\partial |z_i| = \{\operatorname{sign}(z_i)\}$ and the optimum $z_i^*$ is obtained from

\begin{equation} 0 = z_i-x_i + \lambda \operatorname{sign}(z_i) \Leftrightarrow z_i^* = x_i - \lambda \operatorname{sign}(z_i^*) \end{equation}

Note also that if $z_i^* < 0$, then $x_i < -\lambda$, and likewise if $z_i^* > 0$, then $x_i > \lambda$. In either case $|x_i| > \lambda$ and $\operatorname{sign}(z_i^*) = \operatorname{sign}(x_i)$. Substituting into the previous equation, we get

\begin{equation} z_i^* = x_i - \lambda \operatorname{sign}(x_i) \end{equation}

In the case where $z_i = 0$, the subdifferential of $|z_i|$ at the origin is the interval $[-1,1]$, and the optimality condition becomes

\begin{equation} 0 \in -x_i + \lambda[-1,1] \Leftrightarrow x_i \in [-\lambda,\lambda] \Leftrightarrow |x_i| \leq \lambda \end{equation}

Putting it all together, we get

\begin{equation} [\operatorname{prox}_f(x)]_i = z_i^* = \left\{ \begin{array}{lr} 0 & \text{if } |x_i| \leq \lambda \\ x_i - \lambda \operatorname{sign}(x_i) & \text{if } |x_i| > \lambda \end{array}\right. \end{equation}

The previous equation can also be written as

\begin{align*} [\operatorname{prox}_f(x)]_i &= \operatorname{sign}(x_i)\max(|x_i|-\lambda, 0) \\ &= \operatorname{sign}(x_i)(|x_i|-\lambda)_+ \end{align*}

where $(\cdot)_+$ denotes the positive part.
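As a sanity check, here is a short NumPy sketch of this operator (the function name `soft_threshold` and the grid check are my own, not from the original answer), compared against a brute-force minimization of the prox objective:

```python
import numpy as np

def soft_threshold(x, lam):
    # Elementwise prox of lam * ||.||_1: sign(x) * max(|x| - lam, 0)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Because the l1-norm is separable, checking scalars suffices.
# Compare against a brute-force minimum of 0.5*(z - x)^2 + lam*|z|.
lam = 0.7
z_grid = np.linspace(-3.0, 3.0, 600001)  # grid spacing 1e-5
for x in (-2.0, -0.3, 0.0, 0.5, 1.5):
    objective = 0.5 * (z_grid - x) ** 2 + lam * np.abs(z_grid)
    z_brute = z_grid[np.argmin(objective)]
    assert abs(z_brute - soft_threshold(x, lam)) < 1e-4
```

Note how inputs with $|x_i| \leq \lambda$ (here $-0.3$, $0$, and $0.5$) are mapped exactly to zero, which is the source of sparsity in $l_1$-regularized solutions.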

Solution 2:

Suppose $\hat{x} \in \mathbb R$ and we want to find a minimizer of $f(x) = |x| + \frac{1}{2t}(x - \hat{x})^2$. We want to make $|x|$ small without straying too far from $\hat{x}$.

Consider the case where $\hat{x} > 0$.

In this case there is no reason to let $x$ be negative: if $x < 0$, then $-x$ has the same absolute value as $x$ but is closer to $\hat{x}$, so $f(-x) < f(x)$.

So we need to minimize the quadratic function $g(x) = x + \frac{1}{2t} (x - \hat{x})^2$ over $[0,\infty)$. The graph of $g$ is a parabola that opens upwards, and we can minimize $g$ using techniques from pre-calculus.
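Carrying that pre-calculus step out explicitly: setting

\begin{equation} g'(x) = 1 + \frac{1}{t}(x - \hat{x}) = 0 \Leftrightarrow x = \hat{x} - t \end{equation}

shows the vertex of the parabola is at $\hat{x} - t$. If $\hat{x} > t$, this vertex lies in $[0,\infty)$ and is the minimizer; if $0 < \hat{x} \leq t$, the vertex is negative, so $g$ is increasing on $[0,\infty)$ and the minimizer is $0$. Either way, the minimizer over $[0,\infty)$ is $\max(\hat{x} - t, 0)$, which matches the soft-thresholding formula with threshold $t$.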

The case where $\hat{x} < 0$ is similar.
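A small Python sketch of this case analysis (the name `prox_abs` and the brute-force grid check are mine), confirming it against a direct minimization of $f$:

```python
import numpy as np

def prox_abs(x_hat, t):
    # Minimizer of |x| + (1/(2t)) * (x - x_hat)^2, case by case:
    # for x_hat > 0, minimize g(x) = x + (1/(2t))(x - x_hat)^2 over [0, inf);
    # the parabola's vertex is at x_hat - t, clamped to 0 from below.
    if x_hat > 0:
        return max(x_hat - t, 0.0)
    if x_hat < 0:  # mirror image of the positive case
        return min(x_hat + t, 0.0)
    return 0.0  # x_hat = 0: both terms are minimized at x = 0

# Brute-force check of f(x) = |x| + (1/(2t))(x - x_hat)^2 on a fine grid.
t = 0.5
x_grid = np.linspace(-4.0, 4.0, 800001)  # spacing 1e-5
for x_hat in (-3.0, -0.2, 0.0, 0.4, 2.5):
    f = np.abs(x_grid) + (x_grid - x_hat) ** 2 / (2 * t)
    assert abs(x_grid[np.argmin(f)] - prox_abs(x_hat, t)) < 1e-4
```

This is the same operator as in Solution 1 with $\lambda = t$: $\operatorname{sign}(\hat{x})\max(|\hat{x}| - t, 0)$.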