Proof of the Moreau Decomposition Property of Proximal Operators
Given the prox operator i.e.
$$ \operatorname{prox}_{ h \left( \cdot \right) } \left( x \right) = \arg \min_{u} h \left( u \right) + \frac{1}{2} {\left\| u - x \right\|}_{2}^{2} $$
the Moreau decomposition property says that
$$ x = \operatorname{prox}_{ h \left( \cdot \right) } \left( x \right) + \operatorname{prox}_{ {h}^{\ast} \left( \cdot \right) } \left( x \right) $$
where $h^*$ is the conjugate of $h$
I was reading a proof of this which went as follows :
Define $ u = \operatorname{prox}_h (x)$ and $v = x - u$
From optimality condition of minimization in the definition of the prox operator, $ x-u \in \partial h(u)$, so $ v \in \partial h(u)$
- $u=x-v \in \partial h^* (v)$, hence $v = \operatorname{prox}_{h^*} (x) $
I didn't understand the 3rd step of the proof, i.e. how $ u \in \partial h^* (v)$ follows from $ v \in \partial h(u)$. Could someone shed some light on this?
I'll attempt to explain the intuition here.
There may be many affine minorants of $h$ with a given slope $y$, but we only care about the best one:
\begin{align} &h(x) \geq \langle y , x \rangle - \alpha \quad \text{for all } x \\ \iff & \alpha \geq \langle y, x \rangle - h(x) \quad \text{for all } x \\ \iff & \alpha \geq \sup_x \, \langle y, x \rangle - h(x) \\ \iff & \alpha \geq h^*(y). \end{align}
Thus, the best choice of $\alpha$ is $h^*(y)$.
(If there is no affine minorant of $h$ with slope $y$, then $h^*(y) = \infty$.)
Suppose that \begin{equation} v \in \partial h(u). \end{equation}
This means: there exists some affine minorant of $h$ with slope $v$ which is exact at $u$.
Of all affine minorants of $h$ with slope $v$, the best one (the closest one) is $a(x) = \langle v, x \rangle - h^*(v)$.
Since $a$ is the best affine minorant of $h$ with slope $v$, and since some affine minorant with slope $v$ is exact at $u$, it follows that $a$ is exact at $u$:
\begin{equation} h(u) = \langle v, u \rangle - h^*(v) \end{equation}
Otherwise $a$ would not be the best.
Hence \begin{align} h^*(v) &= \langle u,v \rangle - h(u) \\ &= \langle u, v \rangle - h^{**}(u) \end{align} and we know that $\langle u, v \rangle - h^{**}(u)$ is an affine minorant of $h^*$.
Thus we have found an affine minorant of $h^*$ with slope $u$ which is exact at $v$. This means that \begin{equation} u \in \partial h^*(v). \end{equation}
In summary, note the beautiful symmetry that allowed our key step: \begin{equation} h(u) = \langle v, u \rangle - h^*(v) \qquad \text{ " $v$ is a subgradient of $h$ "} \end{equation} becomes \begin{equation} h^*(v) = \langle u, v \rangle - h(u) \qquad \text{ " $u$ is a subgradient of $h^*$ "}. \end{equation}
Read Section 22.3 of https://statweb.stanford.edu/~candes/teaching/math301/Lectures/Moreau-Yosida.pdf The proof is complete in page 22-4.