Understanding the subdifferential sum rule

A previous question asked the following. Suppose that:

  • $f$ and $g$ are lower-semicontinuous proper convex functions,
  • $x \in \text{ri dom}(f) \cap \text{ri dom}(g)$,
  • $h = f+g$,
  • $p \in \partial h(x)$,

Prove that there exist some $s \in \partial f(x)$ and $t \in \partial g(x)$ such that $p = s+t$.

In a comment on that question, it was remarked that "The sum-rule can be understood as a separation theorem of the epigraphs of $f$ and $g$." Could anyone please explain or elaborate on this comment?

More generally, could anyone explain what they think is the most enlightening way to understand the subdifferential sum rule? Is there a point of view that makes the proof seem well-motivated or obvious?


An equivalent formulation of the sum rule is: "find $s \in \partial f(x)$ such that $p - s \in \partial g(x)$".
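For a concrete instance (chosen here only for illustration; it is not part of the original question), take $n = 1$, $f(y) = |y|$, $g(y) = \max(0, y)$ and $x = 0$. Both domains are all of $\mathbb{R}$, and
\begin{equation*}
\partial f(0) = [-1, 1], \qquad \partial g(0) = [0, 1], \qquad \partial (f+g)(0) = [-1, 2].
\end{equation*}
For $p = \tfrac32$ the admissible choices are
\begin{equation*}
\{ s \in [-1, 1] : \tfrac32 - s \in [0, 1] \} = [\tfrac12, 1],
\end{equation*}
so the decomposition $p = s + (p - s)$ exists but need not be unique.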

Now we can rewrite the second relation so that $s$ stands alone. Define the concave function $h_p(y) = p^\top \, (y-x) - g(y) + g(x) + f(x)$, which satisfies $h_p(x) = f(x)$. Then the second relation becomes $s \in \hat\partial h_p(x)$, where $\hat\partial$ denotes the superdifferential of a concave function.
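The rewriting is a one-line computation. Using only the definition of $h_p$ and the fact that $h_p(x) = f(x)$, the chain of equivalences reads:
\begin{align*}
p - s \in \partial g(x)
&\iff g(y) \ge g(x) + (p - s)^\top (y - x) \quad \text{for all } y \\
&\iff p^\top (y-x) - g(y) + g(x) + f(x) \le f(x) + s^\top (y-x) \quad \text{for all } y \\
&\iff h_p(y) \le h_p(x) + s^\top (y - x) \quad \text{for all } y \\
&\iff s \in \hat\partial h_p(x).
\end{align*}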

Let us rephrase these sub- and supergradient relations in terms of the epigraph of $f$ and the hypograph of $h_p$. From $s \in \partial f(x)$ we obtain \begin{equation*} f(x) - s^\top x \le f(y) - s^\top y \end{equation*} for all $y$, and this implies \begin{equation*} f(x) - s^\top x \le (1, -s)^\top k \end{equation*} for all $k$ in the epigraph $\{(a,b) \in \mathbb{R} \times \mathbb{R}^n : a \ge f(b)\}$ of $f$.

Similarly, from $s \in \hat\partial h_p(x)$ we get \begin{equation*} h_p(x) - s^\top x = f(x) - s^\top x \ge h_p(y) - s^\top y \end{equation*} for all $y$, and this implies \begin{equation*} f(x) - s^\top x \ge (1,-s)^\top k \end{equation*} for all $k$ in the hypograph $\{(a,b) \in \mathbb{R} \times \mathbb{R}^n : a \le h_p(b)\}$ of $h_p$.
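The step from "for all $y$" to "for all $k$" uses only the definitions of the two sets. For $k = (a, b)$ with $a \ge f(b)$ in the first line and $a \le h_p(b)$ in the second,
\begin{align*}
(1,-s)^\top k &= a - s^\top b \ge f(b) - s^\top b \ge f(x) - s^\top x, \\
(1,-s)^\top k &= a - s^\top b \le h_p(b) - s^\top b \le f(x) - s^\top x.
\end{align*}
Both sets contain the point $(f(x), x)$, because $h_p(x) = f(x)$, so the hyperplane $\{k : (1,-s)^\top k = f(x) - s^\top x\}$ touches both of them there.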

Hence, the validity of the sum rule implies that we can separate the epigraph of $f$ and the hypograph of $h_p$ (which is just the mirrored, shifted and tilted epigraph of $g$).
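A quick numerical sanity check of this direction, as a sketch under loudly stated assumptions: the 1-D instance $f(y) = |y|$, $g(y) = \max(0, y)$, $x = 0$, $p = 3/2$ is chosen only for illustration, the names `separates` and `is_subgradient` are hypothetical, and all inequalities are tested on a finite grid, which is not a proof. The code collects the slopes $s$ for which $(1,-s)$ separates the epigraph of $f$ from the hypograph of $h_p$ at level $f(x) - s x$, and compares them with the slopes satisfying $s \in \partial f(x)$ and $p - s \in \partial g(x)$.

```python
# Hypothetical 1-D instance, chosen only for illustration:
#   f(y) = |y|, g(y) = max(0, y), x = 0, p = 1.5 (which lies in d(f+g)(0) = [-1, 2]).
X, P = 0.0, 1.5
TOL = 1e-12
YS = [i / 20 for i in range(-60, 61)]  # test points y in [-3, 3]
SS = [i / 20 for i in range(-40, 61)]  # candidate slopes s in [-2, 3]


def f(y):
    return abs(y)


def g(y):
    return max(0.0, y)


def h_p(y):
    # concave function h_p(y) = p (y - x) - g(y) + g(x) + f(x)
    return P * (y - X) - g(y) + g(X) + f(X)


def separates(s):
    # (1, -s) separates epi f and hypo h_p at level a = f(x) - s x:
    # f(y) - s y >= a >= h_p(y) - s y at every grid point
    a = f(X) - s * X
    return all(f(y) - s * y >= a - TOL and h_p(y) - s * y <= a + TOL for y in YS)


def is_subgradient(phi, s):
    # grid check of phi(y) >= phi(x) + s (y - x)
    return all(phi(y) >= phi(X) + s * (y - X) - TOL for y in YS)


separating = [s for s in SS if separates(s)]
decompositions = [s for s in SS if is_subgradient(f, s) and is_subgradient(g, P - s)]

print(min(separating), max(separating))  # 0.5 1.0
print(separating == decompositions)      # True
```

Both lists consist of the grid points of $[\tfrac12, 1]$, which matches $\partial f(0) \cap (p - \partial g(0))$ for this example.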

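Conversely, let $p \in \partial (f+g)(x)$ be given. This means $f(y) + g(y) \ge f(x) + g(x) + p^\top (y-x)$ for all $y$, i.e. $f(y) \ge h_p(y)$ for all $y$, with equality at $y = x$. So the epigraph of $f$ lies above the hypograph of $h_p$: the two sets touch at $(f(x), x)$, but their relative interiors are disjoint (a common point $(t, b)$ would need $f(b) < t < h_p(b)$). Hence they can be separated properly, i.e. by a hyperplane that does not contain both sets: there are $a \in \mathbb{R}$ and $(\lambda,-s) \in (\mathbb{R}\times\mathbb{R}^n) \setminus \{0\}$ such that \begin{equation*} (\lambda,- s)^\top k \ge a \ge (\lambda,-s)^\top l \end{equation*} for all $k$ in the epigraph of $f$ and all $l$ in the hypograph of $h_p$.

Since you can make the first component of $k$ arbitrarily large, you obtain $\lambda \ge 0$. The case $\lambda = 0$ is where the assumption $x \in \text{ri dom}(f) \cap \text{ri dom}(g)$ enters: it would give $s^\top b \le -a \le s^\top b'$ for all $b \in \text{dom}(f)$ and $b' \in \text{dom}(g)$, with equality for $b = b' = x$. A linear function that attains its maximum (or minimum) over a convex set at a relative interior point is constant on that set, so both the epigraph of $f$ and the hypograph of $h_p$ would lie in the hyperplane $\{(t, b) : s^\top b = s^\top x\}$, contradicting properness. (If $\text{dom}(f)$ or $\text{dom}(g)$ is full-dimensional, this simply forces $s = 0$.) Hence $\lambda > 0$, and after dividing $a$ and $s$ by $\lambda$ we may take $\lambda = 1$. Choosing $k = (f(y), y)$ and $l = (h_p(y), y)$, this gives \begin{equation*} f(y) - s^\top y \ge a \ge (p^\top \, (y-x) - g(y) + g(x) + f(x)) - s^\top y \end{equation*} for all $y$. Plugging in $y = x$ yields $a = f(x) - s^\top x$, and the considerations above yield $s \in \partial f(x)$ and $p - s \in \partial g(x)$.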
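The converse direction can be probed numerically in the same way. The sketch below assumes the same illustrative 1-D instance $f(y) = |y|$, $g(y) = \max(0, y)$, $x = 0$; the grids, the tolerance and the function names `in_subdiff_of_sum` and `separable` are hypothetical choices, and a grid check is not a proof. It scans values of $p$ and checks that "$f \ge h_p$ on the grid", i.e. $p \in \partial(f+g)(0)$, holds exactly when some line of slope $s$ separates the epigraph of $f$ from the hypograph of $h_p$. Both domains are all of $\mathbb{R}$ here, so the relative-interior assumption holds trivially and the case $\lambda = 0$ never arises.

```python
# Hypothetical 1-D instance, chosen only for illustration:
#   f(y) = |y|, g(y) = max(0, y), x = 0; here d(f+g)(0) = [-1, 2].
X = 0.0
TOL = 1e-12
YS = [i / 20 for i in range(-60, 61)]  # test points y in [-3, 3]
SS = [i / 20 for i in range(-60, 61)]  # candidate slopes s in [-3, 3]
PS = [i / 4 for i in range(-12, 13)]   # candidate p in [-3, 3]


def f(y):
    return abs(y)


def g(y):
    return max(0.0, y)


def h(p, y):
    # h_p(y) = p (y - x) - g(y) + g(x) + f(x)
    return p * (y - X) - g(y) + g(X) + f(X)


def in_subdiff_of_sum(p):
    # p in d(f+g)(x)  <=>  f(y) >= h_p(y) for all y (checked on the grid only)
    return all(f(y) >= h(p, y) - TOL for y in YS)


def separable(p):
    # look for a slope s with f(y) - s y >= f(x) - s x >= h_p(y) - s y on the grid
    for s in SS:
        a = f(X) - s * X
        if all(f(y) - s * y >= a - TOL and h(p, y) - s * y <= a + TOL for y in YS):
            return True
    return False


admissible = [p for p in PS if in_subdiff_of_sum(p)]
print(min(admissible), max(admissible))                       # -1.0 2.0
print(all(in_subdiff_of_sum(p) == separable(p) for p in PS))  # True
```

The admissible $p$ are the grid points of $[-1, 2] = \partial f(0) + \partial g(0)$. For $p$ outside this interval, such as $p = \tfrac94$, the graph of $h_p$ rises above that of $f$ (at $y = 1$ one has $h_p(1) = \tfrac54 > 1 = f(1)$) and no separating line exists.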

I hope this answer clarifies that a sum rule is essentially a separation theorem.