Let $f : \mathbb{R} \to \mathbb{R}$ be convex. This means that at every point $a \in \mathbb{R}$ there is an affine function $l_a : \mathbb{R} \to \mathbb{R}$ which is dominated by $f$, i.e. $$ l_a(x) \leq f(x) \quad \text{for all } x \in \mathbb{R}, $$ with equality at $a$: $l_a(a) = f(a)$. When $f$ is differentiable, for example, $l_a$ is the tangent line to $f$ at $a$.
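As a quick numerical sanity check (not part of the argument), here is a sketch with the example $f(x) = x^2$ (my choice), whose supporting line at $a$ is the tangent $l_a(x) = a^2 + 2a(x-a)$:

```python
import numpy as np

# Illustration with f(x) = x^2: the supporting line at a is the tangent
# l_a(x) = f(a) + f'(a)(x - a) = a^2 + 2a(x - a).
f = lambda x: x**2
l = lambda a, x: a**2 + 2*a*(x - a)

a = 1.5
x = np.linspace(-5.0, 5.0, 1001)
assert np.all(l(a, x) <= f(x) + 1e-12)  # l_a(x) <= f(x) everywhere on the grid
assert np.isclose(l(a, a), f(a))        # ... with equality at x = a
```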

When $f$ is strictly convex, we have the additional condition $$ l_a(x) = f(x) ~\Rightarrow ~ x = a $$ Before we pick $a$ for this particular problem (sweeping integrability issues under the rug), notice that $$ l_a(X) \leq f(X) $$ holds pointwise, hence $E\, l_a(X) \leq E f(X)$. Moreover, $E\, l_a(X) = l_a(E X)$ by the linearity of $l_a$. Finally, setting $a = EX$, we obtain $$ f(EX) \leq E f(X) $$ Suppose now that $f(EX) = E f(X)$, which can be written as $E\, l_a(X) = E f(X)$ with our choice $a = E X$.
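To make the chain of inequalities concrete, here is a small simulation (with the arbitrary choices $f(x) = x^2$ and an exponential $X$; any integrable choice would do) checking both $E\, l_a(X) = l_a(EX)$ and $f(EX) \leq E f(X)$ empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=10**6)  # an arbitrary integrable X
f = lambda x: x**2                          # an arbitrary convex f

a = X.mean()                      # a = EX (empirical)
l = lambda x: a**2 + 2*a*(x - a)  # supporting line of f at a

print(np.isclose(l(X).mean(), l(a)))  # linearity: E l_a(X) = l_a(EX)
print(f(a) <= f(X).mean())            # Jensen: f(EX) <= E f(X)
```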

With this setup, consider $E [f(X) - l_a(X)] = 0$. Inside the expectation we have a nonnegative random variable (because of convexity) with expectation zero. We conclude that $f(X) = l_a(X)$ almost surely (the qualifier is unavoidable: we argued through an integral, and the integral doesn't see sets of measure zero; the Addendum below proves this claim).

Now we use strict convexity: $f(X) = l_a(X) ~\Rightarrow~ X = a = EX$ almost surely, i.e. $X$ is constant almost surely.
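For the example $f(x) = x^2$, the gap $E f(X) - f(EX)$ is exactly $\mathrm{Var}(X)$, so the equality case forcing $X$ to be constant is easy to see numerically; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: x**2                      # strictly convex; Jensen gap = Var(X)

X_const = np.full(10**5, 3.0)           # degenerate X: equality in Jensen
X_rand = 3.0 + rng.normal(size=10**5)   # non-degenerate X: strict inequality

print(np.isclose(f(X_const.mean()), f(X_const).mean()))  # True: equality
print(f(X_rand.mean()) < f(X_rand).mean())               # True: strict gap
```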

Addendum: Claim: If $Y$ is a nonnegative-valued random variable and $E Y = 0$, then $Y = 0$ almost surely.

To see this, let $A_n = \{Y \geq 1/n\}$, i.e. the set where $Y$ is at least $1/n$. Note that $\cup_n A_n = A := \{Y > 0\}$. Let's show that $P A_n = 0$ for every $n$, where $P$ is the probability measure.

$$ \frac{1}{n} P A_n \leq E (Y I_{A_n}) \leq E Y = 0, $$ where the first inequality holds because $Y \geq 1/n$ on $A_n$, and the second because $Y I_{A_n} \leq Y$. Hence $P A_n = 0$ for every $n$. Now recall that $P(\cup_n A_n) \leq \sum_n P A_n$, which is often called the 'countable subadditivity' property. This implies that $P A = 0$, and the claim follows.
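The key display is Markov's inequality in disguise: $P[Y \geq 1/n] \leq n\, EY$. A small sketch (with a nonnegative $Y$ of my choosing) checking the chain of inequalities, plus the degenerate case $EY = 0$:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = np.abs(rng.normal(size=10**6))  # a nonnegative Y with E Y > 0, for contrast

for n in (1, 2, 10):
    A_n = Y >= 1.0/n
    # (1/n) P(A_n) <= E[Y 1_{A_n}] <= E Y, the chain from the display above:
    assert (1.0/n) * A_n.mean() <= (Y * A_n).mean() <= Y.mean()

# When E Y = 0, the same chain forces P(Y >= 1/n) = 0 for every n:
Y0 = np.zeros(10**6)
assert Y0.mean() == 0.0 and (Y0 >= 0.1).sum() == 0
```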


Here is an alternative proof (given several years later) that is a bit more general, as it does not require the existence of an affine bounding function (subgradients do not always exist for convex functions defined over restricted domains; e.g. $f(x) = -\sqrt{x}$ on $[0,\infty)$ has no supporting line at $x = 0$).


Fix a positive integer $n$, let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set, and let $f:\mathcal{X}\rightarrow\mathbb{R}$ be a strictly convex function, meaning that $$f(px + (1-p)y) < pf(x) + (1-p)f(y)$$ whenever $0<p<1$ and $x, y \in \mathcal{X}$, $x \neq y$.
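As a concrete instance of this definition (assuming the standard example $f(v) = \|v\|^2$, which is strictly convex on all of $\mathbb{R}^n$), one can spot-check the strict inequality:

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda v: float(np.dot(v, v))  # f(v) = ||v||^2, strictly convex on R^n

x = rng.normal(size=3)
y = rng.normal(size=3)             # x != y with probability 1
for p in (0.1, 0.5, 0.9):
    # strict convexity: f(px + (1-p)y) < p f(x) + (1-p) f(y)
    assert f(p*x + (1-p)*y) < p*f(x) + (1-p)*f(y)
```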

Let $X$ be a random vector that takes values in $\mathcal{X}$ and that has a finite expectation $E[X]$. We know that $E[X] \in \mathcal{X}$ (this is a precursor to Jensen's inequality). Suppose that $f(E[X]) = E[f(X)]$. We show that $X=E[X]$ with probability 1.

Proof:

Define $m=E[X]$. For clarity, take $n=1$ in what follows; for $n>1$, apply the same argument to each coordinate (reading $\{X>m\}$ as $\{X_i>m_i\}$, and similarly below), which shows every coordinate of $X$ is almost surely constant. Suppose $P[X>m] >0$ (we will reach a contradiction).

Case 1: Suppose $P[X>m]=1$. Then $X-m$ is positive with probability 1, so $E[X-m]>0$; but $E[X-m]=m-m=0$, a contradiction.

Case 2: Suppose $0 < P[X>m] < 1$. Define $m_1 = E[X|X\leq m]$ and $m_2 = E[X|X>m]$. Note that $m_1 \leq m < m_2$ (so in particular $m_1 \neq m_2$), and by the law of total expectation $$m_1P[X\leq m] + m_2 P[X>m] = m$$ Also \begin{align} f(m) &\overset{(a)}{=} E[f(X)] \\ &= E[f(X)|X\leq m]P[X\leq m] + E[f(X)|X>m]P[X>m] \\ &\overset{(b)}{\geq} f(E[X|X\leq m])P[X\leq m] + f(E[X|X>m])P[X>m] \\ &= f(m_1)P[X\leq m] + f(m_2)P[X>m] \\ &\overset{(c)}{>} f(m_1 P[X\leq m] + m_2 P[X>m])\\ &= f(m) \end{align} where (a) holds by the assumption $f(E[X]) = E[f(X)]$; (b) holds by Jensen's inequality applied to the conditional expectations; (c) holds by strict convexity, using $m_1 \neq m_2$ and $0 < P[X>m] < 1$. Hence, $f(m)>f(m)$, a contradiction.
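The bookkeeping in Case 2 can be checked numerically. In the sketch below (with an arbitrary non-degenerate exponential $X$ and $f(x)=x^2$, so hypothesis (a) deliberately fails here and $E[f(X)] > f(m)$), the recombination identity and the inequalities (b) and (c) all hold as stated:

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: x**2                   # strictly convex
X = rng.exponential(size=10**6)      # a non-degenerate X (so (a) fails here)
m = X.mean()

lo, hi = X <= m, X > m
p = hi.mean()                        # P[X > m]
m1, m2 = X[lo].mean(), X[hi].mean()  # conditional means, m1 <= m < m2

assert np.isclose(m1*(1-p) + m2*p, m)                 # means recombine to m
assert f(X[lo]).mean()*(1-p) + f(X[hi]).mean()*p >= \
       f(m1)*(1-p) + f(m2)*p                          # (b): conditional Jensen
assert f(m1)*(1-p) + f(m2)*p > f(m1*(1-p) + m2*p)     # (c): strict convexity
```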

Cases 1 and 2 together imply that $P[X>m]=0$. Similarly, it can be shown that $P[X<m]=0$, so $X = m = E[X]$ with probability 1. $\Box$