In Boyd's Convex Optimization book, Example 3.25 finds the conjugate function $f^*(y):=\sup_{x\in\text{dom}(f)}(y^Tx-f(x))$ of the log-sum-exp function $f(x):=\log(\sum_{i=1}^ne^{x_i})$. First, the gradient of $y^Tx-f(x)$ is taken to yield the condition:

$$ y_i=\frac{e^{x_i}}{\sum_{j=1}^ne^{x_j}}\quad i=1,...,n $$

where we see that a solution for $x$ exists if and only if $y\succ 0$ and $\textbf{1}^Ty=1$. Then the book simply says:

By substituting the expression for $y_i$ into $y^Tx-f(x)$ we obtain $f^*(y)=\sum_{i=1}^ny_i\log(y_i)$.

So far I've been unsuccessful in deriving this. How does one proceed? All I see is:

$$ y^Tx-f(x)=\sum_{i=1}^ny_ix_i-\log(\sum_{i=1}^ne^{x_i})=\frac{\sum_{i=1}^nx_ie^{x_i}}{\sum_{j=1}^ne^{x_j}}-\log(\sum_{i=1}^ne^{x_i}) $$

But from here on I do not know how to proceed.


It's perhaps easier to substitute the expression for $x_i$ in terms of $y_i$:

$$ y_i=\frac{e^{x_i}}{\sum_{j=1}^ne^{x_j}}=\frac{e^{x_i}}{e^{f(x)}} \Leftrightarrow x_i = \log y_i + f(x) $$

Then, using the fact that $\textbf{1}^Ty=1$, the expression simplifies to:

$$ \begin{aligned} y^Tx - f(x) &= \sum_{i=1}^n y_i x_i - f(x) \\ &= \sum_{i=1}^n y_i (\log y_i + f(x)) -f(x) \\ &= \sum_{i=1}^n y_i \log y_i + \sum_{i=1}^n y_i f(x) -f(x)\\ &= \sum_{i=1}^n y_i \log y_i + f(x)\left(\sum_{i=1}^n y_i - 1\right)\\ &= \sum_{i=1}^n y_i \log y_i \end{aligned} $$
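As a sanity check, the substitution can be verified numerically. The following is only a minimal sketch using the Python standard library; the helper names `log_sum_exp` and `softmax` and the random test point are assumptions made for illustration, not anything taken from the book.

```python
import math
import random

def log_sum_exp(x):
    # f(x) = log(sum_i exp(x_i)), shifted by max(x) for numerical stability
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def softmax(x):
    # y_i = exp(x_i) / sum_j exp(x_j) = exp(x_i - f(x))
    f = log_sum_exp(x)
    return [math.exp(xi - f) for xi in x]

random.seed(0)
x = [random.uniform(-3.0, 3.0) for _ in range(5)]  # arbitrary test point
y = softmax(x)
f = log_sum_exp(x)

# Left side: y^T x - f(x); right side: sum_i y_i log y_i
lhs = sum(yi * xi for yi, xi in zip(y, x)) - f
rhs = sum(yi * math.log(yi) for yi in y)

print("y^T x - f(x)     =", lhs)
print("sum y_i log y_i  =", rhs)
```

The two printed values should agree up to floating-point rounding, and each $x_i$ should equal $\log y_i + f(x)$, which is exactly the relation used in the derivation.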


$\def\T{^{\mathrm{T}}} \def\e{\mathrm{e}}$Substituting $y_k = \e^{x_k} / \sum_{j = 1}^n \e^{x_j}$ gives

$$ y\T x - f(x) = \frac{\sum\limits_{k = 1}^n x_k \e^{x_k}}{\sum\limits_{k = 1}^n \e^{x_k}} - \ln\left( \sum_{k = 1}^n \e^{x_k} \right) $$

and

\begin{align*} \sum_{k = 1}^n y_k \ln y_k &= \sum_{k = 1}^n \frac{\e^{x_k}}{\sum_{j = 1}^n \e^{x_j}} \left( \ln \e^{x_k} - \ln \sum_{j = 1}^n \e^{x_j} \right)\\ &= \sum_{k = 1}^n \frac{\e^{x_k} \ln \e^{x_k}}{\sum_{j = 1}^n \e^{x_j}} - \sum_{k = 1}^n \e^{x_k} \cdot \frac{\ln \sum_{j = 1}^n \e^{x_j}}{\sum_{j = 1}^n \e^{x_j}}\\ &= \frac{\sum_{k = 1}^n \e^{x_k} \ln \e^{x_k}}{\sum_{j = 1}^n \e^{x_j}} - \left( \sum_{k = 1}^n \e^{x_k} \right) \cdot \frac{\ln \sum_{j = 1}^n \e^{x_j}}{\sum_{j = 1}^n \e^{x_j}}\\ &= \frac{\sum\limits_{k = 1}^n x_k \e^{x_k}}{\sum\limits_{k = 1}^n \e^{x_k}} - \ln\left( \sum_{k = 1}^n \e^{x_k} \right). \end{align*}

The two right-hand sides are identical, so

$$ y\T x - f(x) = \sum_{k = 1}^n y_k \ln y_k. $$
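Since $y^Tx-f(x)$ is concave in $x$, the stationary point is a maximizer, so the value above is the supremum in the definition of $f^*$. A rough numerical illustration, assuming a hand-picked $y$ on the simplex and random trial points (the names `objective` and `neg_entropy` are hypothetical helpers, not from the book):

```python
import math
import random

def log_sum_exp(x):
    # f(x) = log(sum_i exp(x_i)), shifted by max(x) for numerical stability
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def objective(y, x):
    # The function whose supremum over x defines f*(y)
    return sum(yi * xi for yi, xi in zip(y, x)) - log_sum_exp(x)

def neg_entropy(y):
    # Claimed closed form: sum_i y_i log y_i
    return sum(yi * math.log(yi) for yi in y)

y = [0.1, 0.2, 0.3, 0.4]             # assumed example: y > 0 and sums to 1
x_star = [math.log(yi) for yi in y]  # one solution of the gradient condition
best = objective(y, x_star)

# Random trial points should never beat the stationary value
random.seed(1)
trials = [objective(y, [random.uniform(-5.0, 5.0) for _ in y])
          for _ in range(1000)]
max_trial = max(trials)

print("value at stationary point:", best)
print("sum y_i log y_i          :", neg_entropy(y))
print("best random trial        :", max_trial)
```

Here `x_star` is chosen as $x_i=\log y_i$, which satisfies the gradient condition because $\sum_i y_i = 1$; any shift $x + c\,\textbf{1}$ gives the same objective value.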