Consider the matrix-valued function $A: \mathbb{R}^n \rightarrow \mathbb{R}^{n \times n}$ defined as

$$ A( x ) := \left[ \begin{matrix} x_1 & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & x_2 & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & x_n \end{matrix} \right] $$

where $a_{i,j} \geq 0$ for all $i,j$. Define the matrix-valued function $F: \mathbb{R}^{ n } \rightarrow \mathbb{R}^{ n \times n }$ as

$$ F( x ) := \text{exp}( A(x) )$$

where $\text{exp}(\cdot)$ denotes the matrix exponential, i.e., $$\exp(A) := \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{1}{2!} A^2 + \frac{1}{3! }A^3 + \cdots$$

Notice that $F(x)$ is element-wise nonnegative for every $x$: the off-diagonal entries of $A(x)$ are nonnegative, so $A(x)$ is a Metzler matrix, and the exponential of a Metzler matrix is element-wise nonnegative.
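As a quick numerical sanity check of this nonnegativity (not part of the proof; $n$, the random off-diagonal entries, and the helper names `A_off`, `A`, `F` are my own choices, and SciPy's `expm` plays the role of the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 5

# Fixed off-diagonal entries a_{i,j} >= 0; the diagonal is supplied by x.
A_off = rng.uniform(0.0, 1.0, size=(n, n))
np.fill_diagonal(A_off, 0.0)

def A(x):
    """A(x): diagonal entries x_i, off-diagonal entries a_{i,j} >= 0 (Metzler)."""
    return A_off + np.diag(x)

def F(x):
    """F(x) = exp(A(x)), the matrix exponential."""
    return expm(A(x))

# Even when x has negative entries, exp of the Metzler matrix A(x) is entrywise >= 0.
x = rng.normal(size=n)
print(np.all(F(x) >= 0))  # expected: True
```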

Let $f_{i,j} : \mathbb{R}^{n} \rightarrow \mathbb{R}$ be the $(i,j)$ component of $F$, i.e., $f_{i,j}(x) := F(x)_{i,j}$.

Prove that, for all $i,j$, the function $f_{i,j}$ is convex.

Comment: I am trying to show that the Hessian of $f_{i,j}$ is positive semidefinite. I am also trying to show that the second derivative of $\mathbb{R} \ni t \mapsto \exp( A( x + t y ) )$ is entrywise nonnegative for all $x,y \in \mathbb{R}^n$.
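Along these lines, here is a small numerical sketch of the one-dimensional approach (again with data and names of my own choosing; SciPy's `expm` is used for $\exp$): the symmetric second difference of $t \mapsto \exp(A(x+ty))$ near $t=0$ should be entrywise nonnegative.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 5
A_off = rng.uniform(0.0, 1.0, size=(n, n))
np.fill_diagonal(A_off, 0.0)

def F(x):
    """F(x) = exp(A(x)) with A(x) = A_off + diag(x)."""
    return expm(A_off + np.diag(x))

# Symmetric second difference of t -> F(x + t*y) near t = 0; its entrywise
# limit is the second derivative along the line, which should be >= 0.
x, y = rng.normal(size=n), rng.normal(size=n)
t = 1e-3
second_diff = (F(x + t * y) + F(x - t * y) - 2.0 * F(x)) / t**2
print(np.all(second_diff >= -1e-6))  # expected: True
```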


It suffices to show convexity on the domain $\Omega=\{x\in\mathbb{R}^n : x_i>0,\ i=1,\dots,n\}$. Once this is done, to check that the definition of convexity holds for an arbitrary pair of points $x$, $y$, choose $M>0$ large enough that, writing $\vec{M}=(M,\dots,M)$, both $x'=x+\vec{M}\in\Omega$ and $y'=y+\vec{M}\in\Omega$ (and hence the whole segment joining $x'$ and $y'$ lies in $\Omega$, since $\Omega$ is convex).
Since $MI$ commutes with $A(z)$, you have $F(z+\vec{M})=\exp\left(A(z+\vec{M})\right)=\exp(A(z)+MI)=e^M F(z)$ for any $z$ on the segment joining $x$ and $y$, so the convexity inequality at $x',y'$ is just the inequality at $x,y$ multiplied by $e^M>0$, and the general case follows.
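The shift identity used here, $\exp(A(z)+MI)=e^{M}\exp(A(z))$, holds because $MI$ commutes with every matrix; a quick numerical confirmation (the values of $n$, $M$ and the random data are mine):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n, M = 4, 7.5
A_off = rng.uniform(0.0, 1.0, size=(n, n))
np.fill_diagonal(A_off, 0.0)

def A(z):
    """A(z) = A_off + diag(z)."""
    return A_off + np.diag(z)

z = rng.normal(size=n)
# M*I commutes with A(z), hence exp(A(z) + M*I) = e^M * exp(A(z)).
lhs = expm(A(z + M * np.ones(n)))
rhs = np.exp(M) * expm(A(z))
print(np.allclose(lhs, rhs))  # expected: True
```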

So let us show that $f_{ij}$ is convex on $\Omega$. For $x\in\Omega$ the matrix $A(x)$ has nonnegative entries, and a perturbation of $x$ only changes the diagonal: $A(x+th)=A(x)+tD$ with $D=\operatorname{diag}(h)$. It therefore suffices to show that if a matrix $A$ has nonnegative entries and $D$ is diagonal, then $\exp(A+tD)+\exp(A-tD)-2\exp(A)=t^2P+O(t^3)$ with $P$ entrywise nonnegative: this gives $\nabla^2 f_{ij}(x)[h,h]=\lim_{t\to 0}\frac{f_{ij}(x+th)+f_{ij}(x-th)-2f_{ij}(x)}{t^2}=P_{ij}\ge 0$ for every vector $h$, so the Hessian of $f_{ij}$ is positive semidefinite on $\Omega$ and $f_{ij}$ is convex there.
By the definition of the exponential, whose series has positive coefficients $1/k!$, we can just show the stronger statement that for any $k\ge 0$, $(A+tD)^k+(A-tD)^k-2A^k=t^2P+O(t^3)$, where again $P$ (depending on $k$) is some matrix with nonnegative entries; summing over $k$ with the weights $1/k!$ then yields the claim for the exponential.
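Before the combinatorics, a quick numerical check of this reduction (only a sketch, with data of my own choosing): for an entrywise nonnegative $A$ and a diagonal $D$, the scaled symmetric second difference of the exponential should already be entrywise nonnegative for small $t$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
n = 5
A = rng.uniform(0.0, 1.0, size=(n, n))   # entrywise nonnegative A
D = np.diag(rng.normal(size=n))          # diagonal D, entries of either sign

t = 1e-3
# (exp(A+tD) + exp(A-tD) - 2 exp(A)) / t^2 approximates P, which should be
# entrywise nonnegative.
P_approx = (expm(A + t * D) + expm(A - t * D) - 2.0 * expm(A)) / t**2
print(np.all(P_approx >= -1e-6))  # expected: True
```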

Now notice that $(A+tD)^k+(A-tD)^k-2A^k=2t^2\sum_{0\le r<s\le k-1} A^rDA^{s-r-1}DA^{k-s-1}+O(t^3)$. Indeed, the first-order terms cancel out, while the second-order term in the expansion of $(A\pm tD)^k$ is $t^2\sum_{0\le r<s\le k-1}\Pi_{rs}$, where $\Pi_{rs}$ is the product of $k$ factors which are all equal to $A$ except for the $r$-th and $s$-th ones, which are equal to $D$ (by the $0$-th factor we mean the first one, and so on). So $P=2\sum_{0\le r<s\le k-1} A^rDA^{s-r-1}DA^{k-s-1}$.
Calling $i_0:=i$, $i_k:=j$, the $(i,j)$ component of $\sum_{0\le r<s\le k-1} A^rDA^{s-r-1}DA^{k-s-1}$ is $$\sum_{0\le r<s\le k-1}\ \sum_{i_1,\dots,i_{k-1}}a_{i_0i_1}\cdots a_{i_{r-1}i_r}\,d_{i_ri_{r+1}}\,a_{i_{r+1}i_{r+2}}\cdots a_{i_{s-1}i_s}\,d_{i_si_{s+1}}\,a_{i_{s+1}i_{s+2}}\cdots a_{i_{k-1}i_k},$$ but $D$ is diagonal and has the effect of "freezing the indices" for one step, i.e. only the terms with $i_{r+1}=i_r$ and $i_{s+1}=i_s$ survive. Thus, after relabelling the indices, the big sum becomes $$\sum_{0\le r\le s\le k-2}\ \sum_{j_1,\dots,j_{k-3}}a_{j_0j_1}\cdots a_{j_{r-1}j_r}\,d_{j_r}\,a_{j_rj_{r+1}}\cdots a_{j_{s-1}j_s}\,d_{j_s}\,a_{j_sj_{s+1}}\cdots a_{j_{k-3}j_{k-2}},$$ where we have put $d_j:=d_{jj}$, the extremal indices are $j_0:=i$, $j_{k-2}:=j$, and the case $r=s$ comes from two adjacent $D$ factors (i.e. $s=r+1$ before relabelling).
Now we exchange the two sums and we are left to show that $$\sum_{0\le r\le s\le k-2}d_{j_r}d_{j_s}\,a_{j_0j_1}\cdots a_{j_{r-1}j_r}a_{j_rj_{r+1}}\cdots a_{j_{s-1}j_s}a_{j_sj_{s+1}}\cdots a_{j_{k-3}j_{k-2}}\ \ge\ 0$$ for any fixed choice of the indices $j_1,\dots,j_{k-3}$ (with $j_0=i$ and $j_{k-2}=j$). The product of the entries of $A$ equals $a_{j_0j_1}a_{j_1j_2}\cdots a_{j_{k-3}j_{k-2}}$, which is nonnegative and independent of $(r,s)$, so we just have to show that $$\sum_{0\le r\le s\le k-2}d_{j_r}d_{j_s}\ \ge\ 0,$$ which is clear as $$2\sum_{0\le r\le s\le k-2}d_{j_r}d_{j_s}=2\sum_{0\le r\le k-2}d_{j_r}^2+2\sum_{0\le r<s\le k-2}d_{j_r}d_{j_s}=\sum_{0\le r\le k-2}d_{j_r}^2+\left(\sum_{0\le r\le k-2}d_{j_r}\right)^2\ \ge\ 0.$$
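If you want to double-check the computation above numerically, here is a small sketch (the values of $n$, $k$, $t$ and the random data are mine): it compares the symmetric second difference of $(A\pm tD)^k$ with the explicit coefficient $P=2\sum_{0\le r<s\le k-1} A^rDA^{s-r-1}DA^{k-s-1}$ and checks that $P$ is entrywise nonnegative.

```python
import numpy as np
from numpy.linalg import matrix_power

rng = np.random.default_rng(4)
n, k, t = 4, 6, 1e-3
A = rng.uniform(0.0, 1.0, size=(n, n))   # entrywise nonnegative A
D = np.diag(rng.normal(size=n))          # diagonal D

# Explicit second-order coefficient:
# P = 2 * sum_{0 <= r < s <= k-1} A^r D A^(s-r-1) D A^(k-s-1).
P = sum(
    2.0 * matrix_power(A, r) @ D @ matrix_power(A, s - r - 1)
        @ D @ matrix_power(A, k - s - 1)
    for r in range(k)
    for s in range(r + 1, k)
)

# Symmetric second difference of the k-th power, divided by t^2.
fd = (matrix_power(A + t * D, k) + matrix_power(A - t * D, k)
      - 2.0 * matrix_power(A, k)) / t**2

print(np.allclose(fd, P, rtol=1e-4))  # expected: True
print(np.all(P >= 0))                 # expected: True (entrywise nonnegative)
```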


You will find the proof in:

"Convexity of the cost functional in an optimal control problem for a class of positive switched systems" Patrizio Colaneri, Richard H. Middleton , Zhiyong Chen , Danilo Caporale , Franco Blanchini, Automatica 2014