Derivative (or differential) of symmetric square root of a matrix

Solution 1:

What follows is an extension of the previous comments, deriving an explicit expression in terms of the Kronecker sum. Applying the differential $\mathrm{d}(\cdot)$ to both sides of $\sqrt{A}\sqrt{A} = A$ yields a special case of the Sylvester equation $$(\mathrm{d}\sqrt{A}) \sqrt{A} \: + \: \sqrt{A} (\mathrm{d}\sqrt{A}) = \mathrm{d}A, $$ which, using the identity $\text{vec}(AXB) = (B^{\top} \otimes A)\,\text{vec}(X)$, can be solved for the differential matrix $\mathrm{d}\sqrt{A}$ as $$ \text{vec}(\mathrm{d}\sqrt{A}) = \left(\sqrt{A}^{\top} \oplus \sqrt{A}\right)^{-1} \: \text{vec}(\mathrm{d}A), $$ where $\sqrt{A}^{\top} \oplus \sqrt{A} = \sqrt{A}^{\top} \otimes I + I \otimes \sqrt{A}$ denotes the Kronecker sum. Since $A$ is positive definite, $\sqrt{A}$ is unique and positive definite, and hence the Kronecker sum is positive definite (thus non-singular). Further, since the differential and the vec operator can be interchanged on the left-hand side of the equation above, the Jacobian identification rule (p. 198 in Magnus and Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd ed., chapter 9, section 5) yields $$ \mathrm{D}\sqrt{A} = \left(\sqrt{A}^{\top} \oplus \sqrt{A}\right)^{-1}, $$ where the transpose can be dispensed with if $A$, in addition to being positive definite, is also symmetric, as the OP asked. Note that for a generic matrix function $F: \mathbb{R}^{p\times q} \to \mathbb{R}^{m\times n}$, the Jacobian is defined as $\mathrm{D}F(X) \triangleq \displaystyle\frac{\partial \: \text{vec}(F(X))}{\partial \: (\text{vec}(X))^{\top}}$ and has size $mn \times pq$.
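As a quick numerical sanity check of this formula, here is a minimal NumPy/SciPy sketch, assuming the column-stacking vec convention of Magnus and Neudecker (the variable names are illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # random symmetric positive-definite A
S = sqrtm(A).real                    # principal square root

def vec(M):
    return M.reshape(-1, order="F")  # column-stacking vec

# Kronecker sum S^T ⊕ S = S^T ⊗ I + I ⊗ S, and the Jacobian D sqrt(A)
Ksum = np.kron(S.T, np.eye(n)) + np.kron(np.eye(n), S)
J = np.linalg.inv(Ksum)

# compare against a central finite difference along a symmetric direction H
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
eps = 1e-6
dS = (sqrtm(A + eps * H).real - sqrtm(A - eps * H).real) / (2 * eps)
print(np.allclose(J @ vec(H), vec(dS), atol=1e-5))   # expect True
```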

Solution 2:

The function $A\rightarrow S=\sqrt{A}$ is defined and differentiable on the set of SPD matrices. Let $K=DS_A(H)$ be the derivative of $S$ at $A$ applied to a variable SYMMETRIC matrix $H$. Here $SS=A$ implies $KS+SK=H$, a Sylvester equation in the unknown $K$. Since this equation is preserved under orthogonal conjugation, we may assume $A=\operatorname{diag}((\lambda_i)_i)$ where $\lambda_i>0$. Then $S=\operatorname{diag}((\sqrt{\lambda_i})_i)$. If $H=[h_{i,j}],K=[k_{i,j}]$, then, identifying entries on both sides, we obtain $k_{i,j}=\dfrac{h_{i,j}}{\sqrt{\lambda_i}+\sqrt{\lambda_j}}$. Of course, if $n=1$, we recover the usual derivative $h_{1,1}\rightarrow \dfrac{h_{1,1}}{2\sqrt{\lambda_1}}$.
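For a general (non-diagonal) $A$, one can diagonalize $A=Q\Lambda Q^{\top}$, apply the entrywise formula in the eigenbasis, and transform back; here is a minimal sketch along these lines, checked against finite differences (names illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                 # SPD matrix
H = rng.standard_normal((n, n)); H = (H + H.T) / 2

# diagonalize A = Q diag(lam) Q^T, apply k_ij = h_ij / (sqrt(lam_i) + sqrt(lam_j))
lam, Q = np.linalg.eigh(A)
r = np.sqrt(lam)
Ht = Q.T @ H @ Q                            # H expressed in the eigenbasis
Kt = Ht / (r[:, None] + r[None, :])
K = Q @ Kt @ Q.T                            # transform back

eps = 1e-6
K_fd = (sqrtm(A + eps * H).real - sqrtm(A - eps * H).real) / (2 * eps)
print(np.allclose(K, K_fd, atol=1e-5))      # expect True
```

Changing basis is legitimate because the Sylvester equation $KS+SK=H$ transforms covariantly under $X\mapsto Q^{\top}XQ$.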

EDIT 1. Regarding the half-vectorization operator: we need only store half of the matrix $K$, because it is symmetric, like the matrix $H$.

EDIT 2. Another form of $K$ is $\int_0^{\infty}e^{-tS}He^{-tS}dt$. That implies that if $H$ is a small symmetric matrix, then $\sqrt{A+H}\approx \sqrt{A}+\int_0^{\infty}e^{-t\sqrt{A}}He^{-t\sqrt{A}}dt$.
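This expansion can be verified numerically; below is a minimal sketch using adaptive quadrature, with the integral truncated at a finite upper limit beyond which $e^{-tS}$ is negligible (names illustrative):

```python
import numpy as np
from scipy.linalg import expm, sqrtm
from scipy.integrate import quad_vec

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                 # SPD matrix
S = sqrtm(A).real
H = rng.standard_normal((n, n)); H = (H + H.T) / 2

# K = ∫_0^∞ e^{-tS} H e^{-tS} dt, truncated where the integrand is negligible
T = 40.0 / np.linalg.eigvalsh(S).min()
K, _ = quad_vec(lambda t: expm(-t * S) @ H @ expm(-t * S), 0.0, T)

print(np.allclose(K @ S + S @ K, H, atol=1e-6))          # Sylvester equation holds
eps = 1e-3
print(np.allclose(sqrtm(A + eps * H).real, S + eps * K, atol=1e-4))  # first order
```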

EDIT 3. Proof of the above result. The integral converges (easy, since $S$ is positive definite) and it suffices to prove that $KS+SK=H$ (the solution in $K$ of this equation is unique). One has $KS+SK=\int_0^{+\infty}\left(e^{-tS}He^{-tS}S+Se^{-tS}He^{-tS}\right)dt=-\int_0^{+\infty}(e^{-tS}He^{-tS})'dt=H,$ where the last equality follows because $e^{-tS}He^{-tS}$ equals $H$ at $t=0$ and tends to $0$ as $t\rightarrow +\infty$.

Solution 3:

To summarize Abhishek's post (Solution 1): $$\eqalign{ A &= S^2,\quad a={\rm vec}(A),\quad s={\rm vec}(S),\quad M=S^T\oplus S \cr ds &= M^{-1}\,da \cr }$$ However, to answer the original question in terms of the independent entries, one must introduce Duplication and Elimination matrices, using the fact that $D_nL_n\,da=da$ when $dA$ is symmetric. $$\eqalign{ \alpha &= {\rm vech}(A),\quad \alpha=L_na,\quad a=D_n\alpha \cr \sigma &= {\rm vech}(S),\quad \sigma=L_ns,\quad s=D_n\sigma \cr L_n\,ds &= L_nM^{-1}(D_nL_n\,da) \cr d\sigma &= L_nM^{-1}D_n\,\,d\alpha \cr \frac{\partial\sigma}{\partial\alpha} &= L_nM^{-1}D_n \cr }$$
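A small numerical check of this chain, in a minimal sketch that builds $L_n$ and $D_n$ explicitly for the column-stacking vec (names illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def dup_elim(n):
    """Duplication D_n and elimination L_n for column-stacking vec/vech."""
    idx = [(i, j) for j in range(n) for i in range(j, n)]   # vech ordering
    m = len(idx)                                            # n(n+1)/2
    L = np.zeros((m, n * n))
    D = np.zeros((n * n, m))
    for k, (i, j) in enumerate(idx):
        L[k, j * n + i] = 1          # pick the lower-triangular entry of vec
        D[j * n + i, k] = 1          # place it back ...
        D[i * n + j, k] = 1          # ... and in the mirrored position
    return D, L

rng = np.random.default_rng(3)
n = 3
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                              # SPD matrix
S = sqrtm(A).real

D, L = dup_elim(n)
M = np.kron(S.T, np.eye(n)) + np.kron(np.eye(n), S)      # S^T ⊕ S
J = L @ np.linalg.inv(M) @ D                             # ∂σ/∂α

# finite-difference check, one vech coordinate at a time
m = n * (n + 1) // 2
eps = 1e-6
J_fd = np.zeros((m, m))
for k in range(m):
    d = np.zeros(m); d[k] = eps
    dA = (D @ d).reshape(n, n, order="F")                # symmetric perturbation
    dS = (sqrtm(A + dA).real - sqrtm(A - dA).real) / 2
    J_fd[:, k] = L @ dS.reshape(-1, order="F") / eps
print(np.allclose(J, J_fd, atol=1e-5))                   # expect True
```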