Derivative of a triple product matrix
I am trying to find the solution for the following derivative $$\frac{\partial \boldsymbol E\boldsymbol J\boldsymbol E^{T}}{\partial \boldsymbol E}$$
where $\boldsymbol E$ and $\boldsymbol J$ are both matrices. I searched for a solution, mainly involving the Kronecker product, and found that it can be solved using the chain rule. The problem is that I found two sources giving somewhat different versions of the chain rule, and I cannot manage to check whether they are equivalent. The first one from here gives
$$\frac{\partial (\boldsymbol A\boldsymbol F)}{\partial \boldsymbol B}=\frac{\partial \boldsymbol A}{\partial \boldsymbol B}(\boldsymbol I\otimes \boldsymbol F)+(\boldsymbol I\otimes \boldsymbol A)\frac{\partial \boldsymbol F}{\partial \boldsymbol B}$$
while the second one from here gives $$\frac{\partial (\boldsymbol A\boldsymbol B)}{\partial \boldsymbol x^{T}}=(\boldsymbol B^{T}\otimes \boldsymbol I)\frac{\partial vec(\boldsymbol A)}{\partial \boldsymbol x^{T}}+(\boldsymbol I\otimes \boldsymbol A)\frac{\partial vec(\boldsymbol B)}{\partial \boldsymbol x^{T}}$$
where I am assuming that $\boldsymbol x^{T}$ can be seen as $vec(\boldsymbol E)$. Can someone please help me understand how these two are equivalent and, in the end, how my original derivative can be solved?
Thanks
Solution 1:
$
\def\bbR#1{{\mathbb R}^{#1}}
\def\d{\delta}
\def\k{\sum_k}
\def\l{\sum_l}
\def\e{\varepsilon}
\def\n{\nabla}\def\o{{\tt1}}\def\p{\partial}
\def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}}
\def\B{\Big}\def\L{\left}\def\R{\right}
\def\LR#1{\L(#1\R)}
\def\BR#1{\B(#1\B)}
\def\vecc#1{\operatorname{vec}\LR{#1}}
\def\Diag#1{\operatorname{Diag}\LR{#1}}
\def\trace#1{\operatorname{Tr}\LR{#1}}
\def\qiq{\quad\implies\quad}
\def\grad#1#2{\frac{\p #1}{\p #2}}
\def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3}}
\def\c#1{\color{red}{#1}}
$The differential of a matrix is easy to work with, since it obeys
all of the rules of matrix algebra. So let's start by calculating
the differential of your function.
$$\eqalign{
F &= EJE^T \\
dF &= dE\;JE^T + EJ\;dE^T \\
}$$
Vectorizing this expression yields
$$\eqalign{
f &= \vecc{F},\qquad e=\vecc{E} \\
df &= \LR{EJ^T\otimes I}\,de + \LR{I\otimes EJ}K\;de \\
\grad{f}{e} &= \LR{EJ^T\otimes I} + \LR{I\otimes EJ}K \\
}$$
where $K$ is the Commutation Matrix associated with
the vec()
operation.
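If you want to convince yourself that this vectorized formula is right, it can be checked numerically. Below is a minimal sketch using NumPy; the sizes $m,p$, the column-stacking $\operatorname{vec}()$ convention, and the explicit construction of $K$ are my own choices for illustration. Since both sides are linear in $dE$, the two expressions should agree exactly, not just to first order.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 4, 3                               # E is m x p, J is p x p
E = rng.standard_normal((m, p))
J = rng.standard_normal((p, p))
dE = rng.standard_normal((m, p))          # an arbitrary perturbation

def vec(M):
    """Column-stacking vec()."""
    return M.reshape(-1, order="F")

# Commutation matrix K with K @ vec(M) = vec(M.T) for any m x p matrix M
K = np.zeros((p * m, m * p))
for i in range(m):
    for j in range(p):
        K[j + i * p, i + j * m] = 1.0

# df from the Kronecker-product gradient formula ...
df_formula = (np.kron(E @ J.T, np.eye(m))
              + np.kron(np.eye(m), E @ J) @ K) @ vec(dE)

# ... and df computed directly from the differential dF = dE J E^T + E J dE^T
df_direct = vec(dE @ J @ E.T + E @ J @ dE.T)

assert np.allclose(df_formula, df_direct)
```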
Another approach to the problem is to use the self-gradient of a matrix, i.e.
$$\eqalign{
\grad{E}{E_{ij}} = S_{ij} \\
}$$
where $S_{ij}$ is the matrix whose components are all zero, except for the
$(i,j)^{th}$ component which is equal to one. This is sometimes called the
single-entry matrix, and it can be used to write the
component-wise gradient of the function as
$$\eqalign{
\grad{F}{E_{ij}} &= S_{ij}\,JE^T + EJ\,S_{ij}^T \\
}$$
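This component-wise gradient is also easy to verify numerically. The sketch below (NumPy; the index pair $(i,j)$ and the step $h$ are arbitrary choices) compares it against a central finite difference. Note the transpose on the second factor, which is required for the dimensions to conform when $E$ is rectangular; and since $F$ is quadratic in $E$, the central difference is exact up to roundoff.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 4, 3
E = rng.standard_normal((m, p))
J = rng.standard_normal((p, p))

i, j = 2, 1
S = np.zeros((m, p))
S[i, j] = 1.0                             # the single-entry matrix S_ij

analytic = S @ J @ E.T + E @ J @ S.T      # S_ij J E^T + E J S_ij^T

# Central difference of F(E) = E J E^T along the direction S
h = 1e-6
fd = ((E + h * S) @ J @ (E + h * S).T
      - (E - h * S) @ J @ (E - h * S).T) / (2 * h)

assert np.allclose(analytic, fd)
```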
Yet another approach is to use Index Notation to write the self-gradient (which is a fourth-order tensor) in terms of Kronecker delta symbols as
$$\eqalign{
\grad{E_{mn}}{E_{ij}} = \d_{im}\d_{jn} \\
}$$
Then calculate the gradient of the function
(also a fourth-order tensor) as
$$\eqalign{
F_{mn} &= \k\l E_{mk}J_{kl}E_{ln}^T \\
\grad{F_{mn}}{E_{ij}}
&= \k\l \BR{ \c{\d_{im}\d_{jk}}\;J_{kl}E_{nl}
+ E_{mk}J_{kl}\;\c{\d_{in}\d_{jl}} } \\
&= \l \d_{im}J_{jl}E_{ln}^T + \k E_{mk}J_{kj}\d_{in} \\
&= \d_{mi}\LR{JE^T}_{jn} + \LR{EJ}_{mj}\d_{in} \\
}$$
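The full fourth-order gradient tensor can likewise be checked against finite differences over all index pairs. This is a sketch assuming NumPy and the index layout $G[m,n,i,j] = \partial F_{mn}/\partial E_{ij}$; again, $F$ being quadratic in $E$ makes the central differences exact up to roundoff.

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 4, 3
E = rng.standard_normal((m, p))
J = rng.standard_normal((p, p))

# G[m, n, i, j] = delta_mi (J E^T)_jn + (E J)_mj delta_in
G = (np.einsum('mi,jn->mnij', np.eye(m), J @ E.T)
     + np.einsum('mj,in->mnij', E @ J, np.eye(m)))

# Finite-difference check, one entry of E at a time
h = 1e-6
G_fd = np.zeros((m, m, m, p))
for i in range(m):
    for j in range(p):
        dE = np.zeros((m, p))
        dE[i, j] = h
        G_fd[:, :, i, j] = ((E + dE) @ J @ (E + dE).T
                            - (E - dE) @ J @ (E - dE).T) / (2 * h)

assert np.allclose(G, G_fd)
```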
Once you are comfortable with the Einstein summation convention,
you can drop the $\Sigma$ symbols to write the intermediate steps more concisely.