Derivative of a matrix triple product

I am trying to compute the following derivative $$\frac{\partial \boldsymbol E\boldsymbol J\boldsymbol E^{T}}{\partial \boldsymbol E}$$

where $\boldsymbol E$ and $\boldsymbol J$ are both matrices. I searched for a solution, mainly among approaches involving the Kronecker product, and found that it can be obtained using the product rule for matrix derivatives. The problem is that I found two sources giving somewhat different versions of the rule, and I can't manage to verify whether they are equivalent. The first one, from here, gives

$$\frac{\partial (\boldsymbol A\boldsymbol F\boldsymbol )}{\partial \boldsymbol B}=\frac{\partial \boldsymbol A}{\partial \boldsymbol B}(\boldsymbol I\otimes \boldsymbol F)+(\boldsymbol I\otimes \boldsymbol A)\frac{\partial \boldsymbol F}{\partial \boldsymbol B}$$

while the second one from here gives $$\frac{\partial (\boldsymbol A\boldsymbol B)}{\partial \boldsymbol x^{T}}=(\boldsymbol B^{T}\otimes \boldsymbol I)\frac{\partial vec(\boldsymbol A)}{\partial \boldsymbol x^{T}}+(\boldsymbol I\otimes \boldsymbol A)\frac{\partial vec(\boldsymbol B)}{\partial \boldsymbol x^{T}}$$

where I am assuming that $\boldsymbol x$ can be taken to be $vec(\boldsymbol E)$, so that the derivative is with respect to $vec(\boldsymbol E)^{T}$. Can someone please help me understand how these two are equivalent and, in the end, how my original derivative can be computed?
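As far as I can tell, both versions rest on the identity $vec(\boldsymbol A\boldsymbol X\boldsymbol B)=(\boldsymbol B^{T}\otimes \boldsymbol A)\,vec(\boldsymbol X)$ for the column-stacking $vec$. That part, at least, is easy to check numerically; here is a minimal numpy sketch (the shapes, the seed, and the `vec` helper are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # illustrative sizes
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

vec = lambda M: np.ravel(M, order='F')   # column-stacking vec

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.max(np.abs(lhs - rhs)))         # ~1e-15, agreement to roundoff
```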

Thanks


Solution 1:

$ \def\bbR#1{{\mathbb R}^{#1}} \def\d{\delta} \def\k{\sum_k} \def\l{\sum_l} \def\e{\varepsilon} \def\n{\nabla}\def\o{{\tt1}}\def\p{\partial} \def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}} \def\B{\Big}\def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\B(#1\B)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3}} \def\c#1{\color{red}{#1}} $The differential of a matrix is easy to work with, since it obeys all of the rules of matrix algebra. So let's start by calculating the differential of your function. $$\eqalign{ F &= EJE^T \\ dF &= dE\;JE^T + EJ\;dE^T \\ }$$ Vectorizing this expression yields
$$\eqalign{ f &= \vecc{F},\qquad e=\vecc{E} \\ df &= \LR{EJ^T\otimes I}\,de + \LR{I\otimes EJ}K\;de \\ \grad{f}{e} &= \LR{EJ^T\otimes I} + \LR{I\otimes EJ}K \\ }$$ where $K$ is the Commutation Matrix associated with the vec() operation.
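This result can be checked numerically against a finite-difference Jacobian. Below is a minimal numpy sketch; the `commutation_matrix` helper, the dimensions $m=3,\,n=4$, and the random test data are illustrative choices of mine, not anything from the references above.

```python
import numpy as np

def commutation_matrix(m, n):
    """K @ vec(A) == vec(A.T) for A of shape (m, n), column-stacking vec."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + i * n, i + j * m] = 1.0
    return K

m, n = 3, 4                               # E is m x n, J is n x n, F is m x m
rng = np.random.default_rng(0)
E = rng.standard_normal((m, n))
J = rng.standard_normal((n, n))
vec = lambda M: np.ravel(M, order='F')

# analytic Jacobian: df/de = (E J^T kron I) + (I kron E J) K
I_m = np.eye(m)
grad = np.kron(E @ J.T, I_m) + np.kron(I_m, E @ J) @ commutation_matrix(m, n)

# central finite differences of f(e) = vec(E J E^T), column by column
h = 1e-6
num = np.zeros((m * m, m * n))
for k in range(m * n):
    dE = np.zeros(m * n); dE[k] = h
    Ep = E + dE.reshape(m, n, order='F')
    Em = E - dE.reshape(m, n, order='F')
    num[:, k] = (vec(Ep @ J @ Ep.T) - vec(Em @ J @ Em.T)) / (2 * h)

print(np.max(np.abs(grad - num)))         # ~1e-9: the two Jacobians agree
```

Since $F$ is quadratic in $E$, the central differences are exact up to floating-point roundoff, so the agreement is essentially to machine precision.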

Another approach to the problem is to use the self-gradient of a matrix, i.e. $$\eqalign{ \grad{E}{E_{ij}} = S_{ij} \\ }$$ where $S_{ij}$ is the matrix whose components are all zero, except for the $(i,j)^{th}$ component which is equal to one. This is sometimes called the single-entry matrix, and since $\grad{E^T}{E_{ij}} = S_{ij}^T$, it can be used to write the component-wise gradient of the function as $$\eqalign{ \grad{F}{E_{ij}} &= S_{ij}\,JE^T + EJ\,S_{ij}^T \\ }$$ (a numerical check of this construction follows the index-notation derivation below). Yet another approach is to use Index Notation to write the self-gradient (which is a fourth-order tensor) in terms of Kronecker delta symbols as $$\eqalign{ \grad{E_{mn}}{E_{ij}} = \d_{im}\d_{jn} \\ }$$ Then calculate the gradient of the function (also a fourth-order tensor) as
$$\eqalign{ F_{mn} &= \k\l E_{mk}J_{kl}E_{ln}^T \\ \grad{F_{mn}}{E_{ij}} &= \k\l \BR{ \c{\d_{im}\d_{jk}}\;J_{kl}E_{nl} + E_{mk}J_{kl}\;\c{\d_{in}\d_{jl}} } \\ &= \l \d_{im}J_{jl}E_{ln}^T + \k E_{mk}J_{kj}\d_{in} \\ &= \d_{mi}\LR{JE^T}_{jn} + \LR{EJ}_{mj}\d_{in} \\ }$$ Once you are comfortable with the Einstein summation convention, you can drop the $\Sigma$ symbols to write the intermediate steps more concisely.
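The single-entry construction above can be assembled into the same $m^2\times mn$ Jacobian, one column per entry of $E$. A minimal numpy sketch (the dimensions, names, and column-ordering convention are my own choices):

```python
import numpy as np

m, n = 3, 4                               # illustrative sizes
rng = np.random.default_rng(0)
E = rng.standard_normal((m, n))
J = rng.standard_normal((n, n))
vec = lambda M: np.ravel(M, order='F')

# column of the Jacobian for entry (i, j): vec(S_ij J E^T + E J S_ij^T),
# with columns ordered column-major (j outer, i inner) to match vec(E)
cols = []
for j in range(n):
    for i in range(m):
        S = np.zeros((m, n)); S[i, j] = 1.0
        cols.append(vec(S @ J @ E.T + E @ J @ S.T))
grad = np.column_stack(cols)              # shape (m*m, m*n)

# spot-check one column against a central finite difference
h = 1e-6
dE = np.zeros((m, n)); dE[1, 2] = h
fd = (vec((E + dE) @ J @ (E + dE).T)
      - vec((E - dE) @ J @ (E - dE).T)) / (2 * h)
print(np.max(np.abs(grad[:, 2 * m + 1] - fd)))   # ~1e-9
```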
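The index-notation result also maps directly onto `np.einsum`. Here is a sketch that builds the fourth-order gradient tensor from the final line above and verifies every component against central finite differences (again, the dimensions and seed are illustrative; I relabel the free indices $m,n$ as $a,b$ in the code to avoid clashing with the matrix sizes):

```python
import numpy as np

m, n = 3, 4                               # illustrative sizes
rng = np.random.default_rng(0)
E = rng.standard_normal((m, n))
J = rng.standard_normal((n, n))

# fourth-order gradient G[a, b, i, j] = dF_ab / dE_ij from the index result:
# delta_ai (J E^T)_jb + (E J)_aj delta_ib
I_m = np.eye(m)
G = (np.einsum('ai,jb->abij', I_m, J @ E.T)
     + np.einsum('aj,ib->abij', E @ J, I_m))

# verify every component against central finite differences
h = 1e-6
num = np.zeros_like(G)
for i in range(m):
    for j in range(n):
        dE = np.zeros((m, n)); dE[i, j] = h
        num[:, :, i, j] = ((E + dE) @ J @ (E + dE).T
                           - (E - dE) @ J @ (E - dE).T) / (2 * h)
print(np.max(np.abs(G - num)))            # ~1e-9
```

Flattening $G$ column-major over the $(a,b)$ and $(i,j)$ index pairs recovers exactly the $\LR{EJ^T\otimes I} + \LR{I\otimes EJ}K$ matrix from the vectorized derivation.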