I'm trying to compute some derivatives involving the following vectors and functions: column vectors $X=[x_1,x_2,\dots,x_n ]^T$ and $Z=[z_1,z_2,\dots,z_n ]^T$, and a row vector $Y=[y_1,y_2,\dots,y_n ]$

  1. $f(X,Y)=e^{XY}$ $$\dfrac{\partial f}{\partial Y} = e^{XY}X $$
  2. $f^T(X,Y)=(e^{XY})^T$ $$\dfrac{\partial (e^{XY})^T}{\partial Y} = $$
  3. $f=XZ^T \odot e^{XY}$
    Using $\partial (A \odot B) = (\partial A) \odot B + A \odot (\partial B)$

$$\dfrac{\partial f}{\partial X} = \dfrac{\partial (XZ^T)}{\partial X} \odot e^{XY} + XZ^T \odot \dfrac{\partial e^{XY}}{\partial X}$$ but this seems wrong to me, because $\partial (XZ^T)/\partial X$ yields $Z$, which is a vector, while $e^{XY}$ is a matrix.

Additional questions

So, a stumbling block here is that taking the derivative of a matrix with respect to a vector results in a tensor...

  1. I haven't seen the following expression before. Is it a Taylor series? $$f(H) = f_0I + \dfrac{f_\lambda - f_0}{\lambda} H$$ $$df(H) = \dfrac{f_\lambda - f_0}{\lambda} dH + \left( \dfrac{f_\lambda' }{\lambda} - \dfrac{f_\lambda - f_0 }{\lambda^2} \right)H d\lambda$$ Does this apply to any analytic function and any matrix $H$?
  2. I tried to rewrite a few expressions and am wondering whether the result of these operations would be different. I mean, is there any chance to avoid rank-3 tensors and the Khatri-Rao product? $$\text{vec}(f_1) = \text{vec}(x_0^T\,\left( A \odot B \right)) = (I \otimes x_0^T)\, \text{vec}(A) \odot \text{vec}(B) \quad ?$$ $$f_2 = G_1^T(x_0y_1^T)\,G_1(x_0y_1^T)\,y_2$$ where $x_0,y_1,y_2$ are vectors and, for example, $G_1(x) = e^x$, $G_2(x)=1-e^{-x}$. Here we take derivatives of a vector with respect to a vector.
  3. In your answer you used a trick with diagonal matrices to rewrite the Hadamard product; however, $f_1$ presents a slightly different situation $$\dfrac{\partial f_1}{\partial y_1^T} = $$ $$\dfrac{\partial f_1}{\partial y_2} = $$
  4. I'm not sure what to do with $G_1^T$ $$\dfrac{\partial f_2}{\partial y_1^T} = $$
  5. $$\dfrac{\partial f_2}{\partial y_2} = G_1^T(x_0y_1^T)\, G_1(x_0y_1^T)$$

Update 2

Given column vectors $x, y$ with dimensions $k$ and $m$, and a rectangular matrix $H = xy^T$ of size $k\times m$, let's compute the derivative of the function $f(H)$ with respect to the vector $y^T$, where $f$ is any analytic function (e.g. $e^x$ or $\sin(x)$) applied elementwise $$df = f'(H) \odot dH = f'(H) \odot (x\,dy^T)$$ \begin{align} \text{vec}(df) =& \, \text{vec}(f'(H)) \odot \text{vec}(x\,dy^T) \\ =& \,\operatorname{Diag}(\text{vec}(f'(H)))\, \text{vec}(x\,dy^T) \\ =& \,\operatorname{Diag}(\text{vec}(f'(H)))\, (I_m \otimes x) \, dy \end{align} $$\dfrac{\partial\,\text{vec}(f)}{\partial y^T} = \,\operatorname{Diag}(\text{vec}(f'(H)))\, (I_m \otimes x)$$ The final vectorized derivative has size $(km\times m)$.
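To double-check this result, here is a small numpy sketch (my own verification, not part of the derivation) comparing the Jacobian above against finite differences; the sizes $k,m$ and the choice $f=\exp$ are arbitrary:

```python
import numpy as np

# Sketch: verify  d vec(f(H))/dy = Diag(vec(f'(H))) (I_m kron x),  H = x y^T,
# with f applied elementwise. Sizes and the choice f = exp are arbitrary.
rng = np.random.default_rng(0)
k, m = 3, 4
x, y = rng.standard_normal(k), rng.standard_normal(m)
H = np.outer(x, y)
f, fp = np.exp, np.exp                      # elementwise f and its derivative f'
vec = lambda M: M.flatten(order='F')        # column-major vec(), matching the Kronecker identities

J = np.diag(vec(fp(H))) @ np.kron(np.eye(m), x.reshape(-1, 1))    # (km x m)

# Central-difference Jacobian of vec(f(x y^T)) with respect to y.
eps = 1e-6
J_fd = np.column_stack([
    (vec(f(np.outer(x, y + eps * e))) - vec(f(np.outer(x, y - eps * e)))) / (2 * eps)
    for e in np.eye(m)
])
print(np.allclose(J, J_fd, atol=1e-6))      # True
```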


$ \def\bbR#1{{\mathbb R}^{#1}} \def\bx{\boxtimes} \def\a{\phi}\def\b{\psi} \def\o{{\tt1}}\def\l{\lambda}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\qiq{\quad\implies\quad} \def\qif{\quad\iff\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $First, for consistency let's use uppercase letters for matrices, lowercase for column vectors, and Greek letters for scalars. This renames your variables $(f,X,Y,Z)\to(F,x,y^T,z)$.

For the first problem, notice that $H=xy^T$ is a rank-one matrix. This permits any analytic function to be evaluated as follows $$\eqalign{ H &= xy^T &\qiq dH = x\,dy^T\\ \l &= \trace H=x^Ty &\qiq d\l = x^Tdy \\ f_\l&=f(\l),\quad f_0=f(0) \\ }$$ $$\eqalign{ f(H) &= f_0\,I + \LR{\frac{f_\l-f_0}{\l}}H \qquad\qquad\qquad\quad\quad \\ }$$ where $I$ is the $(n\times n)$ identity matrix.
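As a quick sanity check (my addition, assuming $\l = x^Ty \ne 0$), here is a numpy sketch comparing this rank-one evaluation against scipy's general-purpose `expm` for $f=\exp$:

```python
import numpy as np
from scipy.linalg import expm

# Sketch: rank-one evaluation f(H) = f0*I + ((f_lam - f0)/lam)*H for f = exp,
# checked against scipy's matrix exponential. Assumes lam = x^T y != 0.
rng = np.random.default_rng(1)
n = 5
x, y = rng.standard_normal(n), rng.standard_normal(n)
H = np.outer(x, y)
lam = x @ y                                  # trace(H) = x^T y

F = np.eye(n) + ((np.exp(lam) - 1.0) / lam) * H
print(np.allclose(F, expm(H)))               # True
```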

So for the exponential function we can calculate the differential as $$\eqalign{ F_1 &= \exp(H) \\ &= I + \LR{\frac{e^\l-\o}{\l}}H \\ dF_1 &= \LR{\frac{e^\l-\o}{\l}}dH + \LR{\frac{e^\l}{\l}-\frac{e^\l-\o}{\l^2}}H\,d\l \\ &= \a x\,dy^T + \b Hx^Tdy \\ }$$ where $\a = \frac{e^\l-\o}{\l}$ and $\b = \frac{e^\l}{\l}-\frac{e^\l-\o}{\l^2}$ are the scalar coefficients used below. Unfortunately, the gradient of a matrix with respect to a vector is a third-order tensor.
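This differential can be checked to first order against an actual small perturbation; a sketch, using the same random setup as above:

```python
import numpy as np
from scipy.linalg import expm

# Sketch: check  dF1 = phi*x*dy^T + psi*H*(x^T dy)  to first order in a random dy.
rng = np.random.default_rng(2)
n = 5
x, y = rng.standard_normal(n), rng.standard_normal(n)
H, lam = np.outer(x, y), x @ y
phi = (np.exp(lam) - 1.0) / lam
psi = np.exp(lam) / lam - (np.exp(lam) - 1.0) / lam**2

dy = 1e-7 * rng.standard_normal(n)
dF1_actual = expm(np.outer(x, y + dy)) - expm(H)       # true change in F1
dF1_linear = phi * np.outer(x, dy) + psi * H * (x @ dy)
print(np.allclose(dF1_actual, dF1_linear, atol=1e-9))  # True (error is O(|dy|^2))
```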

To avoid introducing tensor notation, let's use Kronecker products $(\otimes)$ to vectorize the matrices in that last equation $$\eqalign{ f_1 &= \vecc{F_1},\qquad h=\vecc H = y\otimes x \\ df_1 &= \BR{\a I\otimes x}\,dy + \b hx^T\,dy \\ \grad{f_1}{y} &= \BR{\a I\otimes x} + \b hx^T \\ &= \LR{\frac{e^\l-\o}{\l}}\BR{I\otimes x} + \LR{\frac{\l e^\l-e^\l+\o}{\l^2}}hx^T \\ }$$ In a similar way, you can calculate the gradient with respect to $x$. $$\eqalign{ dF_1 &= \phi\,dx\,y^T + \psi Hy^Tdx \\ df_1 &= \BR{\a y\otimes I}\,dx + \b hy^T\,dx \\ \grad{f_1}{x} &= \LR{\a y\otimes I} + \b hy^T \\ }$$
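Both gradients can be verified against finite differences; here is a sketch for $\partial f_1/\partial y$, using `expm` in place of $F_1$ (the rank-one formula above is exact for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

# Sketch: verify  d vec(exp(x y^T))/dy = phi*(I kron x) + psi*h*x^T  numerically,
# where h = vec(H) = y kron x and (phi, psi) are as defined above.
rng = np.random.default_rng(3)
n = 4
x, y = rng.standard_normal(n), rng.standard_normal(n)
lam = x @ y
h = np.kron(y, x)                            # vec(x y^T) in column-major order
phi = (np.exp(lam) - 1.0) / lam
psi = np.exp(lam) / lam - (np.exp(lam) - 1.0) / lam**2

J = phi * np.kron(np.eye(n), x.reshape(-1, 1)) + psi * np.outer(h, x)   # (n^2 x n)

vec = lambda M: M.flatten(order='F')
eps = 1e-6
J_fd = np.column_stack([
    (vec(expm(np.outer(x, y + eps * e))) - vec(expm(np.outer(x, y - eps * e)))) / (2 * eps)
    for e in np.eye(n)
])
print(np.allclose(J, J_fd, atol=1e-5))       # True
```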

I'm not sure about the intent of the second function.

The third function can be rewritten using diagonal matrices $$\eqalign{ X &= \Diag{x} \qif x = \diag{X} \\ Z &= \Diag{z} \qif z = \diag{Z} \\ F_3 &= xz^T \odot F_1 \;=\; XF_1Z \\ dF_3 &= dX\,F_1Z + X\,dF_1\,Z \\ }$$ and the Khatri-Rao product $(\bx)$ $$\eqalign{ \vecc{AXB} &= \LR{B^T\bx A}x \\ {B^T\bx A} &= \BR{B^T\otimes\o_a}\odot\BR{\o_b\otimes A} \\ }$$ yielding $$\eqalign{ \vecc{dF_3} &= \LR{ZF_1^T\bx I}dx + \LR{Z\otimes X}df_1 \\ df_3 &= \LR{ZF_1^T\bx I}dx + \LR{Z\otimes X}\LR{\a y\otimes I}\,dx + \LR{Z\otimes X}\LR{\b hy^T}\,dx \\ \grad{f_3}{x} &= \LR{ZF_1^T\bx I} + \a\LR{Z\otimes X}\LR{y\otimes I} + \b\LR{Z\otimes X}\LR{hy^T} \\\\ }$$
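The column-wise Khatri-Rao product ships as `scipy.linalg.khatri_rao`, so this final gradient can also be checked numerically; a sketch under the same square $n\times n$ setup:

```python
import numpy as np
from scipy.linalg import expm, khatri_rao

# Sketch: verify  df3/dx = (Z F1^T kr I) + phi*(Z kron X)(y kron I) + psi*(Z kron X) h y^T
# where F3 = Diag(x) exp(x y^T) Diag(z) and kr is the column-wise Khatri-Rao product.
rng = np.random.default_rng(4)
n = 4
x, y, z = (rng.standard_normal(n) for _ in range(3))
lam, H = x @ y, np.outer(x, y)
F1, h, I = expm(H), np.kron(y, x), np.eye(n)
phi = (np.exp(lam) - 1.0) / lam
psi = np.exp(lam) / lam - (np.exp(lam) - 1.0) / lam**2
X, Z = np.diag(x), np.diag(z)

J = (khatri_rao(Z @ F1.T, I)
     + phi * np.kron(Z, X) @ np.kron(y.reshape(-1, 1), I)
     + psi * np.kron(Z, X) @ np.outer(h, y))

vec = lambda M: M.flatten(order='F')
f3 = lambda x_: vec(np.diag(x_) @ expm(np.outer(x_, y)) @ Z)
eps = 1e-6
J_fd = np.column_stack([(f3(x + eps * e) - f3(x - eps * e)) / (2 * eps)
                        for e in np.eye(n)])
print(np.allclose(J, J_fd, atol=1e-5))       # True
```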


NB: If $x$ and $y$ are (nearly) orthogonal, then $\,(\l,\phi,\psi)\to\LR{0,\o,\tfrac{\o}{2}}$

Update

Rename $(x_0,y_1,y_2)\to(x,y,z)$, then in response to your additional questions...

  1. The given expression is derived from the Taylor series using the fact that $H^2 = \l H$, which allows all higher powers of the matrix to be reduced via $H^{p+1}=\l^pH$. For example, for $f=\exp$ $$\exp(H) \;=\; I + \sum_{p\ge\o}\frac{H^p}{p!} \;=\; I + \LR{\sum_{p\ge\o}\frac{\l^{p-\o}}{p!}}H \;=\; I + \LR{\frac{e^\l-\o}{\l}}H$$ But this special reduction formula only holds for rank-one matrices (see the first sketch after this list for a quick numerical check).

  2. The term $\LR{G_2}$ is a matrix, while $\LR{x^TG_1 zz^T}$ is a row vector, so their Hadamard product is not defined.

  3. Since $f_1$ is not defined, neither is its gradient.

  4. Note that $G_2=\LR{I-G_1^{-1}}$, and $G_1$ is just a repeat of your original function, $F_1=\exp(H)=G_1$, whose differential you already know $$\eqalign{ dG_1 &= \a x\,dy^T + \b Hx^Tdy \\ dG_2 &= G_1^{-1}\,dG_1\,G_1^{-1} \\ }$$ Now you wish to incorporate $G_1$ into a vector function $$\eqalign{ f_2 &= G_1^TG_1z \\ &= F_1^TF_1z \\ df_2 &= dF_1^TF_1z + F_1^TdF_1z \\ &= \LR{z^TF_1^T\otimes I}\vecc{dF_1^T} + \LR{z^T\otimes F_1^T}\vecc{dF_1} \\ &= \BR{\LR{z^TF_1^T\otimes I}K + \LR{z^T\otimes F_1^T}}\;df_1 \\ }$$ where $K$ is the Commutation Matrix associated with the vec() operation, and the differential $df_1$ was previously derived (see the second sketch after this list for a numerical check).

  5. Correct
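For completeness, here are two numerical sketches of the points above (my additions, square $n\times n$ case throughout). First, the rank-one reduction behind point 1:

```python
import numpy as np

# Sketch: the rank-one reduction H^(p+1) = lam^p * H, here with p = 3. This is
# what collapses the Taylor series; it fails for matrices of rank two or higher.
rng = np.random.default_rng(4)
x, y = rng.standard_normal(5), rng.standard_normal(5)
H, lam = np.outer(x, y), x @ y
print(np.allclose(np.linalg.matrix_power(H, 4), lam**3 * H))   # True
```

And a finite-difference check of point 4, building the commutation matrix $K$ explicitly (so that $K\,\text{vec}(A)=\text{vec}(A^T)$):

```python
import numpy as np
from scipy.linalg import expm

# Sketch: verify  df2/dy = [ (z^T F1^T kron I) K + (z^T kron F1^T) ] d vec(F1)/dy,
# with K the commutation matrix. Variables renamed as above: (x0, y1, y2) -> (x, y, z).
rng = np.random.default_rng(5)
n = 4
x, y, z = (rng.standard_normal(n) for _ in range(3))
lam, H = x @ y, np.outer(x, y)
F1, h, I = expm(H), np.kron(y, x), np.eye(n)
phi = (np.exp(lam) - 1.0) / lam
psi = np.exp(lam) / lam - (np.exp(lam) - 1.0) / lam**2

K = np.zeros((n * n, n * n))                 # commutation matrix: K @ vec(A) = vec(A^T)
for i in range(n):
    for j in range(n):
        K[i + n * j, j + n * i] = 1.0

J1 = phi * np.kron(I, x.reshape(-1, 1)) + psi * np.outer(h, x)        # d vec(F1)/dy
J2 = (np.kron((F1 @ z)[None, :], I) @ K + np.kron(z[None, :], F1.T)) @ J1

f2 = lambda y_: expm(np.outer(x, y_)).T @ expm(np.outer(x, y_)) @ z
eps = 1e-6
J_fd = np.column_stack([(f2(y + eps * e) - f2(y - eps * e)) / (2 * eps)
                        for e in np.eye(n)])
print(np.allclose(J2, J_fd, atol=1e-5))      # True
```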