Chain rule and matrices - I'm confused
Solution 1:
Let's use uppercase letters for the matrix variables, so they're easy to distinguish from the lowercase scalars
$$\eqalign{ G &= G(Q,X) \cr F &= F(G,Y) \cr t &= {\rm tr}(F) = I:F \cr dt &= I:dF \cr }$$

First, let's calculate the differential and gradient wrt $Y$
$$\eqalign{ dt &= I:\Big(\frac{\partial F}{\partial Y}:dY\Big) \cr \frac{\partial t}{\partial Y} &= I:\frac{\partial F}{\partial Y} \cr }$$

And now wrt $X$
$$\eqalign{ dt &= I:\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}:dX\Big) \cr \frac{\partial t}{\partial X} &= I:\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X} \cr }$$

Note that the matrix-by-matrix gradients are 4th order tensors. For example, here is one of the gradients in component form
$$\Big(\frac{\partial G}{\partial X}\Big)_{ijkl} = \frac{\partial G_{ij}}{\partial X_{kl}}$$
Also note that colons are used to denote the double-contraction product, e.g. $$\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}\Big)_{ijkl} = \frac{\partial F_{ij}}{\partial G_{mn}}\,\frac{\partial G_{mn}}{\partial X_{kl}}$$
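As a numerical sanity check, here is a minimal sketch of the formulas above using NumPy's `einsum` for the double-contraction products. The concrete functions $G = QX$ and $F = GY$ (with $3\times 3$ matrices) are assumptions chosen purely for illustration; the question itself leaves $G$ and $F$ general. For this choice the closed-form answers are $\frac{\partial t}{\partial X} = Q^TY^T$ and $\frac{\partial t}{\partial Y} = G^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
Q, X, Y = (rng.standard_normal((n, n)) for _ in range(3))
I = np.eye(n)

# Assumed example functions (not from the question): G = Q X, F = G Y
G = Q @ X

# 4th order gradient tensors in component form
dG_dX = np.einsum("ik,jl->ijkl", Q, I)  # dG_ij/dX_kl = Q_ik * delta_jl
dF_dG = np.einsum("im,nj->ijmn", I, Y)  # dF_ij/dG_mn = delta_im * Y_nj
dF_dY = np.einsum("ik,jl->ijkl", G, I)  # dF_ij/dY_kl = G_ik * delta_jl

# Double contraction of two 4th order tensors: (dF/dG : dG/dX)_ijkl
dF_dX = np.einsum("ijmn,mnkl->ijkl", dF_dG, dG_dX)

# Contract with the identity: dt/dX = I : dF/dX and dt/dY = I : dF/dY
dt_dX = np.einsum("ij,ijkl->kl", I, dF_dX)
dt_dY = np.einsum("ij,ijkl->kl", I, dF_dY)

# Compare against the closed forms for t = tr(Q X Y)
print(np.allclose(dt_dX, Q.T @ Y.T))
print(np.allclose(dt_dY, G.T))
```

Both checks should print `True`. For other choices of $G$ and $F$ only the three gradient tensors change; the contraction steps stay the same.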