Gradient of squared Frobenius norm

Recall that if $A,B \in \mathbb{R}^{m \times n}$ then \begin{equation} \langle A, B \rangle = \text{Tr}(A^T B) \end{equation} and \begin{align*} \|A\|_F^2 &= \langle A,A \rangle \\ &= \text{Tr}(A^T A) \\ &= \text{Tr}(A A^T). \end{align*}

Let $A \in \mathbb{R}^{k \times n}$ be a fixed matrix and define $f:\mathbb{R}^{m \times n} \to \mathbb{R}$ by \begin{align*} f(X) &= \frac12 \| X A^T \|_F^2 \\ &= \frac12 \text{Tr}(X A^T A X^T). \end{align*} Let $J$ be the $m \times n$ matrix whose entries are all $0$ except for $J_{ij}$, which is equal to $1$. Let $\Delta X = \epsilon J$, where $\epsilon > 0$ is tiny.

Then

\begin{align*} f(X + \Delta X) &= \frac12 \text{Tr}((X + \Delta X)A^T A (X + \Delta X)^T) \\ &= \frac12 \text{Tr}(X A^T A X^T) + \frac12 \text{Tr}(\Delta X A^T A X^T) + \frac12 \text{Tr}(X A^T A \Delta X^T) \\ & \qquad + \frac12 \text{Tr}(\Delta X A^T A \Delta X^T) \\ &\approx \frac12 \text{Tr}(X A^T A X^T) + \frac12 \text{Tr}(\Delta X A^T A X^T) + \frac12 \text{Tr}(X A^T A \Delta X^T) \\ &= \frac12 \text{Tr}(X A^T A X^T) + \text{Tr}(X A^T A \Delta X^T) \\ &= f(X) + \left\langle X A^T A,\Delta X \right\rangle \\ &= f(X) + \epsilon \left \langle X A^T A,J \right\rangle. \end{align*}

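Comparing this result with the equation \begin{equation} f(X + \epsilon J) \approx f(X) + \epsilon \frac{\partial f(X)}{\partial X_{ij}} \end{equation} we see that \begin{equation} \frac{\partial f(X)}{\partial X_{ij}} = \left \langle X A^T A,J \right\rangle. \end{equation} Since the inner product of any $m \times n$ matrix with $J$ is just the $(i,j)$ entry of that matrix, this says \begin{equation} \frac{\partial f(X)}{\partial X_{ij}} = (X A^T A)_{ij}, \end{equation} so the gradient of $f$ at $X$ is the matrix $X A^T A$.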
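As a sanity check, the formula can be compared against a finite-difference approximation. The sketch below is only an illustration: the function names, the shapes, and the random test data are my own choices, and it uses a central difference $(f(X+\epsilon J) - f(X-\epsilon J))/(2\epsilon)$ rather than the one-sided perturbation from the derivation, since that is more accurate numerically.

```python
import numpy as np

def f(X, A):
    """f(X) = 0.5 * ||X A^T||_F^2."""
    return 0.5 * np.linalg.norm(X @ A.T, "fro") ** 2

def grad_f(X, A):
    """Closed-form gradient from the derivation: X A^T A."""
    return X @ A.T @ A

def finite_difference_grad(X, A, eps=1e-6):
    """Central-difference estimate of df/dX_ij, one entry at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            J = np.zeros_like(X)   # all zeros except a 1 at (i, j)
            J[i, j] = 1.0
            G[i, j] = (f(X + eps * J, A) - f(X - eps * J, A)) / (2 * eps)
    return G

# Arbitrary test data (assumed shapes: X is m x n, A is k x n).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
A = rng.standard_normal((5, 4))

max_err = np.max(np.abs(finite_difference_grad(X, A) - grad_f(X, A)))
print("max abs difference:", max_err)
```

The printed difference should be tiny compared with the size of the gradient entries, which is consistent with $\nabla f(X) = X A^T A$.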

Let $M=XA^T$ and let a colon denote the Frobenius product. Taking the differential then leads directly to the derivative $$\eqalign{ f &= \frac{1}{2}\,M:M \cr df &= M:dM \cr &= M:dX\,A^T \cr &= MA:dX \cr &= XA^TA:dX \cr \frac{\partial f}{\partial X} &= XA^TA \cr }$$ Your question asks for the $(i,j)$-th component of this derivative, which is obtained by taking its Frobenius product with $J_{ij}$, the matrix whose only nonzero entry is a $1$ in position $(i,j)$: $$\eqalign{ \frac{\partial f}{\partial X_{ij}} &= XA^TA:J_{ij} \cr }$$ If you are unfamiliar with the Frobenius product, you can express the result in terms of the trace function instead, since $\,X\!:\!Y={\rm tr}(X^TY)$.

This yields the same result as you found using the Cookbook -- except you messed up your indices between the LHS ($ij$) and RHS ($jk$) for some inexplicable reason.
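If it helps, the two facts the differential argument relies on, namely the rearrangement $M:dX\,A^T = MA:dX$ and the fact that contracting with $J_{ij}$ picks out the $(i,j)$ entry, can be checked numerically. This is just a sketch; the helper `frob`, the shapes, and the random data are my own assumptions for illustration.

```python
import numpy as np

def frob(X, Y):
    """Frobenius product X:Y = tr(X^T Y), i.e. the sum of elementwise products."""
    return np.sum(X * Y)

# Arbitrary test data (assumed shapes: X and dX are m x n, A is k x n).
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4))
A = rng.standard_normal((5, 4))
dX = rng.standard_normal((3, 4))
M = X @ A.T

# Rearrangement used in the derivation: M : (dX A^T) == (M A) : dX
lhs = frob(M, dX @ A.T)
rhs = frob(M @ A, dX)
print(np.isclose(lhs, rhs))

# Contracting the gradient with J_ij returns its (i, j) entry.
G = X @ A.T @ A
i, j = 1, 2
J = np.zeros_like(X)
J[i, j] = 1.0
entry = frob(G, J)
print(np.isclose(entry, G[i, j]))
```

Both checks should print `True` for any choice of `X`, `A`, `dX`, and `(i, j)` with compatible shapes.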
