Derivative of squared Frobenius norm of a matrix
In linear regression, the loss function is expressed as
$$\frac1N \left\|XW-Y\right\|_{\text{F}}^2$$
where $X, W, Y$ are matrices. Taking the derivative with respect to $W$ yields
$$\frac 2N \, X^T(XW-Y)$$
Why is this so?
Solution 1:
Let
$$\begin{array}{rl} f (\mathrm W) &:= \| \mathrm X \mathrm W - \mathrm Y \|_{\text{F}}^2 = \mbox{tr} \left( (\mathrm X \mathrm W - \mathrm Y)^{\top} (\mathrm X \mathrm W - \mathrm Y) \right)\\ &\,= \mbox{tr} \left( \mathrm W^{\top} \mathrm X^{\top} \mathrm X \mathrm W - \mathrm Y^{\top} \mathrm X \mathrm W - \mathrm W^{\top} \mathrm X^{\top} \mathrm Y + \mathrm Y^{\top} \mathrm Y \right)\end{array}$$
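Differentiating term by term with respect to $\mathrm W$, using the trace identities $\nabla_{\mathrm W} \, \mbox{tr} \left( \mathrm W^{\top} \mathrm A \mathrm W \right) = (\mathrm A + \mathrm A^{\top}) \mathrm W$, $\nabla_{\mathrm W} \, \mbox{tr} \left( \mathrm B \mathrm W \right) = \mathrm B^{\top}$ and $\nabla_{\mathrm W} \, \mbox{tr} \left( \mathrm W^{\top} \mathrm C \right) = \mathrm C$ with $\mathrm A = \mathrm X^{\top} \mathrm X$ (symmetric), $\mathrm B = \mathrm Y^{\top} \mathrm X$ and $\mathrm C = \mathrm X^{\top} \mathrm Y$,
$$\nabla_{\mathrm W} f (\mathrm W) = 2 \, \mathrm X^{\top} \mathrm X \mathrm W - 2 \, \mathrm X^{\top} \mathrm Y = \color{blue}{2 \, \mathrm X^{\top} \left( \mathrm X \mathrm W - \mathrm Y \right)}$$
Dividing by $N$ recovers the expression in the question.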
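As a numerical sanity check of the formula $2 \, \mathrm X^{\top} (\mathrm X \mathrm W - \mathrm Y)$, one can compare it against central finite differences of $f$. The sketch below is only an illustration and is not part of the derivation. It uses plain Python lists, and the helper names, the matrix sizes `N, d, m`, the step `h`, and the random test data are all assumptions made for the example.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def residual(X, W, Y):
    """Return the matrix XW - Y."""
    XW = matmul(X, W)
    return [[XW[i][j] - Y[i][j] for j in range(len(Y[0]))]
            for i in range(len(Y))]

def f(X, W, Y):
    """Squared Frobenius norm ||XW - Y||_F^2 (without the 1/N factor)."""
    return sum(r * r for row in residual(X, W, Y) for r in row)

def closed_form_grad(X, W, Y):
    """The claimed gradient 2 X^T (XW - Y)."""
    G = matmul(transpose(X), residual(X, W, Y))
    return [[2.0 * g for g in row] for row in G]

def finite_difference_grad(X, W, Y, h=1e-5):
    """Central differences, perturbing one entry of W at a time."""
    G = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            original = W[i][j]
            W[i][j] = original + h
            f_plus = f(X, W, Y)
            W[i][j] = original - h
            f_minus = f(X, W, Y)
            W[i][j] = original  # restore the entry before moving on
            G[i][j] = (f_plus - f_minus) / (2 * h)
    return G

def random_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
N, d, m = 6, 3, 2  # assumed sizes: X is N x d, W is d x m, Y is N x m
X = random_matrix(N, d, rng)
W = random_matrix(d, m, rng)
Y = random_matrix(N, m, rng)

exact = closed_form_grad(X, W, Y)
approx = finite_difference_grad(X, W, Y)
max_err = max(abs(exact[i][j] - approx[i][j])
              for i in range(d) for j in range(m))
print(max_err)  # largest entrywise gap between the two gradients
```

Because $f$ is quadratic in $\mathrm W$, central differences agree with the exact gradient up to floating-point rounding, so the printed gap should be very small.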
Solution 2:
Let $X=(x_{ij})_{ij}$ and similarly for the other matrices. We are trying to differentiate $$ \|XW-Y\|^2=\sum_{i,j}\Big(\sum_{k}x_{ik}w_{kj}-y_{ij}\Big)^2\qquad (\star) $$ with respect to $W$. The result will be a matrix whose $(i,j)$ entry is the derivative of $(\star)$ with respect to the variable $w_{ij}$.
So think of $(i,j)$ as being fixed now. Only the terms of $(\star)$ coming from column $j$ of $XW-Y$ depend on $w_{ij}$: the $(k,j)$ term is $\big(\sum_{l}x_{kl}w_{lj}-y_{kj}\big)^2$, in which $w_{ij}$ appears with coefficient $x_{ki}$. Differentiating these terms with the chain rule gives $$ \frac{d\|XW-Y\|^2}{dw_{ij}}=\sum_{k}2x_{ki}\Big(\sum_{l}x_{kl}w_{lj}-y_{kj}\Big)=\left[2X^T(XW-Y)\right]_{i,j}. $$
Solution 3:
Here are more details on the process. Denote $X = [x_{ij}]$, $W = [w_{ij}]$, $Y = [y_{ij}]$. Then $$ \left\| XW - Y \right\|^{2} = \sum_{k, j} \Big(\sum_{l} x_{kl} w_{lj} - y_{kj}\Big)^{2}. $$ This is a scalar, so its derivative with respect to the matrix $W$ is a matrix of the same shape as $W$. Treating $i$ and $j$ as fixed, only the terms with column index $j$ depend on $w_{ij}$, and we get $$ \begin{aligned} \frac{d \left\| XW - Y \right\|^{2}}{d w_{ij}} &= \sum_{k} 2x_{ki} \Big(\sum_{l} x_{kl} w_{lj} - y_{kj}\Big) \\ &= \sum_{k} 2x_{ki} (XW - Y)_{kj} \\ &= [2 X^{T} (XW - Y)]_{ij}. \end{aligned} $$ Thus we have $$ \frac{d \left\| XW - Y \right\|^{2}}{d W} = 2 X^{T} (XW - Y). $$ First time answering a question; I hope it is right. Thanks!
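The entrywise expression $\sum_{k} 2x_{ki} (XW - Y)_{kj}$ can be translated into loops almost literally, which may help in seeing where each index comes from. The following is a minimal sketch under assumptions of my own: the function name `entrywise_grad`, the inner summation index `l`, and the small test matrices are all illustrative and not taken from the answers above.

```python
def entrywise_grad(X, W, Y):
    """Gradient of ||XW - Y||^2 built entry by entry from the index formula.

    G[i][j] = sum_k 2 * x_ki * (XW - Y)_kj,
    where (XW - Y)_kj = sum_l x_kl * w_lj - y_kj.
    """
    n_rows = len(X)      # range of k: rows of X and Y
    n_inner = len(W)     # range of i and l: columns of X, rows of W
    n_cols = len(W[0])   # range of j: columns of W and Y
    G = [[0.0] * n_cols for _ in range(n_inner)]
    for i in range(n_inner):
        for j in range(n_cols):
            total = 0.0
            for k in range(n_rows):
                # (XW - Y)_kj, with l as the inner summation index
                residual_kj = sum(X[k][l] * W[l][j] for l in range(n_inner)) - Y[k][j]
                total += 2.0 * X[k][i] * residual_kj
            G[i][j] = total
    return G

# Small hand-checkable case: W = I and Y = 0, so XW - Y = X
# and the gradient reduces to 2 X^T X.
X = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 0.0], [0.0, 0.0]]

G = entrywise_grad(X, W, Y)
print(G)  # [[20.0, 28.0], [28.0, 40.0]], i.e. 2 X^T X
```

Here $X^{T} X = \begin{pmatrix} 10 & 14 \\ 14 & 20 \end{pmatrix}$, so the loops reproduce $2 X^{T} (XW - Y)$ for this example.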