Gradient of $\lVert A(X) - b\rVert^2$ with $A$ a linear operator

I am posting the answer I got thanks to @Fei Cao's hint.

Let us consider $f(X) = \frac{1}{2}\lVert A(X)-b\rVert^2_2$. Then \begin{equation} \nabla f(X) = A^T(A(X) - b), \end{equation} where $A^T$ denotes the adjoint of $A$, i.e. $\langle A^T y, H\rangle = \langle y, A(H)\rangle$ for all $y$ and $H$. Indeed, for $H\in\mathbb{R}^{p\times m}$ it holds: \begin{align*} &f(X+H) - f(X) - \langle A^T(A(X) - b), H\rangle \\ &= \frac{1}{2}\lVert A(X) - b + A(H)\rVert^2_2 - \frac{1}{2}\lVert A(X)-b\rVert^2_2 - \langle A(X) - b, A(H)\rangle \\ &= \frac{1}{2}\left[\lVert A(X) - b\rVert_2^2 + \lVert A(H)\rVert_2^2 + 2\langle A(X) - b, A(H)\rangle\right] - \frac{1}{2}\lVert A(X)-b\rVert^2_2 - \langle A(X) - b, A(H)\rangle \\ &= \frac{1}{2}\lVert A(H)\rVert_2^2 = o(\lVert H\rVert) \qquad \text{for $H\to 0$,} \end{align*} where the last step uses $\lVert A(H)\rVert_2 \le \lVert A\rVert\,\lVert H\rVert$, so the remainder is of order $\lVert H\rVert^2$.
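As a sanity check, the computation above can be reproduced numerically. The sketch below is only an illustration under assumptions not in the original answer: the operator is taken to be $A(X)_i = \langle A_i, X\rangle$ for randomly generated matrices `mats[i]`, the sizes `p, m, n` are arbitrary, and `A_adj` is a hypothetical name for the adjoint written $A^T$ above. It checks the adjoint identity and that the remainder equals $\frac{1}{2}\lVert A(tH)\rVert^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 3, 4, 5  # X is p x m, b has n entries (arbitrary sizes)

# Assumed example operator: A(X)_i = <mats[i], X> (Frobenius inner product).
mats = rng.standard_normal((n, p, m))
b = rng.standard_normal(n)

def A(X):
    # Contract each mats[i] with X; the result is a vector of length n.
    return np.tensordot(mats, X, axes=([1, 2], [0, 1]))

def A_adj(y):
    # Adjoint of A: sum_i y_i * mats[i], a p x m matrix.
    return np.tensordot(y, mats, axes=(0, 0))

def f(X):
    r = A(X) - b
    return 0.5 * (r @ r)

def grad_f(X):
    # Claimed gradient: adjoint applied to the residual.
    return A_adj(A(X) - b)

X = rng.standard_normal((p, m))
H = rng.standard_normal((p, m))

# Adjoint identity: <A_adj(y), H> should equal <y, A(H)> with y = A(X) - b.
y = A(X) - b
adjoint_gap = abs(np.sum(A_adj(y) * H) - y @ A(H))

# First-order remainder along t*H; it should equal 0.5 * t^2 * ||A(H)||^2.
ts = [1e-1, 1e-2, 1e-3]
remainders = [f(X + t * H) - f(X) - t * np.sum(grad_f(X) * H) for t in ts]
expected = [0.5 * t**2 * (A(H) @ A(H)) for t in ts]

print("adjoint gap:", adjoint_gap)
for t, r, e in zip(ts, remainders, expected):
    print(f"t={t:g}  remainder={r:.3e}  0.5*||A(tH)||^2={e:.3e}")
```

The remainder shrinks like $t^2$, which is the $o(\lVert H\rVert)$ behaviour used in the proof.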


Hint: Note that $\|z\|^2=z^Tz$ for a column vector $z$, and use matrix derivative rules.


Use the chain rule. The derivative of $y \mapsto \|y\|^2$ is $2y^T$, and the derivative of $x \mapsto Ax - b$ is $A$. Hence, by the chain rule, the derivative of $f(x) = \|Ax - b\|^2$ is $Df(x) = 2(Ax - b)^TA$, and therefore $\nabla f(x) = Df(x)^T = 2A^T(Ax - b)$.
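A quick way to convince yourself of the formula $2A^T(Ax - b)$ is to compare it with central finite differences. This is a minimal sketch with randomly generated `A`, `b`, `x` (assumed data, not from the question); `numerical_grad` is a hypothetical helper written just for this check.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
x = rng.standard_normal(4)

def f(x):
    # f(x) = ||Ax - b||^2 (no factor 1/2, as in this answer).
    r = A @ x - b
    return r @ r

def grad_f(x):
    # Gradient obtained from the chain rule.
    return 2 * A.T @ (A @ x - b)

def numerical_grad(f, x, eps=1e-6):
    # Central differences, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

max_err = np.max(np.abs(grad_f(x) - numerical_grad(f, x)))
print("max abs difference:", max_err)
```

Since $f$ is quadratic, central differences are exact up to rounding, so the difference should be tiny.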