Matrix derivative $(Ax-b)^T(Ax-b)$
Solution 1:
Perhaps some help computing the partial derivatives would be appreciated; it is always best to be explicit when one is a bit confused by heavy notation. Write $A = (a_{ij})$, $x = (x_1,\dots,x_n)^{\top}$ and $b = (b_1,\dots, b_m)^{\top}$, assuming $A$ is an $m \times n$ matrix. Then the $i^{\text{th}}$ component of $Ax-b$ is $$ (Ax-b)_i = \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i $$ so that $$ (Ax-b)^{\top} (Ax-b) = \sum_{i=1}^m (Ax-b)_i^2 = \sum_{i=1}^m \left( \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i \right)^2. $$

Suppose you want to compute the derivative with respect to $x_k$, $1 \le k \le n$ (I choose $k$ because $i$ or $j$ would clash with the subscripts already in use). Then $$ \frac{\partial}{\partial x_k} (Ax-b)^{\top} (Ax-b) = \sum_{i=1}^m \frac{\partial}{\partial x_k} \left( \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i \right)^2 = \sum_{i=1}^m 2 \left( \left( \sum_{j=1}^n a_{ij}x_j \right) - b_i \right) a_{ik}. $$

In particular, letting $A_k = (a_{1k},a_{2k},\dots,a_{mk})^{\top}$ denote the $k^{\text{th}}$ column of $A$, we get $$ \frac{\partial}{\partial x_k} (Ax-b)^{\top} (Ax-b) = 2 \langle A_k, Ax-b \rangle, $$ where $\langle \cdot , \cdot \rangle$ denotes the inner product. Stacking these components over $k$ gives the gradient in vector form: $$ \nabla_x \, (Ax-b)^{\top}(Ax-b) = 2A^{\top}(Ax-b). $$ You can use convexity arguments to show that any critical point is a minimizer in this case, although the minimizer need not be unique, even when $m=n$: it suffices for $A$ to be singular for this to happen.
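As a quick sanity check of the derivation, here is a small NumPy sketch (the matrix $A$, vector $b$, and point $x$ are random placeholders, not from the question) comparing a finite-difference gradient of $(Ax-b)^{\top}(Ax-b)$ against $2A^{\top}(Ax-b)$, which is what stacking the components $2\langle A_k, Ax-b\rangle$ gives:

```python
import numpy as np

# Random placeholder data; any full-size A, b, x would do.
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

def f(x):
    # The objective (Ax - b)^T (Ax - b).
    r = A @ x - b
    return r @ r

# Analytic gradient: stacking 2 <A_k, Ax - b> over k gives 2 A^T (Ax - b).
grad_analytic = 2 * A.T @ (A @ x - b)

# Central finite differences, one coordinate x_k at a time.
eps = 1e-6
grad_numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

assert np.allclose(grad_analytic, grad_numeric, atol=1e-4)
```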
Hope that helps,
Solution 2:
I assume you are trying to minimize $\langle A x -b, A x - b\rangle.$ This is always nonnegative, so if $A$ is nonsingular, the minimum is $0$, attained at $x = A^{-1}b$. Otherwise, the $i$-th component of the gradient is
$$\langle A_i, Ax - b\rangle + \langle A x - b, A_i\rangle = 2\langle A_i, Ax - b\rangle,$$
where $A_i$ is the $i$-th column of $A.$ What does this tell you?
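One way to see where this leads: setting every component $2\langle A_i, Ax-b\rangle$ to zero is exactly the normal equations $A^{\top}A x = A^{\top}b$. A short NumPy sketch (random $A$ and $b$ are placeholders; a Gaussian $A$ has full column rank almost surely, so $A^{\top}A$ is invertible here) confirms this critical point agrees with the least-squares minimizer:

```python
import numpy as np

# Random placeholder problem data.
rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Critical point from the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Minimizer computed directly by a least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)
```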