How can I find the gradient of the term $a^TXb$, where $X$ is an $n \times m$ matrix and $a$ and $b$ are column vectors? Since the gradient is taken with respect to a matrix, it should itself be a matrix, but I have no idea how to derive it.

Any help?


Solution 1:

Write the function in terms of the inner/Frobenius product (which I'll denote by a colon). Then finding the differential and gradient is straightforward: $$\eqalign{ f &= ab^T:X \cr\cr df &= ab^T:dX \cr\cr \frac{\partial f}{\partial X} &= ab^T }$$ Note that the inner product is really just an infix notation for the trace, $$A:B = {\rm tr}(A^TB),$$ so that $ab^T:X = {\rm tr}(ba^TX) = a^TXb$.
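As a quick sanity check on the rewriting step, here is a minimal NumPy sketch confirming that $a^TXb$ agrees with $ab^T:X$ on random data. The sizes, the random seed, and the helper name `frobenius` are illustrative assumptions, not part of the answer above.

```python
import numpy as np

# Illustrative sizes and random data (assumptions for this sketch).
rng = np.random.default_rng(0)
n, m = 3, 4
a = rng.standard_normal(n)
b = rng.standard_normal(m)
X = rng.standard_normal((n, m))

def frobenius(A, B):
    """Frobenius inner product A:B = tr(A^T B). Hypothetical helper name."""
    return np.trace(A.T @ B)

f_direct = a @ X @ b                         # a^T X b
f_frobenius = frobenius(np.outer(a, b), X)   # ab^T : X
f_elementwise = np.sum(np.outer(a, b) * X)   # same product, written entrywise

print(np.isclose(f_direct, f_frobenius), np.isclose(f_direct, f_elementwise))
```

The entrywise form makes it plain why the coefficient matrix $ab^T$ is the gradient: each entry of $X$ is multiplied by exactly one entry of $ab^T$.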

Solution 2:

Let

$$f (\mathrm X) := \mathrm a^{\top} \mathrm X \, \mathrm b = \mbox{tr} \left(\mathrm a^{\top} \mathrm X \, \mathrm b\right) = \mbox{tr} \left(\mathrm b \mathrm a^{\top} \mathrm X\right) = \langle \mathrm a \mathrm b^{\top}, \mathrm X\rangle$$

where the cyclic property of the trace was used and $\langle \cdot \,, \cdot \rangle$ denotes the Frobenius inner product. Since the scalar field $f$ is linear in $\mathrm X$, its gradient is simply the constant matrix paired with $\mathrm X$ in that inner product:

$$\nabla f (\mathrm X) = \color{blue}{\mathrm a \mathrm b^{\top}}$$



Solution 3:

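By brute force: since $X$ is $n \times m$, $a$ has $n$ entries and $b$ has $m$ entries, so $$a^TXb = \sum_{i=1}^n\sum_{j=1}^m a_i x_{ij}b_j,$$ $$\frac{\partial\, a^TXb}{\partial x_{ij}} = a_ib_j,$$ which is exactly the $(i,j)$ entry of $ab^T$.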
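The entrywise derivative can also be checked numerically. The following is a rough sketch, assuming NumPy and arbitrary random test data, that perturbs each $x_{ij}$ in turn and compares the central-difference estimate with the outer product $ab^T$. The function names `f` and `numerical_gradient` and the step size `eps` are choices made for this illustration.

```python
import numpy as np

def f(X, a, b):
    """The scalar function a^T X b."""
    return a @ X @ b

def numerical_gradient(X, a, b, eps=1e-6):
    """Central-difference estimate of df/dx_ij, one entry at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps  # perturb only the (i, j) entry
            G[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * eps)
    return G

# Arbitrary test data (assumed sizes n=3, m=4).
rng = np.random.default_rng(1)
n, m = 3, 4
a = rng.standard_normal(n)
b = rng.standard_normal(m)
X = rng.standard_normal((n, m))

G_num = numerical_gradient(X, a, b)
G_exact = np.outer(a, b)  # candidate gradient ab^T, entries a_i * b_j

grad_matches = np.allclose(G_num, G_exact, atol=1e-6)
print(grad_matches)
```

Because $f$ is linear in $X$, the finite-difference estimate differs from $ab^T$ only by floating-point rounding, and the result does not depend on the particular $X$ chosen.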