Vector derivative w.r.t. its transpose $\frac{d(Ax)}{d(x^T)}$

Given a matrix $A$ and column vector $x$, what is the derivative of $Ax$ with respect to $x^T$, i.e. $\frac{d(Ax)}{d(x^T)}$, where $x^T$ is the transpose of $x$?

Side note: my goal is to recover the known derivative formula $\frac{d(x^TAx)}{dx} = x^T(A^T + A)$ from the above rule and the chain rule.


Solution 1:

Let $f(x) = x^TAx$; you want to evaluate $\frac{df(x)}{dx}$, which is nothing but the gradient of $f(x)$.

There are two ways to represent the gradient: as a row vector or as a column vector. From what you have written, you are representing the gradient as a row vector.

First, make sure to get the dimensions of all the vectors and matrices in place.

Here $x \in \mathbb{R}^{n \times 1}$, $A \in \mathbb{R}^{n \times n}$, and $f(x) \in \mathbb{R}$.

This will help you to make sure that your arithmetic operations are performed on vectors of appropriate dimensions.

Now let's move on to the differentiation.

All you need to know is the following rule for vector differentiation.

$$\frac{d(x^Ta)}{dx} = \frac{d(a^Tx)}{dx} = a^T$$ where $x,a \in \mathbb{R}^{n \times 1}$.

Note that $x^Ta = a^Tx$, since both are scalars, and the identity above can be derived easily by writing out the components.

(Some people follow a different convention, i.e. treating the derivative as a column vector instead of a row vector. Stick to one convention throughout and you will reach the same conclusion in the end.)
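As a quick numerical sanity check of this rule (my addition, not part of the original answer), here is a short NumPy sketch that compares the row-vector gradient $a^T$ against a central finite-difference approximation; the vector `a`, test point `x`, and step `eps` are arbitrary illustrative choices.

```python
import numpy as np

# Finite-difference check of d(a^T x)/dx = a^T (row-vector convention).
rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
x = rng.standard_normal(n)
eps = 1e-6

f = lambda v: a @ v  # scalar-valued f(x) = a^T x

# Build the gradient entry by entry via central differences.
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

print(np.allclose(grad_fd, a))  # expected: True, i.e. the gradient is a^T
```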

Make use of the above results to get,

$$\frac{d(x^TAx)}{dx} = x^T A^T + x^T A$$ Use the product rule to get this result, i.e. first treat $Ax$ as a constant and then treat $x^TA$ as a constant.

So, $$\frac{df(x)}{dx} = x^T(A^T + A)$$
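For completeness, here is a minimal numerical sketch (again my addition, with an arbitrary matrix, test point, and step size) comparing this formula against a finite-difference gradient of $f(x) = x^TAx$.

```python
import numpy as np

# Check d(x^T A x)/dx = x^T (A^T + A) via central differences.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
eps = 1e-6

f = lambda v: v @ A @ v  # scalar-valued f(x) = x^T A x

grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
grad_formula = x @ (A.T + A)  # row-vector form x^T (A^T + A)

print(np.allclose(grad_fd, grad_formula))  # expected: True
```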

Solution 2:

I think there is no such thing. $d(x^TAx)/dx$ is something that, when multiplied by the change $dx$ in $x$, yields the change $d(x^TAx)$ in $x^TAx$. Such a thing exists and is given by the formula you quote. $d(Ax)/d(x^T)$ would have to be something that, when multiplied by the change $dx^T$ in $x^T$, yields the change $d(Ax)$ in $Ax$. No such thing exists, since $x^T$ is a $1 \times n$ row vector and $Ax$ is an $n \times 1$ column vector.

If your main goal is to derive the derivative formula, here's a derivation:

$(x^T + dx^T)A(x + dx) = x^TAx + dx^TAx + x^TA\,dx + dx^TA\,dx =$

$= x^TAx + x^TA^T\,dx + x^TA\,dx + O(\lVert dx \rVert^2) = x^TAx + x^T(A^T + A)\,dx + O(\lVert dx \rVert^2),$

where $dx^TAx = x^TA^T\,dx$ because a scalar equals its own transpose.
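A small numeric illustration of this expansion (my own sketch, with arbitrary $A$, $x$, and perturbation $dx$): the residual left after subtracting the first-order term $x^T(A^T+A)\,dx$ should shrink quadratically as $dx$ is scaled down.

```python
import numpy as np

# Illustrate f(x + dx) - f(x) = x^T (A^T + A) dx + O(||dx||^2) for f(x) = x^T A x.
rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
dx = rng.standard_normal(n)

f = lambda v: v @ A @ v

for scale in (1e-1, 1e-2, 1e-3):
    d = scale * dx
    residual = f(x + d) - f(x) - x @ (A.T + A) @ d
    print(scale, abs(residual))  # residual drops ~100x per 10x smaller step
```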

Solution 3:

Mathematicians kill each other over derivatives and gradients. Do not be surprised if students do not understand one word about this subject. The havoc above is partly caused by the Matrix Cookbook, a book that should be blacklisted. Everyone has their own definition. $\dfrac{d(f(x))}{dx}$ means either a derivative or a gradient (scandalous). We could write $D_xf$ for the derivative and $\nabla_x f$ for the gradient. The derivative is a linear map and the gradient is a vector if we accept the following definition: let $f:E\rightarrow \mathbb{R}$ where $E$ is a Euclidean space. Then, for every $h\in E$, $D_xf(h)=\langle\nabla_x f,h\rangle$. In particular $x\rightarrow x^TAx$ has a gradient but $x\rightarrow Ax$ does not! Using the previous definitions, one has (up to unintentional mistakes):

Let $f:x\rightarrow Ax$ where $A\in M_n$; then $D_xf=A$ (no problem). On the other hand, $x\rightarrow x^T$ is a bijection (a simple change of variable!); hence we can give meaning to the derivative of $Ax$ with respect to $x^T$: consider the function $g:x^T\rightarrow A(x^T)^T$; the required derivative is $D_{x^T}g:h^T\rightarrow Ah$ where $h$ is a vector; note that $D_{x^T}g$ is constant. EDIT: if we choose the bases $e_1^T,\cdots,e_n^T$ and $e_1,\cdots,e_n$ (the second one is the canonical basis), then the matrix associated with $D_{x^T}g$ is $A$ again.
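A quick numeric sketch of the first claim (my addition, arbitrary test data): the Jacobian of $x \mapsto Ax$, computed column by column via finite differences, is $A$ itself.

```python
import numpy as np

# Finite-difference Jacobian of f(x) = A x; it should equal A.
rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
eps = 1e-6

f = lambda v: A @ v

# Each perturbation along the j-th unit vector recovers the j-th column of A.
jac_fd = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                          for e in np.eye(n)])
print(np.allclose(jac_fd, A))  # expected: True
```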

Let $\phi:x\rightarrow x^TAx$; $D_x\phi:h\rightarrow h^TAx+x^TAh=x^T(A+A^T)h$. Moreover $\langle\nabla_x\phi,h\rangle=x^T(A+A^T)h$, that is, ${\nabla_x\phi}^Th=x^T(A+A^T)h$. By identification, $\nabla_x\phi=(A+A^T)x$, a vector (formula (89) in the detestable Matrix Cookbook!); in particular, the expression above, $x^T(A+A^T)$, is not a (column) vector!
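To make the row-versus-column distinction concrete, here is a short sketch (my addition, with arbitrary test data) checking that the column-vector gradient $(A+A^T)x$ carries exactly the same entries as the row form $x^T(A+A^T)$, and that both match a finite-difference gradient of $\phi$.

```python
import numpy as np

# Column-vector gradient (A + A^T) x vs. row form x^T (A + A^T).
rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
eps = 1e-6

phi = lambda v: v @ A @ v

grad_fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
grad_col = (A + A.T) @ x   # Solution 3's column-vector gradient
grad_row = x @ (A + A.T)   # Solution 1's row-vector form

print(np.allclose(grad_col, grad_fd), np.allclose(grad_row, grad_fd))
# Both print True: the two conventions contain the same numbers, one laid
# out as a column, the other as a row.
```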