What is the derivative of $\sum_{i,j}^n \langle A^{ij}x_i,x_j \rangle$

Solution 1:

Define $g:(\mathbb{R}^n)^n\rightarrow\mathbb{R}$ as $$g(x_1,\ldots,x_n)=\sum_{1\leq i,j\leq n}\langle A^{ij}x_i,x_j\rangle$$ where $A^{i,j}\in\operatorname{Mat}_{n\times n}(\mathbb{R})$, $1\leq i,j\leq n$. Then $$\begin{align} g(x_1+h_1,\ldots, x_n+h_n)&=\sum_{1\leq i,j\leq n}\langle A^{ij}(x_i + h_i,x_j+h_j\rangle\\ &=\sum_{1\leq i,j\leq n}\langle A^{ij}x_i,x_j\rangle + \sum_{1\leq i,j\leq n}\langle A^{ij} x_i,h_j\rangle + \sum_{1\leq i,j\leq n}\langle A^{ij} h_i,x_j\rangle + \sum_{1\leq i,j\leq n}\langle A^{ij} h_i, h_jx\rangle\\ &= g(x) +\sum_{1\leq i,j\leq n}\langle A^{ij} x_i,h_j\rangle + \sum_{1\leq i,j\leq n}\langle A^{ij} h_i,x_j\rangle + r(h) \end{align}$$ where $\frac{|r(h)|}{\|h\|}\xrightarrow{h\rightarrow0}0$ where $\|h\|=\|h_1\|+\ldots+\|h_n\|$. This means that $g'(x)$, $x=(x_1,\ldots,x_n)$ is the linear operator $$\begin{align} g'(x)[h_1,\ldots,h_n]&= \sum^n_{i=1}\sum^n_{j=1}\langle A^{ij} x_i,h_j\rangle +\sum^n_{i=1}\sum^n_{j=1}\langle A^{ij} h_i,x_j\rangle \\ &=\sum^n_{i=1}\sum^n_{j=1}\langle A^{ij} x_i,h_j\rangle + \sum^n_{i=1}\sum^n_{j=1}\langle (A^{ij})^*x_j,h_i\rangle \\ &=\sum^n_{i=1}\sum^n_{j=1}\big\langle \big(A^{ij}+(A^{ji})^*\big)x_i,h_j\big\rangle \end{align} $$ where, for any matrix $A^*$ is the adjoint of $A$ relative to the inner product $\langle\cdot,\cdot\rangle$.

If $A^{ij}=A^{ji}$ then $$ \begin{align} g'(x)[h_1,\ldots,h_n]&=2\sum^n_{i=1}\sum^n_{j=1}\langle \frac{1}{2}(A^{ij}+(A^{ij})^*)x_i,h_j\rangle \end{align} $$ The matrix $\frac12(A_{ij}+A^*_{ij})$ are the symetrization of $A^{ij}$ with respect to the inner product $\langle\cdot,\cdot\rangle$.

Solution 2:

In my opinion this exercise teaches an important lesson. Namely:

Whenever possible try to work without reference to coordinates!

This is used twofold in this exercise. First, by using a coordinate-free definition of the differential and second by using a coordinate-free description of your function $g$.

Let us first review the coordinate-free description of the differential, aka the 3-term expansion:

Let $g: U \subset \mathbb{R}^n \to \mathbb{R}^m$ be a map and $p \in U$. The map $g$ is called differentiable at $p$ if locally around $p$ it can be written as \begin{equation} g(p+h)=g(p)+(Dg)_p(h)+R_g(p,h) \end{equation} where $(Dg)_p(h)$ is linear in $h$ and the rest-term $R_g(p,h)$ satisfies $\lim_{h \to 0}\frac{||R_g(p,h)||}{||h||}=0$. The linear map $(Dg)_p$ is called the differential of $g$ at $p$.

More specifically, if the map $g$ is real valued, then the differential $(Dg)_p$ is a linear map from $\mathbb{R}^n$ to $\mathbb{R}$. Recall that any real valued linear map can be uniquely written as the inner product with a certain vector. Therefore, there exists a vector $v_p$ so that \begin{equation} (Dg)_p(h)=\langle v_p, h \rangle \end{equation} for all $h \in \mathbb{R}^n$. This $v_p$ is just the gradient $\nabla g(p)$ of $g$ at $p$. So this is a coordinate-free description of the gradient!

Let‘s now come to our specific function. The function $g$ can be written as \begin{equation} g(x)=\langle Ax, x \rangle, \end{equation} where $\langle \cdot, \cdot \rangle$ is the standard inner product of $\mathbb{R}^n$. By linearity of $A$ and bilinearity of the inner product we get \begin{align*} g(p+h) =& \langle A(p+h), p+h \rangle \\ =& \langle Ap, p\rangle \\ &+ \langle Ap , h\rangle \\ &+ \langle Ah, p\rangle \\ &+ \langle Ah , h\rangle \\ =& g(p) \\ &+ \langle Ap,h\rangle + \langle h , A^\ast p \rangle \\ &+ \langle Ah, h\rangle \\ =& g(p) \\ &+ \langle (A+A^\ast)p,h\rangle \\ &+ \langle Ah, h\rangle, \end{align*} where $A^\ast$ is the adjoint of $A$. The term $\langle (A+A^\ast)p,h\rangle $ is clearly linear in $h$. The last term is quadratic in $h$ and thus satisfies $\lim_{h \to 0}\frac{| \langle Ah,h \rangle |}{||h||}=0$. Therefore, this is a 3-term expansion. Thus \begin{equation} (Dg)_p(h)=\langle (A+A^\ast)p, h\rangle. \end{equation} By the above coordinate-free description of the gradient this shows \begin{equation} \nabla g(p)=(A+A^\ast)p. \end{equation} The assumption that the matrix $A$ is symmetric just means that $A^\ast=A$. Thus $\nabla g(p)=2Ap$.

Note that I also used the coordinate-free description of the adjoint of a linear map $A$. The adjoint $A^\ast$ of $A$ is defined to be the unique linear map so that \begin{equation} \langle Av,w \rangle =\langle v,A^\ast w \rangle \end{equation} holds for all $v,w \in \mathbb{R}^n$.

I hope this helps you to see that once you know the coordinate-free definitions, working without reference to coordinates is much simpler than writing out everything very explicitly in coordinates.

Solution 3:

$ \def\bbR#1{{\mathbb R}^{#1}} \def\o{{\large\tt1}} \def\e{\varepsilon} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{c}#1\end{array}\right]} \def\qiq{\quad\implies\quad} $Assume the dimensions of the given vectors and matrices to be $$x_{i}\in\bbR{p},\qquad A_{ij}\in\bbR{p\times p}$$ Let $\e_i\in\bbR{n}$ denote the cartesian basis vectors, and use the identity matrix $I\in\bbR{p\times p}$ to construct matrix analogs of these basis vectors $$ E_i = \e_i\otimes I \qiq E_j^TE_k = \delta_{jk}I \qiq E_k^TE_k = I $$ The indexed matrices are blocks of a large partitioned matrix, just as the indexed vectors are subvectors of a larger vector $$\eqalign{ M &= \sum_{i=1}^n\sum_{j=1}^n E_iA_{ij}E_j^T = \m{ A_{11}&A_{12}&\ldots&A_{1n} \\ A_{21}&A_{22}&\ldots&A_{2n} \\ \vdots&\vdots&\ddots&\vdots \\ A_{n1}&A_{n2}&\ldots&A_{nn} \\ }\qquad z = \sum_{i=1}^n E_ix_i = \m{x_1\\x_2\\\vdots\\x_n} \\ }$$ Note that the $\{E_k\}$ basis can also be used to extract blocks from a partitioned matrix, e.g. $$\eqalign{ A_{k\ell} &= E_k^TME_\ell &\qiq A_{k\ell}^T = E_\ell^TM^TE_k \\ x_k &= E_k^Tz &\qiq x_k^T = z^TE_k \\ Mz &= \sum_{i=1}^n\sum_{j=1}^n E_iA_{ij}x_j &\qiq M^Tz = \sum_{i=1}^n\sum_{j=1}^n E_iA_{ji}^Tx_j \\ z^TMz &= \sum_{i=1}^n\sum_{j=1}^n x_i^TA_{ij}x_j \\ }$$

Write the objective function using the partitioned matrix.
Then calculate its differential and gradient $$\eqalign{ \phi &= z^TMz \\ d\phi &= z^T\LR{M+M^T}dz \\ &= z^T\LR{M+M^T} \BR{E_k\,dx_k} \\ &= \BR{E_k^T\LR{M+M^T}z}^T {dx_k} \\ \grad{\phi}{x_k} &= {E_k^T\LR{M+M^T}z} \;\;\equiv\;\; \sum_{j=1}^n \LR{ A_{kj} + A_{jk}^T } x_j \\ }$$