Gradients of functions involving matrices and vectors, e.g., $\nabla_{w} w^{t}X^{t}y$ and $\nabla_{w} w^t X^tXw$

Let

$$f (\mathrm x) := \mathrm x^\top \mathrm A \, \mathrm x$$

Then,

$$f (\mathrm x + h \mathrm v) = (\mathrm x + h \mathrm v)^\top \mathrm A \, (\mathrm x + h \mathrm v) = f (\mathrm x) + h \, \mathrm v^\top \mathrm A \,\mathrm x + h \, \mathrm x^\top \mathrm A \,\mathrm v + h^2 \, \mathrm v^\top \mathrm A \,\mathrm v$$

Thus, the directional derivative of $f$ in the direction of $\rm v$ at $\rm x$ is

$$\lim_{h \to 0} \frac{f (\mathrm x + h \mathrm v) - f (\mathrm x)}{h} = \mathrm v^\top \mathrm A \,\mathrm x + \mathrm x^\top \mathrm A \,\mathrm v = \langle \mathrm v , \mathrm A \,\mathrm x \rangle + \langle \mathrm A^\top \mathrm x , \mathrm v \rangle = \langle \mathrm v , \color{blue}{\left(\mathrm A + \mathrm A^\top\right) \,\mathrm x} \rangle$$
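As a quick numerical sanity check, the sketch below (with hypothetical random $\mathrm A$, $\mathrm x$, $\mathrm v$ generated by NumPy) evaluates the difference quotient for shrinking $h$ and compares it with $\langle \mathrm v , \left(\mathrm A + \mathrm A^\top\right) \mathrm x \rangle$:

```python
import numpy as np

# Hypothetical test data: a random (non-symmetric) A, a point x, a direction v.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
v = rng.standard_normal(n)

f = lambda z: z @ A @ z            # f(z) = z^T A z
target = v @ ((A + A.T) @ x)       # claimed limit of the difference quotient

for h in (1e-1, 1e-3, 1e-5):
    quotient = (f(x + h * v) - f(x)) / h
    print(f"h={h:.0e}  |quotient - target| = {abs(quotient - target):.2e}")
```

By the expansion above, the error is exactly $h \, \mathrm v^\top \mathrm A \,\mathrm v$, so it shrinks linearly in $h$.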

Lastly, the gradient of $f$ with respect to $\rm x$ is

$$\nabla_{\mathrm x} \, f (\mathrm x) = \color{blue}{\left(\mathrm A + \mathrm A^\top\right) \,\mathrm x}$$
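The gradient identity can also be checked componentwise. The sketch below (again with assumed random data) compares the closed form $\left(\mathrm A + \mathrm A^\top\right) \mathrm x$ against a central-difference approximation:

```python
import numpy as np

# Assumed random test data, as in the previous sketch.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda z: z @ A @ z
eps = 1e-6

# Central differences along each coordinate direction e_i.
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
closed_form = (A + A.T) @ x
print(np.allclose(numeric, closed_form, atol=1e-6))  # expect True
```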


By the definition of the gradient vector of the map
$$ \mathbb{R}^{n\times 1}\ni w \mapsto w^tX^ty= \sum_{i=1}^n\sum_{j=1}^m w_{i1}\cdot X_{ji}\cdot y_{j1}\in\mathbb{R}, $$
we have
$$ \nabla_w \big( w^tX^ty \big) = \left( \frac{\partial}{\partial w_{11}} ( w^tX^ty ), \frac{\partial}{\partial w_{21}} ( w^tX^ty ), \ldots, \frac{\partial}{\partial w_{i1}} ( w^tX^ty ), \ldots, \frac{\partial}{\partial w_{n1}}( w^tX^ty ) \right). $$
For $i_0=1,2,\ldots,n$,
\begin{align} \frac{\partial}{\partial w_{i_01}} ( w^tX^ty ) &= \frac{\partial}{\partial w_{i_01}} \left( \sum_{i=1}^n\sum_{j=1}^m w_{i1}\cdot X_{ji}\cdot y_{j1} \right) \\ &= \sum_{i=1}^n\sum_{j=1}^m \frac{\partial}{\partial w_{i_01}} (w_{i1}\cdot X_{ji}\cdot y_{j1}) \\ &= \sum_{j=1}^m \frac{\partial}{\partial w_{i_01}} (w_{i_01}\cdot X_{ji_0}\cdot y_{j1}) \\ &= \sum_{j=1}^m X_{ji_0}\cdot y_{j1}. \end{align}
Then
$$ \nabla_w \big( w^tX^ty \big) = \left( \sum_{j=1}^m X_{j1}\cdot y_{j1}, \sum_{j=1}^m X_{j2}\cdot y_{j1}, \ldots, \sum_{j=1}^m X_{ji_0}\cdot y_{j1}, \ldots, \sum_{j=1}^m X_{jn}\cdot y_{j1} \right), $$
whose $i_0$-th entry is $(X^ty)_{i_01}$; that is, $\nabla_w \big( w^tX^ty \big) = X^ty$.

With similar calculations, for the map
$$ \mathbb{R}^{n\times 1}\ni w \mapsto w^tX^tXw= \sum_{k=1}^{n} w_{k1}^2 \sum_{j=1}^m X_{jk}^2 + 2\sum_{1\leq k<\ell \leq n} w_{k1}\cdot w_{\ell 1} \sum_{j=1}^m X_{jk}\cdot X_{j\ell} \in\mathbb{R}, $$
differentiating term by term with respect to each $w_{i_01}$ gives $\nabla_w \big( w^tX^tXw \big) = 2\,X^tXw$, which agrees with the formula $\left(\mathrm A + \mathrm A^\top\right) w$ from the first part applied to the symmetric matrix $\mathrm A = X^tX$.
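Both results can be verified numerically. In the sketch below (assumed shapes: $X\in\mathbb{R}^{m\times n}$, $y\in\mathbb{R}^{m\times 1}$, $w\in\mathbb{R}^{n\times 1}$, with hypothetical random entries), the closed forms $X^ty$ and $2\,X^tXw$ are compared against central-difference gradients:

```python
import numpy as np

# Hypothetical random data with the shapes assumed in the derivation:
# X is m-by-n, y has m entries, w has n entries.
rng = np.random.default_rng(2)
m, n = 6, 4
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
w = rng.standard_normal(n)

def num_grad(f, w, eps=1e-6):
    """Central-difference approximation of the gradient of scalar f at w."""
    return np.array([(f(w + eps * e) - f(w - eps * e)) / (2 * eps)
                     for e in np.eye(len(w))])

print(np.allclose(num_grad(lambda z: z @ X.T @ y, w), X.T @ y))               # grad = X^t y
print(np.allclose(num_grad(lambda z: z @ X.T @ X @ z, w), 2 * X.T @ X @ w))   # grad = 2 X^t X w
```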