Find the gradient and Hessian of $f(Ax+b)$ for a real-valued $f$ and a matrix $A$

Let $A\in\mathbb{R}^{m\times n}$, $b\in \mathbb{R}^m$. For $x\in\mathbb{R}^n$, we define $q(x) = f(Ax+b)$ with $f:\mathbb{R}^m\to\mathbb{R}$. Find the gradient and Hessian of the function $q$.

This question seems strange to me. If I wanted the Jacobian, I would just compose the Jacobian of the outer function with the Jacobian of the inner function. But how do I take a partial derivative of $q$?

$$\frac{\partial f(Ax+b)}{\partial x_1} = \lim_{h\to 0}\frac{f(A(x_1+h,x_2,\cdots,x_n) + b) - f(Ax+b)}{h}$$

I don't think this way of thinking helps: I have no means of evaluating this limit without using some form of the chain rule. Maybe I can apply the chain rule to $q$, but how?

UPDATE:

By the hint given below,

$$q(x_1,\dots, x_n)=f(f_1,\dots,f_m) = f\left(\sum_{i=1}^n a_{1i}x_i+b_1,\dots,\sum_{i=1}^n a_{mi}x_i+b_m\right)$$

where $f_k = \sum_{i=1}^n a_{ki}x_i+b_k$ for $k=1,\dots,m$. I think the multivariable chain rule can be applied:

$$\frac{\partial q}{\partial x_1} = \frac{\partial f}{\partial f_1}\frac{\partial f_1}{\partial x_1} + \cdots + \frac{\partial f}{\partial f_m}\frac{\partial f_m}{\partial x_1}$$

And see that

$$\frac{\partial f_1}{\partial x_1} = a_{11},\quad \dots,\quad \frac{\partial f_m}{\partial x_1} = a_{m1}$$

So we get

$$\frac{\partial q}{\partial x_1} = \frac{\partial f}{\partial f_1}a_{11} + \cdots + \frac{\partial f}{\partial f_m}a_{m1}$$

In general:

$$\frac{\partial q}{\partial x_j} = \frac{\partial f}{\partial f_1}a_{1j} + \cdots + \frac{\partial f}{\partial f_m}a_{mj}$$

I think there's still a lot of work to do.


Background knowledge: if $F:\mathbb R^n \to \mathbb R^m$ is differentiable at $x$, then $F'(x)$ is an $m \times n$ matrix.


Let $g(x) = f(Ax + b)$. By the chain rule, $$ g'(x) = f'(Ax + b)A. $$ If we use the convention that the gradient is a column vector, then $$ \nabla g(x) = g'(x)^T = A^T \nabla f(Ax + b). $$ The Hessian of $g$ is the derivative of the function $x \mapsto \nabla g(x)$. By the chain rule, $$ \nabla^2 g(x) = A^T \nabla^2 f(Ax + b) A. $$
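As a sanity check on the Hessian formula, here is a minimal numerical sketch. The choice $f(u) = \sum_k e^{u_k}$ (so $\nabla f(u) = e^{u}$ and $\nabla^2 f(u) = \operatorname{diag}(e^{u})$), the random test data, and the helper name `numerical_jacobian` are my own assumptions for illustration, not part of the question. The code differentiates the analytic gradient with central differences and compares the result with $A^T \nabla^2 f(Ax+b) A$.

```python
import numpy as np

def f_grad(u):
    # Gradient of the assumed example f(u) = sum(exp(u)).
    return np.exp(u)

def f_hess(u):
    # Hessian of the assumed example f(u) = sum(exp(u)): a diagonal matrix.
    return np.diag(np.exp(u))

def g_grad(A, b, x):
    # Gradient formula from the chain rule: A^T grad f(Ax + b).
    return A.T @ f_grad(A @ x + b)

def g_hess(A, b, x):
    # Hessian formula from the chain rule: A^T hess f(Ax + b) A.
    return A.T @ f_hess(A @ x + b) @ A

def numerical_jacobian(fun, x, h=1e-6):
    # Central-difference Jacobian of fun: R^n -> R^k (hypothetical helper).
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((fun(x + e) - fun(x - e)) / (2 * h))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
x = rng.normal(size=n)

H_formula = g_hess(A, b, x)
# The Hessian is the Jacobian of the map x -> grad g(x).
H_numeric = numerical_jacobian(lambda z: g_grad(A, b, z), x)
hessian_matches = np.allclose(H_formula, H_numeric, rtol=1e-5, atol=1e-6)
print("Hessian formula agrees with finite differences:", hessian_matches)
```

Note that the result is an $n \times n$ symmetric matrix, as it should be for a function of $x \in \mathbb{R}^n$.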


Hint

We have that

$$q(x_1,\dots, x_n)=f\left(\sum_{i=1}^n a_{1i}x_i+b_1,\dots,\sum_{i=1}^n a_{mi}x_i+b_m\right). $$

Thus we have

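$$\dfrac{\partial q}{\partial x_i}=a_{1i}\dfrac{\partial f}{\partial u_1}+\dots +a_{mi}\dfrac{\partial f}{\partial u_m},$$ where $u=Ax+b$ and the partial derivatives of $f$ are evaluated at $u$. That is

$$(\nabla q(x))^T=(\nabla f(Ax+b))^T A, \quad\text{or equivalently}\quad \nabla q(x)=A^T\nabla f(Ax+b).$$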

Edit to get the Hessian

\begin{align} \dfrac{\partial^2 q}{\partial x_j\partial x_i} &=\sum_{k=1}^m a_{ki}\dfrac{\partial}{\partial x_j}\left( \dfrac{\partial f}{\partial u_k}\right) \\&= \sum_{k=1}^m a_{ki}\sum_{l=1}^ma_{lj}\dfrac{\partial^2 f}{\partial u_l\partial u_k} \\&= \sum_{k,l=1}^ma_{ki}a_{lj} \dfrac{\partial^2 f}{\partial u_l\partial u_k}. \end{align}

We have used that

$$\dfrac{\partial}{\partial x_j}\dfrac{\partial f}{\partial u_k}=\dfrac{\partial}{\partial u_1}\left(\dfrac{\partial f}{\partial u_k}\right)\dfrac{\partial u_1}{\partial x_j}+\dots +\dfrac{\partial}{\partial u_m}\left(\dfrac{\partial f}{\partial u_k}\right)\dfrac{\partial u_m}{\partial x_j}$$

Thus we have, with $u=Ax+b$,

$$\nabla^2 q (x)=A^T (\nabla^2 f(u)) A.$$
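As an illustration (the specific choice of $f$ is my own assumption, not part of the question), take $f(u)=\tfrac12\|u\|^2=\tfrac12\sum_{k=1}^m u_k^2$. Then $\nabla f(u)=u$ and $\nabla^2 f(u)=I_m$, so the formulas give the familiar least-squares expressions:

$$q(x)=\tfrac12\|Ax+b\|^2,\qquad \nabla q(x)=A^T(Ax+b),\qquad \nabla^2 q(x)=A^T I_m A=A^TA.$$

In index form this matches the double sum: $\dfrac{\partial^2 f}{\partial u_l\partial u_k}$ is $1$ when $k=l$ and $0$ otherwise, so $\dfrac{\partial^2 q}{\partial x_j\partial x_i}=\sum_{k=1}^m a_{ki}a_{kj}=(A^TA)_{ij}$.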


Let $\phi : x \mapsto Ax + b$, so that $q = f \circ \phi$.
Denote by $J_g$ the Jacobian matrix of any function $g$.

Applying the chain rule gives $J_q (x) = J_f(\phi(x)) J_{\phi}(x)$.

Since $J_{\phi}(x) = A$, and $\nabla g = (J_g)^T$ for any scalar function $g$, this boils down to $$(\nabla q (x))^T = (\nabla f (Ax+b) )^TA.$$

Finally, we find: $$\nabla q (x) = A^T \nabla f (Ax+b).$$

If you want a more gradient-specific formula, you can directly state that $\nabla q (x)= J_{\phi}(x)^T \nabla f (\phi(x))$.
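If you want to convince yourself numerically, here is a small sketch that compares $A^T \nabla f(Ax+b)$ with a central-difference gradient of $q$. The choice of $f$ as log-sum-exp (whose gradient is the softmax), the random data, and the helper name `numerical_gradient` are assumptions made purely for illustration.

```python
import numpy as np

def f(u):
    # Assumed example: log-sum-exp, computed in a numerically stable way.
    c = np.max(u)
    return c + np.log(np.sum(np.exp(u - c)))

def grad_f(u):
    # Gradient of log-sum-exp is the softmax of u.
    w = np.exp(u - np.max(u))
    return w / np.sum(w)

def q(A, b, x):
    # q = f composed with phi(x) = Ax + b.
    return f(A @ x + b)

def grad_q(A, b, x):
    # Formula derived above: J_phi(x)^T grad f(phi(x)) with J_phi(x) = A.
    return A.T @ grad_f(A @ x + b)

def numerical_gradient(fun, x, h=1e-6):
    # Central-difference gradient of a scalar function (hypothetical helper).
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
x = rng.normal(size=n)

g_formula = grad_q(A, b, x)
g_numeric = numerical_gradient(lambda z: q(A, b, z), x)
gradient_matches = np.allclose(g_formula, g_numeric, rtol=1e-5, atol=1e-6)
print("Gradient formula agrees with finite differences:", gradient_matches)
```

The gradient has length $n$ (one entry per component of $x$), even though $\nabla f$ has length $m$; the factor $A^T$ is what converts between the two.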