Product Rule for vector-valued functions

If you're still interested, you can define a "generalised product rule" even when the target spaces of your functions are vector spaces (they can even be infinite-dimensional).

Theorem. (See Loomis and Sternberg)

Let $U,V,W, X$ be normed vector spaces. Let $g: U \to V$ and $h: U \to W$ be functions which are differentiable at a point $\beta \in U$, and let $\omega: V \times W \to X$ be a bounded bilinear map. With these assumptions, the function $F : U \to X$ defined by \begin{equation} F(\xi) = \omega(g(\xi), h(\xi)) \end{equation} is differentiable at $\beta$, and its derivative at $\beta$ (which is a linear map from $U$ into $X$) is given by the formula \begin{align} dF_{\beta}(\cdot) = \omega(dg_{\beta}(\cdot), h(\beta)) + \omega(g(\beta), dh_{\beta}(\cdot)), \tag{*} \end{align} i.e. for all $x \in U$, we have \begin{equation} dF_{\beta}(x) = \omega(dg_{\beta}(x), h(\beta)) + \omega(g(\beta), dh_{\beta}(x)). \end{equation}
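Here, "bounded" means there is a constant $C \geq 0$ such that \begin{equation} \|\omega(v, w)\|_X \leq C \, \|v\|_V \, \|w\|_W \quad \text{for all } v \in V, \, w \in W. \end{equation} For example, the standard inner product on $\mathbb{R}^n$ (the choice of $\omega$ used in the computation further down) is bounded bilinear with $C = 1$, by the Cauchy-Schwarz inequality $|\langle v, w \rangle| \leq \|v\| \, \|w\|$.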

As a remark, Spivak would use the notation $DF(\beta)$ to mean the derivative of $F$ at $\beta$, but I used $dF_{\beta}$ (as in Loomis and Sternberg) simply to avoid a lot of brackets when evaluating this linear transformation at a certain point $x$. Aside from this slight change of notation, the idea behind this formula is to think of $\omega$ as "multiplication", so that $F(\xi)$ is the "product" of $g(\xi)$ and $h(\xi)$. Notice how the derivative formula (*) takes the nice form "differentiate the first, keep the second $+$ keep the first, differentiate the second". To read more about this, I HIGHLY recommend the book Advanced Calculus by Loomis and Sternberg. This theorem is in fact Theorem $8.4$ of Chapter $3$ (with slightly different notation).
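As a quick sanity check, take $U = V = W = X = \mathbb{R}$ and $\omega(a, b) = ab$, ordinary multiplication (certainly a bounded bilinear map). Since the derivative of a differentiable function $g: \mathbb{R} \to \mathbb{R}$ at $\beta$ is the linear map $dg_{\beta}(x) = g'(\beta) x$, formula (*) becomes \begin{align} dF_{\beta}(x) &= dg_{\beta}(x) \, h(\beta) + g(\beta) \, dh_{\beta}(x) \\ &= \left( g'(\beta) h(\beta) + g(\beta) h'(\beta) \right) x, \end{align} which is exactly the product rule from single-variable calculus.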


Addition in response to OP's comment:

Let $A \in M_{n \times n}(\mathbb{R})$. Then, for $x \in \mathbb{R}^n$, the quantity $x^t A x$ can be written using the standard inner product of $\mathbb{R}^n$ as $\langle Ax, x \rangle$. So, what we're doing is considering the following functions (mimicking the notation above):

  • $g: \mathbb{R}^n \to \mathbb{R}^n$, defined by $g(x) = Ax$
  • $h = \text{id}_{\mathbb{R}^n}$, the identity map
  • $\omega = \langle \cdot, \cdot \rangle$, the standard inner product
  • $F: \mathbb{R}^n \to \mathbb{R}$, defined by \begin{align} F(x) = \langle Ax, x \rangle = \langle g(x), h(x) \rangle. \end{align} Note that $g$ and $h$ are both linear transformations, so for any $x \in \mathbb{R}^n$ we have $dg_x = g$ and $dh_x = h$ (if this isn't clear to you, see my answer here). Hence, by the formula above, \begin{align} dF_x(\eta) &= \langle dg_x(\eta), h(x) \rangle + \langle g(x), dh_x(\eta) \rangle \\ &= \langle g(\eta), x \rangle + \langle g(x), \eta \rangle \\ &= \langle A \eta, x\rangle + \langle Ax, \eta \rangle \\ &= \langle \eta, A^t x \rangle + \langle Ax, \eta \rangle \\ &= \langle (A^t + A)x, \eta\rangle, \end{align}

where in the last line I made use of the symmetry and bilinearity of the inner product. Now, you may be wondering: since $F$ maps $\mathbb{R}^n$ to $\mathbb{R}$, how might we calculate its partial derivatives? The relationship is very simple (see the book I linked above, Section 3.8, for more details): \begin{align} (\partial_i F)(x) &= dF_x(e_i) \\ &= \langle (A^t + A)x, e_i \rangle, \end{align} where $\partial_i F(x)$ is what you might normally write as $\dfrac{\partial F}{\partial x_i}(x)$, and $e_i = (0, \dots, 1, \dots, 0) \in \mathbb{R}^n$, with $1$ in the $i^{th}$ place. In other words, $(\partial_i F)(x)$ is just the $i^{th}$ entry of the vector $(A^t + A)x$, so the gradient is $\nabla F(x) = (A^t + A)x$; a concrete $2 \times 2$ example is worked out below.
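To make this concrete, here is a small worked case checking the formula: take $n = 2$ and \begin{equation} A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad F(x) = x^t A x = x_1^2 + 5 x_1 x_2 + 4 x_2^2. \end{equation} Differentiating directly gives $\partial_1 F(x) = 2x_1 + 5x_2$ and $\partial_2 F(x) = 5x_1 + 8x_2$, while the formula gives \begin{equation} (A^t + A)x = \begin{pmatrix} 2 & 5 \\ 5 & 8 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 + 5x_2 \\ 5x_1 + 8x_2 \end{pmatrix}, \end{equation} so the two computations agree, as they should.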