Partial derivatives of $A^n x$ with respect to the entries of $A$

Let $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^{n \times 1}$. I am wondering what the partial derivative of each entry of $A^n x$ with respect to the entries of $A$ is. Is there a closed form expression for this?

$$\frac{\partial}{\partial A_{i,j}} A^nx$$

I wasn't able to find an answer.

Solution 1:

Solution

Letting $e_k$ be $k$-th standard basis vector, the derivative is $$ \boxed{ \frac{\partial}{\partial A_{ij}} \left[A^{n}x\right]=\left(\sum_{k=0}^{n-1}A^{k}e_i e_j^\intercal A^{n-k-1}\right)x } $$

Derivation

For brevity, define $E=e_{i}e_{j}^{\intercal}$ and $\partial_{ij}\equiv\partial / \partial A_{ij}$. Then, \begin{align*} \partial_{ij}\left[A^{n}\right] & =\partial_{ij}\left[AA^{n-1}\right]\\ & =\partial_{ij}\left[A\right]A^{n-1}+A\partial_{ij}\left[A^{n-1}\right]\\ & =EA^{n-1}+A\partial_{ij}\left[A^{n-1}\right]. \end{align*} Therefore, by induction, \begin{align*} \partial_{ij}\left[A^{n}\right] & =EA^{n-1}+A\partial_{ij}\left[A^{n-1}\right]\\ & =EA^{n-1}+AEA^{n-2}+A^{2}\partial_{ij}\left[A^{n-2}\right]\\ & =\cdots\\ & =EA^{n-1}+AEA^{n-2}+\cdots+A^{n-1}EA^{n-n}+A^{n}\partial_{ij}\left[A^{n-n}\right]\\ & =EA^{n-1}+AEA^{n-2}+\cdots+A^{n-1}E\\ & =\sum_{k=0}^{n-1}A^{k}EA^{n-k-1}. \end{align*}

Verification

Here is some code to numerically verify (with finite differences) 1000 random instances. Note that in the code, the matrix powers are not computed efficiently.

import numpy as np
from numpy.linalg import matrix_power as mpow
import pytest

n_params = 1000
params = zip(
    np.random.randint(1, high=10, size=n_params),
    np.random.randint(1, high=4, size=n_params),
    np.random.uniform(size=n_params),
    np.random.uniform(size=n_params),
)
params = list(params)


@pytest.mark.parametrize('m,n,i_frac,j_frac', params)
def test_formula(m, n, i_frac, j_frac, h=1e-6, rtol=1e-3):
    i = int(i_frac * m)
    j = int(j_frac * m)

    A = np.random.randn(m, m)
    x = np.random.randn(m)

    e_i = np.zeros(m)
    e_j = np.zeros(m)
    e_i[i] = 1.
    e_j[j] = 1.
    E = np.outer(e_i, e_j)

    # Compute derivative using formula.
    D = sum(mpow(A, i) @ E @ mpow(A, n - i - 1) for i in range(0, n))
    exact = D @ x

    # Compute derivative numerically.
    delta = mpow(A + h * E, n) @ x - mpow(A, n) @ x
    approx = delta / h

    np.testing.assert_allclose(exact, approx, rtol=rtol)

Solution 2:

Consider the differential \begin{eqnarray*} d\mathbf{u} &=& (d\mathbf{A}^n) \mathbf{x} = \left[ \sum_{k=0}^{n-1} \mathbf{A}^{k} (d\mathbf{A}) \mathbf{A}^{n-1-k} \right] \mathbf{x} \end{eqnarray*} Since $\mathbf{A}= \sum_{i,j} A_{ij} \mathbf{E}_{ij}$, we simply write $ d\mathbf{A} = (dA_{ij}) \mathbf{E}_{ij} $ where $ \mathbf{E}_{ij} = \mathbf{e}_i \mathbf{e}_j^T$. Finally $$ d\mathbf{u} = dA_{ij}\cdot \left[ \sum_{k=0}^{n-1} \mathbf{A}^{k} \mathbf{E}_{ij} \mathbf{A}^{n-1-k} \right] \mathbf{x} $$ which gives the derivative $$ \frac{\partial \mathbf{u}}{\partial A_{ij}} = \left[ \sum_{k=0}^{n-1} \mathbf{A}^{k} \mathbf{E}_{ij} \mathbf{A}^{n-1-k} \right] \mathbf{x} $$