A "geometric'' infinite sum of matrices

The sum $$ I + A + A^2 + A^3 + \cdots $$ equals $$(I-A)^{-1}$$ under the assumption $\rho(A)<1$, which is necessary to make the sum converge. My question: what does the sum $$ I + A^T A + (A^2)^T A^2 + (A^3)^T A^3 + \cdots$$ equal under the same assumption? Is there a similarly neat expression?

It is true that this last sum converges under the assumption $\rho(A)<1$. The "obvious" guess $(I-A^T A)^{-1}$ for the sum is wrong because $ (A^2)^T A^2 \neq (A^T A)^2$ in general; in fact, this guess need not even make sense, because $\rho(A)<1$ does not rule out $\rho(A^T A)=1$. For instance, $A=\begin{pmatrix}0&1\\0&0\end{pmatrix}$ has $\rho(A)=0$, yet $A^TA=\operatorname{diag}(0,1)$, so $I-A^TA$ is singular.
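A two-line Pari/GP check of that example (`~` denotes transposition in Pari):

    \\ Pari/GP: rho(A) < 1 does not keep I - A~*A invertible
    A = [0, 1; 0, 0];                \\ rho(A) = 0
    print(matdet(matid(2) - A~*A))   \\ prints 0: I - A~*A is singular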


FWIW: Call the sum $S$, provided it makes sense. Multiplying the series by $A^T$ on the left and by $A$ on the right shifts the index by one, so $A^TSA=S-I$; the series therefore converges to a solution of this equation.

If $n=2$, then the determinant of the endomorphism $S\mapsto A^TSA-S$ of $M_2(\mathbb C)$ is $(\det A-1)^2\det(A^2-I)$, so for most $A$ the solution is unique. For larger $n$, I have no idea.
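This can be checked symbolically. Here is a small Pari/GP sketch of such a check (my addition; it uses the fact, also used in the answers below, that under column-stacking vectorisation the map $S\mapsto A^TSA$ has matrix $A^T\otimes A^T$, so the endomorphism above has matrix $A^T\otimes A^T-I$):

    \\ Pari/GP: verify det(S -> A~*S*A - S) = (det(A)-1)^2 * det(A^2 - I) for n = 2
    kron2(X, Y) = matrix(4, 4, r, c, X[(r-1)\2+1, (c-1)\2+1] * Y[(r-1)%2+1, (c-1)%2+1]);
    A = [a1, a2; a3, a4];                    \\ generic symbolic 2x2 matrix
    lhs = matdet(kron2(A~, A~) - matid(4));  \\ determinant of the endomorphism
    rhs = (matdet(A) - 1)^2 * matdet(A^2 - matid(2));
    print(lhs - rhs)                         \\ prints 0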


If $F:M_n\to M_n$ is defined by $F(X)=A^\mathrm{T}XA$ and $\mathrm{id}:M_n\to M_n$ is the identity map, then your sum is $(\mathrm{id} - F)^{-1}(I)$.


Let $S$ be the solution to the discrete-time Lyapunov equation $A^TSA=S-I$. Note that $S$ exists and is unique because $$ A^TSA=S-I\ \Leftrightarrow\ (I - A^T\otimes A^T)\operatorname{vec}(S)=\operatorname{vec}(I) $$ and $I - A^T\otimes A^T$ is invertible because $\rho(A^T\otimes A^T)=\rho(A)^2<1$.

It follows that if the infinite series $I + A^T A + (A^2)^T A^2 + (A^3)^T A^3 + \cdots$ converges at all, its limit must be $S$.

But does it really converge? Note that for some matrices $A$, such as $\frac1{\sqrt{2}}\pmatrix{1&1\\ 0&1}$, we have $\rho(A)<1$ but also $\max\left\{\|A\|,\|A^T\|\right\}>1$ for every submultiplicative matrix norm (for this $A$, any such norm satisfies $\|A\|\|A^T\|\ge\|AA^T\|\ge\rho(AA^T)=\frac{3+\sqrt5}{4}>1$). So, we cannot argue that $\|A\|\|A^T\|<1$ and pretend that $\|\sum_{j=0}^\infty(A^j)^TA^j\|\le\sum_{j=0}^\infty(\|A\|\|A^T\|)^j<\infty$ (the first inequality is always valid but the second one is not). The usual trick for proving the convergence of a Neumann series using a submultiplicative norm is not applicable here.

Yet, we can still easily prove that the partial sums converge to $S$. Since $A^TSA=S-I$, one can prove by mathematical induction that $$ S-\sum_{j=0}^{n-1}(A^j)^TA^j=(A^n)^TSA^n $$ for every $n\ge1$. As $\rho(A)<1$, we have $A^n\rightarrow0$ and in turn $(A^n)^TSA^n\rightarrow0$ as $n\to\infty$. Hence the infinite series $\sum_{j=0}^\infty(A^j)^TA^j$ does converge to $S$.
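A quick numeric illustration (a minimal Pari/GP sketch; the matrix $A$ is an arbitrary choice with $\rho(A)<1$ but spectral norm greater than $1$, so the norm argument above would indeed not apply):

    \\ Pari/GP: partial sums of sum_j (A^j)~ * A^j approach S
    A = [0.6, 0.7; 0, 0.6];           \\ rho(A) = 0.6 < 1, spectral norm > 1
    B = matid(2);                     \\ B holds A^j
    S = matrix(2, 2);                 \\ zero matrix, accumulates the partial sums
    for (j = 0, 300, S = S + B~ * B; B = A * B);
    print(A~ * S * A - S + matid(2))  \\ ~ zero matrix: S solves A~*S*A = S - I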

Alternatively, we can vectorise the $n$-th partial sum of the infinite series as follows: $$ \operatorname{vec}\left(\sum_{j=0}^n(A^j)^T A^j\right) =\sum_{j=0}^n\left((A^j)^T \otimes (A^j)^T\right)\operatorname{vec}(I) =\sum_{j=0}^n\left(A^T\otimes A^T\right)^j\operatorname{vec}(I).\tag{$\ast$} $$ Now we get a geometric sum on the RHS of $(\ast)$. This effectively resurrects the Neumann series argument. As $\rho\left(A^T\otimes A^T\right)=\rho(A)^2<1$, we have $\|A^T\otimes A^T\|<1$ for some submultiplicative norm. The RHS of $(\ast)$ thus converges when $n\to\infty$ and the LHS converges too.
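That route can also be carried out directly. Below is a small Pari/GP sketch of it (Pari has no built-in Kronecker product, so the helper `kron` is my own; $\operatorname{vec}$ is column-stacking):

    \\ Pari/GP: solve (I - kron(A~,A~)) * vec(S) = vec(I) for S
    kron(X, Y) = {
      my(sX = matsize(X), sY = matsize(Y));
      matrix(sX[1]*sY[1], sX[2]*sY[2], r, c,
        X[(r-1)\sY[1]+1, (c-1)\sY[2]+1] * Y[(r-1)%sY[1]+1, (c-1)%sY[2]+1]);
    }
    d = 2; A = [0.6, 0.7; 0, 0.6];
    vecI = vectorv(d^2, r, (r-1)%d == (r-1)\d);     \\ vec(I), column-stacked
    v = matsolve(matid(d^2) - kron(A~, A~), vecI);  \\ equals vec(S)
    S = matrix(d, d, i, j, v[(j-1)*d + i]);         \\ un-vec
    print(A~ * S * A - S + matid(d))                \\ ~ zero matrix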


[update 4] We get an exact result

Here I give a worked example using Pari/GP. I have adapted the letters for the matrices, since the previously used notation was not very elegant.

We want

$$ \small S = I + A \cdot A^\tau + A^2 \cdot (A^\tau)^2 + A^3 \cdot (A^\tau)^3 + \cdots $$ (this is the sum in the question with $A$ replaced by $A^\tau$), from which also $$ \small S = I + A \cdot S \cdot A^\tau $$

The matrix $A$ can be very general; by using the diagonalization of $A$ and the closed form of the geometric series applied to the eigenvalues of $A$, we get an exact result.

[update 5] If $A$ is symmetric, then the process simplifies again: the diagonalization $\small A=W\cdot D \cdot W^{-1} $ gives a matrix $W$ which can be normalized to be orthogonal, so $ \small M=W^{-1}=W^\tau $ and hence $ \small Mm = M \cdot M^\tau = I $. Thus $Gg$ is diagonal with the entries $ \small g_{k,k}= 1/( 1-\lambda_k^2)$, where the $\small \lambda_k$ are the eigenvalues (now only $\small \lambda_k^2\ne1$ is required; products of pairs of distinct eigenvalues no longer matter). The result is then simply $ \small S = W \cdot Gg \cdot W^\tau $.
[/update 5]
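A minimal Pari/GP sketch of this symmetric case (the example matrix and the explicit normalisation of the eigenvector columns are my additions):

    \\ Pari/GP: symmetric A, W normalized to be orthogonal, Gg diagonal
    A = [0.3, 0.2; 0.2, 0.3];                 \\ symmetric; eigenvalues 0.5 and 0.1
    W = mateigen(A);
    W = W * matdiagonal(vector(2, j, 1/sqrt(norml2(W[,j]))));  \\ norml2 = squared length
    L = vector(2, k, (W~ * A * W)[k, k]);     \\ the eigenvalues lambda_k
    Gg = matdiagonal(vector(2, k, 1/(1 - L[k]^2)));
    S = W * Gg * W~;
    print(A * S * A~ - S + matid(2))          \\ ~ zero matrix: S = I + A*S*A~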


 
    \\ Pari/GP
    d = 4
    A = matrix(d,d,r,c,c^r)   \\ create some diagonalizable matrix
    W = mateigen(A)
       M = W^-1
       MAW = M*A*W; D = vector(d, i, MAW[i,i]);  \\ vector of eigenvalues of A
       print([W, Mat(D)])     \\ compact output of W and the eigenvalues

    Mm = M*M~

         \\ Now the closed form for the geometric series of the
         \\ eigenvalues comes into play.
         \\ Note that the eigenvalues (in D) can be larger than 1,
         \\ but no pair is allowed to give a product of 1
    Gg = matrix(d,d, r,c, Mm[r,c] / (1 - D[r] * D[c]))

    S  = W*Gg*W~              \\ exact result
             \\ then cross-check against the implicit definition
         chk = matid(d) + A * S * A~
         err = round(S - chk,&e)  \\ difference rounds to the zero matrix

The example matrices:

$ \qquad \small \begin{array} {lll} A &=& \begin{array} {rrrrr} 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ 1 & 8 & 27 & 64 \\ 1 & 16 & 81 & 256 \end{array} \\ \\ \\ W ,D &=& \begin{array} {rrrr|r} -10.952794 & 8.1645134 & -0.97056263 & 0.017714558 & 0.11272446 \\ 10.508537 & 4.2521778 & -1.8354857 & 0.066926874 & 1.0292480 \\ -5.0996430 & -4.0885199 & -2.6756811 & 0.25725908 & 8.9314971 \\ 1 & 1 & 1 & 1 & 277.92653 \end{array} \\ \\ \\ Mm &=& \begin{array} {rrrr} 0.0039983491 & 0.00083487726 & -0.0016554169 & -0.00076571340 \\ 0.00083487726 & 0.010062383 & 0.0027137791 & -0.0041346567 \\ -0.0016554169 & 0.0027137791 & 0.081999971 & -0.013791508 \\ -0.00076571340 & -0.0041346567 & -0.013791508 & 0.93753657 \end{array} \\ \\ \\ Gg&=&\begin{array} {rrrr} 0.0040498092 & 0.00094445419 & 0.24350752 & 0.000025246807 \\ 0.00094445419 & -0.16953880 & -0.00033124251 & 0.000014504751 \\ 0.24350752 & -0.00033124251 & -0.0010409834 & 0.0000055581782 \\ 0.000025246807 & 0.000014504751 & 0.0000055581782 & -0.000012137628 \end{array} \\ \\ \\ S &=& \begin{array} {rrrr} -5.8030040 & -3.8986593 & 14.233205 & -4.3361544 \\ -3.8986593 & -11.925784 & -1.9019096 & 1.4489718 \\ 14.233205 & -1.9019096 & 3.9412187 & -1.2246860 \\ -4.3361544 & 1.4489718 & -1.2246860 & 0.32178997 \end{array} \\ \\ \\ chk &=& \begin{array} {rrrr} -5.8030040 & -3.8986593 & 14.233205 & -4.3361544 \\ -3.8986593 & -11.925784 & -1.9019096 & 1.4489718 \\ 14.233205 & -1.9019096 & 3.9412187 & -1.2246860 \\ -4.3361544 & 1.4489718 & -1.2246860 & 0.32178997 \end{array} \\ \\ \\ err &=& \begin{array} {rrrr} 0 & . & . & . \\ . & 0 & . & . \\ . & . & 0 & . \\ . & . & . & 0 \end{array} \end{array} $


(Below is the old text of my answer, kept for reference and to explain the principle.)
I possibly have another useful reformulation; I don't know whether it fits the scope of the question. I assume that $A$ is diagonalizable and do not consider the range of convergence: it's just a sketch.

For ease of notation I introduce $ \rm\ B = A^\tau $ (the transpose of $A$); lowercase letters denote the inverse of the matrix denoted by the corresponding capital letter. $ \rm\ I $ is the identity matrix.

We have from the statement of the problem $ \rm\ S = I + B S A $

If we introduce the diagonalization of $A$, we have $ \rm\ A = W * D * w $ where $D$ is diagonal. Use $M$ and $m$ for the transposes of $W$ and $w$; then the diagonalization of $B$ is $ \rm\ B = m * D * M $.

So we have $$ S = I + BSA = I + mDM*S*WDw $$ and thus $$ MSW = MW + D MSW D $$

Let's denote $ \rm\ MSW = U$ and $ \rm\ MW = t$. Then this also means: $$ U = t + D U D = t + D t D + D^2tD^2 + D^3tD^3+\cdots $$ Note that $t$ is symmetric, thus $U$ is symmetric, and $ \rm\ S = mUw $ is then symmetric as well.

Because $D$ is diagonal, this can now be described more easily for each matrix element individually. Using $r$ and $c$ for the row and column index respectively, we have $$ U_{r,c} = t_{r,c}\cdot(1 + D_rD_c + (D_rD_c)^2 + \cdots) = t_{r,c}/(1-D_rD_c) $$ Having the matrix $U$ by this, $S$ is $$S = m \cdot U \cdot w $$ [update note: I edited the first version because I had taken $A$ and $B$ in the wrong order]
[edit 2]: the denominators in the description of the elements of $U$ seem to describe the range of applicability. Since the closed form $1/(1-q)$ of the geometric series with ratio $q$ is defined even in the divergent case (except for $q=1$), we might discard the condition $ \varrho(A) \lt 1 $ and replace it with something like: $A$ cannot have both a number $x$ and its reciprocal $1/x$ among its eigenvalues.