In-place inversion of large matrices

Solution 1:

We suppose an (m+n) by (m+n) matrix $\textbf{A}$ with block partition:

$\textbf{A}=\begin{pmatrix}\textbf{E}&\textbf{F}\\\\ \textbf{G}&\textbf{H}\end{pmatrix}$

A sequence of steps can be outlined, invoking recursively an in-place matrix inversion and other steps involving in-place matrix multiplication.

Some but not all of these matrix multiplication operations would seem to require additional memory to be "temporarily" allocated in between recursive calls to the in-place matrix inversion. At bottom we will argue that needed additional memory is linear, e.g. size m+n.

1. The first step is recursively to invert in-place matrix $\textbf{E}$:

$\begin{pmatrix}\textbf{E}^{-1}&\textbf{F}\\\\ \textbf{G}&\textbf{H}\end{pmatrix}$

Of course to make that work requires the leading principal minor $\textbf{E}$ to be nonsingular.

2. The next step is to multiply $\textbf{E}^{-1}$ times block submatrix $\textbf{F}$ and negate the result:

$\begin{pmatrix}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ \textbf{G}&\textbf{H}\end{pmatrix}$

Note that the matrix multiplication and overwriting in this step can be performed one column of $\textbf{F}$ at a time, something we'll revisit in assessing memory requirements.

3. Next multiply $\textbf{G}$ times the previous result $-\textbf{E}^{-1}\textbf{F}$ and add it to the existing block submatrix $\textbf{H}$:

$\begin{pmatrix}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ \textbf{G}&\textbf{H}-\textbf{G}\textbf{E}^{-1}\textbf{F}\end{pmatrix}$

This step can also be performed one column at a time, and because the results are accumulated in the existing block submatrix $\textbf{H}$, no additional memory is needed.

Note that what has now overwritten $\textbf{H}$ is $\textbf{S}=\textbf{H}-\textbf{G}\textbf{E}^{-1}\textbf{F}$.

4. & 5. The next two steps can be done in either order. We want to multiply block submatrix $\textbf{G}$ on the right by $\textbf{E}^{-1}$, and we also want to recursively invert in-place the previous result $\textbf{S}$. After doing both we have:

$\begin{pmatrix}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ \textbf{G}\textbf{E}^{-1}&\textbf{S}^{-1}\end{pmatrix}$

Note that the matrix multiplication and overwriting of $\textbf{G}$ can be done one row at a time.

6. Next we should multiply these last two results, overwriting $\textbf{G}\textbf{E}^{-1}$ with $\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}$ and negating that block:

$\begin{pmatrix}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ -\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&\textbf{S}^{-1}\end{pmatrix}$

7. We are on the home stretch now! Multiply the two off-diagonal blocks and add that to the diagonal block containing $\textbf{E}^{-1}$:

$\begin{pmatrix}\textbf{E}^{-1}+\textbf{E}^{-1}\textbf{F}\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ -\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&\textbf{S}^{-1}\end{pmatrix}$

8. Finally we multiply the block containing $-\textbf{E}^{-1}\textbf{F}$ on the right by $\textbf{S}^{-1}$, something that can be done one row at a time:

$\textbf{A}^{-1}=\begin{pmatrix}\textbf{E}^{-1}+\textbf{E}^{-1}\textbf{F}\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\textbf{S}^{-1}\\\\ -\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&\textbf{S}^{-1}\end{pmatrix}$

Requirements for additional memory:

A temporary need for additional memory arises when we want to do matrix multiplication and store the result back into the location of one of the two factors.

Such a need arises in step 2. when forming $\textbf{E}^{-1}\textbf{F}$, in step 4. (or 5.) when forming $\textbf{G}\textbf{E}^{-1}$, in step 6. when forming $\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}$, and in step 8. when forming $-\textbf{E}^{-1}\textbf{F}\textbf{S}^{-1}$.

Of course other "temporary" allocations are hidden in recursive calls to in-place inversion in step 1. and step 4.&5. when matrices $\textbf{E}$ and $\textbf{S}$ are inverted.

In each case the allocated memory can be freed (or reused) after one column or one row of a required matrix multiplication is performed, because the overwriting can be done one column or one row at a time following its computation. The overhead for such allocations is limited to size m+n, or even max(m,n), i.e. a linear overhead in the size of $\textbf{A}$.