In-place inversion of large matrices
Solution 1:
We suppose an (m+n) by (m+n) matrix $\textbf{A}$ with block partition:
$\textbf{A}=\begin{pmatrix}\textbf{E}&\textbf{F}\\\\ \textbf{G}&\textbf{H}\end{pmatrix}$
A sequence of steps can be outlined, invoking recursively an in-place matrix inversion and other steps involving in-place matrix multiplication.
Some but not all of these matrix multiplication operations would seem to require additional memory to be "temporarily" allocated in between recursive calls to the in-place matrix inversion. At bottom we will argue that needed additional memory is linear, e.g. size m+n.
1.
The first step is recursively to invert in-place matrix $\textbf{E}$:
$\begin{pmatrix}\textbf{E}^{-1}&\textbf{F}\\\\ \textbf{G}&\textbf{H}\end{pmatrix}$
Of course to make that work requires the leading principal minor $\textbf{E}$ to be nonsingular.
2.
The next step is to multiply $\textbf{E}^{-1}$ times block submatrix $\textbf{F}$ and negate the result:
$\begin{pmatrix}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ \textbf{G}&\textbf{H}\end{pmatrix}$
Note that the matrix multiplication and overwriting in this step can be performed one column of $\textbf{F}$ at a time, something we'll revisit in assessing memory requirements.
3.
Next multiply $\textbf{G}$ times the previous result $-\textbf{E}^{-1}\textbf{F}$ and add it to the existing block submatrix $\textbf{H}$:
$\begin{pmatrix}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ \textbf{G}&\textbf{H}-\textbf{G}\textbf{E}^{-1}\textbf{F}\end{pmatrix}$
This step can also be performed one column at a time, and because the results are accumulated in the existing block submatrix $\textbf{H}$, no additional memory is needed.
Note that what has now overwritten $\textbf{H}$ is $\textbf{S}=\textbf{H}-\textbf{G}\textbf{E}^{-1}\textbf{F}$.
4. & 5.
The next two steps can be done in either order. We want to multiply block submatrix $\textbf{G}$ on the right by $\textbf{E}^{-1}$, and we also want to recursively invert in-place the previous result $\textbf{S}$. After doing both we have:
$\begin{pmatrix}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ \textbf{G}\textbf{E}^{-1}&\textbf{S}^{-1}\end{pmatrix}$
Note that the matrix multiplication and overwriting of $\textbf{G}$ can be done one row at a time.
6.
Next we should multiply these last two results, overwriting $\textbf{G}\textbf{E}^{-1}$ with $\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}$ and negating that block:
$\begin{pmatrix}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ -\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&\textbf{S}^{-1}\end{pmatrix}$
7.
We are on the home stretch now! Multiply the two off-diagonal blocks and add that to the diagonal block containing $\textbf{E}^{-1}$:
$\begin{pmatrix}\textbf{E}^{-1}+\textbf{E}^{-1}\textbf{F}\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\\\\ -\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&\textbf{S}^{-1}\end{pmatrix}$
8.
Finally we multiply the block containing $-\textbf{E}^{-1}\textbf{F}$ on the right by $\textbf{S}^{-1}$, something that can be done one row at a time:
$\textbf{A}^{-1}=\begin{pmatrix}\textbf{E}^{-1}+\textbf{E}^{-1}\textbf{F}\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&-\textbf{E}^{-1}\textbf{F}\textbf{S}^{-1}\\\\ -\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}&\textbf{S}^{-1}\end{pmatrix}$
Requirements for additional memory:
A temporary need for additional memory arises when we want to do matrix multiplication and store the result back into the location of one of the two factors.
Such a need arises in step 2. when forming $\textbf{E}^{-1}\textbf{F}$, in step 4. (or 5.) when forming $\textbf{G}\textbf{E}^{-1}$, in step 6. when forming $\textbf{S}^{-1}\textbf{G}\textbf{E}^{-1}$, and in step 8. when forming $-\textbf{E}^{-1}\textbf{F}\textbf{S}^{-1}$.
Of course other "temporary" allocations are hidden in recursive calls to in-place inversion in step 1. and step 4.&5. when matrices $\textbf{E}$ and $\textbf{S}$ are inverted.
In each case the allocated memory can be freed (or reused) after one column or one row of a required matrix multiplication is performed, because the overwriting can be done one column or one row at a time following its computation. The overhead for such allocations is limited to size m+n, or even max(m,n), i.e. a linear overhead in the size of $\textbf{A}$.