Why is Householder computationally more stable than modified Gram-Schmidt?

I'm having trouble nailing down why using Householder transformations yields a more stable/accurate result than the modified Gram-Schmidt method when computing the QR decomposition of a matrix. Can anyone explain?

Thanks


Both algorithms compute a QR factorisation of a matrix $A$, that is, $A=QR$. Due to rounding errors, the computed factors satisfy $A=\tilde{Q}\tilde{R}+E$, where $\tilde{Q}$ is no longer exactly orthogonal (but close to orthogonal in a sense quantified below). The residual $E$ is usually not the problem: for both Householder and Gram-Schmidt (even the classical variant), $\|E\|\leq c\epsilon\|A\|$, where $c$ depends only on the dimensions of $A$ and $\epsilon$ is the machine precision. Here, $\|\cdot\|$ is some norm that is convenient for rounding-error analysis, e.g., any $p$-norm (usually $1$, $2$, or $\infty$).
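To make this concrete, here is a minimal NumPy sketch (the `cgs` and `mgs` routines below are textbook implementations written just for illustration; `np.linalg.qr` calls LAPACK's Householder-based factorisation). All three methods reconstruct $A$ to machine precision even though $A$ is very ill-conditioned:

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: project each column against the ORIGINAL column."""
    m, n = A.shape
    Q = np.zeros((m, n)); R = np.zeros((n, n))
    for j in range(n):
        R[:j, j] = Q[:, :j].T @ A[:, j]        # all projections use A[:, j]
        v = A[:, j] - Q[:, :j] @ R[:j, j]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

def mgs(A):
    """Modified Gram-Schmidt: project against the UPDATED remaining columns."""
    m, n = A.shape
    Q = A.astype(float).copy(); R = np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(Q[:, j])
        Q[:, j] /= R[j, j]
        R[j, j+1:] = Q[:, j] @ Q[:, j+1:]      # re-project the already-updated columns
        Q[:, j+1:] -= np.outer(Q[:, j], R[j, j+1:])
    return Q, R

# Matrix with prescribed condition number kappa_2(A) = 1e8, built from its SVD.
rng = np.random.default_rng(0)
m, n = 100, 30
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -8, n)) @ V.T

for name, qr in [("Householder (LAPACK)", np.linalg.qr), ("MGS", mgs), ("CGS", cgs)]:
    Q, R = qr(A)
    print(f"{name:22s} ||A - QR|| / ||A|| = "
          f"{np.linalg.norm(A - Q @ R, 2) / np.linalg.norm(A, 2):.1e}")
```

All three printed residuals should be of order $10^{-16}$, i.e., at the level of $\epsilon$.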

The main difference lies in the accuracy of the orthogonal factor. Householder orthogonalisation gives a $\tilde{Q}$ which is almost orthogonal in the sense that $\|I-\tilde{Q}^T\tilde{Q}\|\leq c\epsilon$, independently of how ill-conditioned $A$ is. In modified Gram-Schmidt (MGS), this "loss of orthogonality" is proportional to the condition number of $A$: MGS yields $\|I-\tilde{Q}^T\tilde{Q}\|\leq c\epsilon\kappa_2(A)$. (The Q-factor from classical Gram-Schmidt satisfies a similar bound with the condition number of $A$ squared.)
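Reusing `A`, `mgs`, and `cgs` from the snippet above, the loss of orthogonality behaves as these bounds predict: Householder stays at the level of $\epsilon$, MGS at roughly $\epsilon\kappa_2(A)\approx 10^{-8}$, and classical Gram-Schmidt, carrying $\kappa_2(A)^2$, loses orthogonality almost entirely:

```python
# Reuses A, mgs, cgs from the previous snippet.
import numpy as np

for name, qr in [("Householder (LAPACK)", np.linalg.qr), ("MGS", mgs), ("CGS", cgs)]:
    Q, _ = qr(A)
    loss = np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2)
    print(f"{name:22s} ||I - Q^T Q|| = {loss:.1e}")
    # Typical output: ~1e-15 (Householder), ~1e-8 (MGS), ~1e0 (CGS)
```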

The difference between Householder and MGS is, roughly speaking, due to the fact that Householder builds the Q-factor as a product of reflections, each of which is orthogonal by construction up to machine precision, whereas MGS orthogonalises the columns of $A$ directly, so rounding errors committed in the early columns contaminate the projections applied to the later ones.
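For concreteness, here is a bare-bones sketch of Householder QR (my own illustration, not production code: it accumulates a dense $Q$, does no pivoting, and assumes no zero columns appear). The point to notice is that $\tilde{Q}$ is assembled exclusively from reflections $H=I-2vv^T$, each orthogonal by construction up to rounding, so its orthogonality has no way to degrade with $\kappa_2(A)$:

```python
import numpy as np

def householder_qr(A):
    """Textbook Householder QR: R is produced by applying reflections to A,
    and Q is accumulated as the product of those same reflections."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # H = I - 2 v v^T is orthogonal by construction (up to rounding).
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])  # R <- H R
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)    # Q <- Q H
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
Q, R = householder_qr(A)
print(np.linalg.norm(A - Q @ R, 2), np.linalg.norm(np.eye(6) - Q.T @ Q, 2))
```

(Real libraries store the reflector vectors $v$ instead of forming $Q$ explicitly, but the stability mechanism is the same.)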

By the way, both Householder and MGS compute an R-factor $\tilde{R}$ that is accurate in the sense of backward error: for each of them there exist an (exactly) orthogonal matrix $\hat{Q}$ and a residual matrix $\hat{E}$ (different for each, of course) such that $A=\hat{Q}\tilde{R}+\hat{E}$ and $\|\hat{E}\|\leq c\epsilon\|A\|$.
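A consequence that is easy to check numerically: expanding $A^TA=(\hat{Q}\tilde{R}+\hat{E})^T(\hat{Q}\tilde{R}+\hat{E})$ and using $\hat{Q}^T\hat{Q}=I$ gives $\|A^TA-\tilde{R}^T\tilde{R}\|\leq c'\epsilon\|A\|^2$. Reusing `A` and `mgs` from the first snippet, both R-factors should pass this test at machine-precision level despite $\kappa_2(A)=10^8$:

```python
# Reuses A and mgs from the first snippet.
import numpy as np

for name, qr in [("Householder (LAPACK)", np.linalg.qr), ("MGS", mgs)]:
    _, R = qr(A)
    err = np.linalg.norm(A.T @ A - R.T @ R, 2) / np.linalg.norm(A, 2) ** 2
    print(f"{name:22s} ||A^T A - R^T R|| / ||A||^2 = {err:.1e}")  # both ~ eps
```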