Comparing LU or QR decompositions for solving least squares

Let $X \in \mathbb{R}^{m\times n}$ with $m>n$. We aim to solve $y=X\beta$, where $\hat\beta$ is the least squares estimator. The least squares solution $\hat\beta = (X^TX)^{-1}X^Ty$ can be obtained either via a QR decomposition of $X$ or via an $LU$ decomposition of $X^TX$. The aim is to compare these.

I noticed that we can use a Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.
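As a sketch of this Cholesky route (assuming NumPy/SciPy; the sizes and data here are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
m, n = 100, 5
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Normal equations: (X^T X) beta = X^T y, with X^T X symmetric positive definite.
A = X.T @ X

# Cholesky factorization A = L L^T costs ~n^3/3 flops, about half of LU.
c, low = cho_factor(A)
beta = cho_solve((c, low), X.T @ y)

# Sanity check against the library least-squares solver.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_ref))  # True
```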

Using $LU$ we have:

$\hat\beta = (X^TX)^{-1}X^Ty=(LU)^{-1}X^Ty$: first compute $a=X^Ty$, which costs $O(2mn)$ flops; then solve $Lb=a$ by forward substitution at cost $\sum_{k=1}^{n} (2k-1)=n^2$, and finally solve $U\hat\beta=b$ by back substitution at the same cost of $n^2$.

I didn't count the cost of computing the factors $L$ and $U$ themselves.
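The LU route above can be sketched as follows (assuming NumPy/SciPy; `lu_solve` performs exactly the two triangular solves counted above):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
m, n = 100, 5
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Form the normal equations: (X^T X) beta = X^T y.
A = X.T @ X          # n x n, symmetric positive definite
a = X.T @ y          # matrix-vector product, ~2mn flops

# Factor A = P L U, then forward- and back-substitute (no explicit inverses).
lu, piv = lu_factor(A)
beta = lu_solve((lu, piv), a)

# Sanity check against the library least-squares solver.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_ref))  # True
```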

Using $QR$ we have: $\hat\beta = (X^TX)^{-1}X^Ty=((QR)^TQR)^{-1}R^TQ^Ty=(R^TR)^{-1}R^TQ^Ty=R^{-1}Q^Ty$, since $Q^TQ=I$. We first compute $a=Q^Ty$ at cost $O(2mn)$ (with the thin $Q\in\mathbb{R}^{m\times n}$), then solve $R\hat\beta=a$ by back substitution at cost $\sum_{k=1}^{n} (2k-1)=n^2$.
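The QR route, as a sketch (assuming NumPy/SciPy; `np.linalg.qr` returns the thin factorization by default):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
m, n = 100, 5
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Thin QR: Q is m x n with orthonormal columns, R is n x n upper triangular.
Q, R = np.linalg.qr(X)

a = Q.T @ y                                 # ~2mn flops
beta = solve_triangular(R, a, lower=False)  # back substitution, ~n^2 flops

# Sanity check against the library least-squares solver.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_ref))  # True
```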

Comparing the decompositions: it seems that QR is much better than LU. I think the cost of computing the QR factorization is higher than that of LU, which is why one might still prefer LU; on the other hand, if we are already given the decompositions, we should use QR.

SVD: Is there any advantage to using an SVD instead?


Solution 1:

Matrix Computations

Golub and Van Loan, 3e

$\S$ 5.7.1, p. 270

Comparing flop counts for operations on $n\times n$ matrices:

| Method | Flops |
| --- | --- |
| Gaussian elimination | $\frac{2}{3}n^{3}$ |
| Householder orthogonalization | $\frac{4}{3}n^{3}$ |
| Modified Gram–Schmidt | $2n^{3}$ |
| Bidiagonalization | $\frac{8}{3}n^{3}$ |
| Singular value decomposition | $12n^{3}$ |

Three reasons to choose orthogonalization to solve square systems:

  1. Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $\mathbf{Q}\mathbf{R}$ factorization is comparable in efficiency.

  2. Orthogonalization methods have guaranteed stability; there is no "growth factor" to worry about as in Gaussian elimination.

  3. In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $\mathbf{Q}\mathbf{R}$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.

Solution 2:

Your reasoning at the top is really odd. The LU decomposition is twice as fast as the standard QR decomposition, and it will solve most systems. There are, however, pathologically ill-conditioned systems and non-square systems; that is where you would use QR or the SVD. The main reason for the SVD is that it allows you to be selective about your condition number, by discarding small singular values.
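That selectivity can be sketched as follows (assuming NumPy; the tolerance `tol` is an illustrative choice, playing the role of `rcond` in `np.linalg.lstsq`):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 5
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Thin SVD: X = U diag(s) V^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Invert only singular values above a tolerance; dropping the rest caps
# the effective condition number of the solve at s.max()/tol.
tol = 1e-10 * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)

# Pseudoinverse solution: beta = V diag(s_inv) U^T y.
beta = Vt.T @ (s_inv * (U.T @ y))

# For this well-conditioned X, nothing is truncated and the result
# matches the library least-squares solver.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_ref))  # True
```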

There are many other decompositions. The Cholesky decomposition is twice as fast as the LU decomposition, but it applies only to positive definite Hermitian matrices. All of this also neglects any sparsity of the matrix.