Solution 1:

Here, the data points don't lie around a line passing through the origin. Had that been the case, the direction of the least-squares fit line would have been given by the first right singular vector of $A$ (i.e., the dominant eigenvector of the matrix $A^TA$), which solves the optimization problem of maximizing the projection, $\underset{\|v\|=1}{\max}\,\|Av\|$, while the normal to the line would have been the least-squares solution of $Ax=0$.
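To see the maximization concretely, here is a minimal sketch (my addition, not part of the original answer) that scans unit vectors $v(\theta)=(\cos\theta,\sin\theta)$ and confirms that $\|Av\|^2 = v^TA^TAv$ peaks at the dominant eigenvector, using the Gram matrix $A^TA$ computed further below:

M <- matrix(c(414, 93, 93, 741), 2, 2)  # A^T A for the data at hand
theta <- seq(0, pi, length.out = 1e5)   # unit vectors v = (cos t, sin t)
V <- rbind(cos(theta), sin(theta))
q <- colSums(V * (M %*% V))             # quadratic form v' (A'A) v = |Av|^2
V[, which.max(q)]                       # maximizer, ~ (0.2557, 0.9668)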

In any case, the first right singular vector is the dominant eigenvector of $A^TA$, namely $[0.2557118, 0.9667531]^T$, as you can verify with R:

> eigen(t(A)%*%A)
eigen() decomposition
$values
[1] 765.599 389.401

$vectors
          [,1]       [,2]
[1,] 0.2557118 -0.9667531
[2,] 0.9667531  0.2557118
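As a side note (this sketch is my addition, not part of the original answer), the dominant eigenvector can also be reached by plain power iteration, i.e., repeatedly multiplying by $A^TA$ and renormalizing:

M <- matrix(c(414, 93, 93, 741), 2, 2)  # A^T A
v <- c(1, 0)                            # arbitrary starting vector
for (i in 1:100) {
  v <- M %*% v                          # multiply by M
  v <- v / sqrt(sum(v^2))               # renormalize to a unit vector
}
c(v)
# [1] 0.2557118 0.9667531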

The same vector can be found directly (up to sign) with the SVD:

> svd(A)$v
           [,1]       [,2]
[1,] -0.2557118  0.9667531
[2,] -0.9667531 -0.2557118

You get $A^TA = \begin{bmatrix}414 & 93\\ 93 & 741 \end{bmatrix}$, with characteristic polynomial $\lambda^2-1155\lambda+298125=0$; solving it, the largest eigenvalue is $\lambda=765.5991$. Solving the linear system $\begin{bmatrix}414 & 93\\ 93 & 741 \end{bmatrix}\begin{bmatrix}x_1 \\ x_2\end{bmatrix}=765.5991\begin{bmatrix}x_1 \\ x_2\end{bmatrix}$ gives $93x_2=351.5991x_1$ and $93x_1=24.5991x_2$, i.e., $\frac{x_1}{x_2}\approx 0.2645$, so a corresponding eigenvector is $\begin{bmatrix}0.2645\\ 1\end{bmatrix}$. Normalizing (i.e., dividing by $\sqrt{0.2645^2+1^2}=1.034389$), we get the dominant unit eigenvector $\begin{bmatrix}\frac{0.2645}{1.034389}\\ \frac{1}{1.034389}\end{bmatrix}=\begin{bmatrix}0.25571\\ 0.96675\end{bmatrix}$, which agrees with the numerical computation in R.
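As a quick numerical check of the hand computation (added here for convenience), the characteristic-polynomial roots and the normalized eigenvector can be reproduced in R:

polyroot(c(298125, -1155, 1))   # roots of lambda^2 - 1155 lambda + 298125
# roots: 389.401 and 765.599
lambda <- 765.5991
x <- c(93 / (lambda - 414), 1)  # from (414 - lambda) x1 + 93 x2 = 0
x / sqrt(sum(x^2))              # normalize to a unit vector
# [1] 0.2557118 0.9667531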

Let's for instance consider a different set of points $(\frac{7}{10},\frac{7}{10})$, $(7,9)$, $(2,\frac{9}{2})$, so that the matrix $A=\begin{bmatrix}7/10 & 7/10\\ 7&9 \\ 2&9/2 \end{bmatrix}$. Now, if we want the best-fit line through the origin (without an intercept), the direction of the line is the first right singular vector of $A$ (the unit vector maximizing $\|Av\|$), while the normal to the line lies approximately in the nullspace of $A$, i.e., it is the least-squares solution of $Ax=0$, given by the last right singular vector. Both come out of the SVD of $A$, as shown below:

A <- cbind(c(7/10, 7, 2), c(7/10, 9, 9/2))
v <- svd(A)$v[, 1]   # first right singular vector: direction of the line
v
# [1] -0.5849033 -0.8111030
slope <- v[2] / v[1] # slope of the best-fit line through the origin
slope
# [1] 1.38673
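To check the nullspace claim above (a small addition of mine, not in the original code): the last right singular vector is the unit vector minimizing $\|Av\|$, and $\|An\|$ equals the smallest singular value, which is small relative to the largest when the points are nearly collinear:

A <- cbind(c(7/10, 7, 2), c(7/10, 9, 9/2))
n <- svd(A)$v[, 2]      # last right singular vector: the approximate normal
A %*% n                 # entries are small: the points are nearly collinear
svd(A)$d                # singular values, ~12.4 and ~1.1; ||An|| is the smallest
sum(svd(A)$v[, 1] * n)  # 0: the normal is orthogonal to the line direction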

The following figure shows the best-fit line obtained with SVD:

[figure: the three points with the best-fit line through the origin obtained with SVD]

This is close to, but not identical with, the ordinary least-squares fit without intercept: the SVD line minimizes perpendicular (orthogonal) distances, whereas lm() minimizes vertical distances:

X <- c(7/10, 7, 2); y <- c(7/10, 9, 9/2)
lm(y ~ X + 0)$coeff
#        X 
# 1.355207 
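To make the distinction concrete, here is a small added check (the helper perp_sse is mine, not from the original): for a line $y=mx$, the squared perpendicular distance of $(x_i,y_i)$ to the line is $(y_i-mx_i)^2/(1+m^2)$, so the SVD slope wins on the orthogonal criterion while the lm() slope wins on the vertical one:

X <- c(7/10, 7, 2); y <- c(7/10, 9, 9/2)
perp_sse <- function(m) sum((y - m * X)^2) / (1 + m^2)  # orthogonal SSE
perp_sse(1.38673)          # SVD slope: ~1.216, the smaller orthogonal SSE
perp_sse(1.355207)         # lm slope:  ~1.234
sum((y - 1.355207 * X)^2)  # ~3.501, the smaller vertical SSE
sum((y - 1.38673 * X)^2)   # ~3.554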

Now, let's say we have the points $(17,4)$, $(-2,26)$, $(11,7)$ instead. To find the best-fit line, we need to minimize $\|Ax-b\|_2^2$, which boils down to solving $Ax=b$ in the least-squares sense, where $b \neq 0$. Here we have $A=\begin{bmatrix}1 & 17 \\ 1 & -2 \\ 1 & 11 \end{bmatrix}$ and $b=\begin{bmatrix} 4 \\ 26 \\ 7 \end{bmatrix}$, and the least-squares solution is given by the normal equations (via the pseudo-inverse): $\hat{\beta}=(A^TA)^{-1}A^Tb$. We can use the SVD here too: with $A=U\Sigma V^T$, we have $\hat{\beta}=V\Sigma^{-1}U^Tb$, as shown below.

A <- cbind(1, c(17, -2, 11))
y <- c(4, 26, 7)                 # the response vector b
s <- svd(A)
U <- s$u
V <- s$v
V
#            [,1]        [,2]
#  [1,] 0.06288448  0.99802081
#  [2,] 0.99802081 -0.06288448
S <- s$d
V %*% solve(diag(S), t(U) %*% y) # LS solution with SVD
#         [,1]
#[1,] 22.791519
#[2,] -1.206714

This matches the LS solution obtained using the normal equations:

solve(t(A)%*%A, t(A)%*%y) # LS solution with normal equation
          [,1]
[1,] 22.791519
[2,] -1.206714
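As one more sanity check (added here; not in the original answer), R's built-in lm() with an intercept recovers the same coefficients:

x <- c(17, -2, 11); y <- c(4, 26, 7)
coef(lm(y ~ x))   # intercept and slope from ordinary least squares
# (Intercept)           x 
#   22.791519   -1.206714 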
