what does "It chooses a basis close to the starting basis but has no guarantee of choosing the closest orthonormal basis." mean? [closed]
Solution 1:
I think what you're missing is that at some point in time there was a particular orthonormal basis $(\hat{\mathbf e}_1, \hat{\mathbf e}_2, \hat{\mathbf e}_3)$ that was used for some purpose, and you don't know what that basis was.
Instead what has been delivered to you is some basis $(\mathbf e_1', \mathbf e_2', \mathbf e_3').$ The vectors in this basis are close to the vectors of the original basis, for example $\mathbf e_1' \approx \hat{\mathbf e}_1,$ but they are not exactly the same as the original vectors. And because they are not exactly the same, their lengths are not as close to $1$ as they could be and the angles between them are not as close to right angles as they could be.
Now, the first method will deliver to you a new set of vectors $(\hat{\mathbf e}_1'', \hat{\mathbf e}_2'', \hat{\mathbf e}_3'')$ that is orthonormal (or at least a lot closer to orthonormal than $(\mathbf e_1', \mathbf e_2', \mathbf e_3')$ is; the numbers are stored with limited precision, so you would have to be very lucky for all the calculations to come out to exactly $0$ or $1$). But $\hat{\mathbf e}_1'' = \mathbf e_1'$ exactly, so whatever error you already had in $\mathbf e_1'$ is now "baked into" the new basis, and it is very likely that you have made the errors in the other two vectors even worse than they were.
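To make that concrete, here is a minimal numpy sketch of what I take the "first method" to be, namely a Gram–Schmidt-style re-orthonormalization (the function and variable names are mine). Notice how the first delivered vector's direction is preserved exactly, error and all, while the later vectors absorb whatever correction is needed.

```python
import numpy as np

def gram_schmidt(basis):
    """Re-orthonormalize the rows of `basis` in order: the first vector keeps
    its direction exactly (only its length is rescaled), and each later vector
    has the earlier directions projected out before being normalized."""
    out = []
    for v in basis:
        for e in out:
            v = v - (v @ e) * e          # remove the component along an earlier vector
        out.append(v / np.linalg.norm(v))
    return np.vstack(out)

# The "delivered" basis: a slightly perturbed copy of the standard basis.
rng = np.random.default_rng(0)
perturbed = np.eye(3) + 1e-3 * rng.standard_normal((3, 3))

repaired = gram_schmidt(perturbed)
print(np.round(repaired @ repaired.T, 12))   # ~identity: the rows are orthonormal again
# Cosine of the angle between the repaired and delivered first vectors is 1:
# the first vector's direction, including its error, is kept exactly.
print(repaired[0] @ perturbed[0] / np.linalg.norm(perturbed[0]))
```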
What you really want is the "original" basis vectors $(\hat{\mathbf e}_1, \hat{\mathbf e}_2, \hat{\mathbf e}_3),$ but since you don't know exactly what those vectors were, the best you can do is to make some kind of educated guess about their likely values. The claim is that SVD will give you a basis $(\hat{\mathbf e}_{1\mathrm{svd}}, \hat{\mathbf e}_{2\mathrm{svd}}, \hat{\mathbf e}_{3\mathrm{svd}})$ that is the "closest" guess you can make. Logically this does not ring true: the closest possible answer would be the original vectors $(\hat{\mathbf e}_1, \hat{\mathbf e}_2, \hat{\mathbf e}_3)$ themselves. And since we don't know exactly how the errors crept into the values of the basis vectors, we cannot possibly know exactly the best way to correct them.
But I suspect what's behind the statement you read is a combination of a few mathematical concepts that the explanation glosses over. First we need some definition of "close". Perhaps it's just the distances between the tips of the supposedly matching vectors, added together; perhaps it's the sum of the squares of those distances, for example $$ \lVert \hat{\mathbf e}_1'' - \hat{\mathbf e}_1 \rVert^2 +\lVert \hat{\mathbf e}_2'' - \hat{\mathbf e}_2 \rVert^2 +\lVert \hat{\mathbf e}_3'' - \hat{\mathbf e}_3 \rVert^2 $$ for the "distance" of $(\hat{\mathbf e}_1'', \hat{\mathbf e}_2'', \hat{\mathbf e}_3'')$ from $(\hat{\mathbf e}_1, \hat{\mathbf e}_2, \hat{\mathbf e}_3).$ We also need some assumptions about the likely nature of the errors; for example, we might suppose each vector is perturbed randomly, with some probability distribution, independently of the other two vectors. Finally we need to choose some statistical property of the likely errors in the final "guess" that we want to optimize, for example maximum likelihood or minimum bias.

The claim, then, is that SVD optimizes the chosen statistical property of the chosen "closeness" measurement under the chosen assumptions about the error distribution. That sounds like a lot of guesswork, so the real test is whether the results of SVD are empirically better than those of the first method in actual practice over a lot of representative problems.
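For what it's worth, one standard way to make part of the "closest" claim precise is the orthogonal Procrustes result: if you stack the delivered vectors into a matrix $M$ with SVD $M = U\Sigma V^T,$ then $UV^T$ is the orthogonal matrix closest to $M$ in exactly the sum-of-squared-distances sense above, measured against the delivered vectors (the originals being unknown). Here is a small numpy sketch comparing the two repairs in that measure; the function names are mine, and the Gram–Schmidt-style repair is done via a QR factorization for brevity.

```python
import numpy as np

def svd_orthonormalize(m):
    """Return the orthogonal matrix closest to m in the Frobenius norm:
    if m = U @ diag(s) @ Vt, the minimizer over orthogonal matrices is U @ Vt."""
    u, _, vt = np.linalg.svd(m)
    return u @ vt

def gs_orthonormalize(m):
    """Gram-Schmidt-style repair via QR on the transpose (columns = vectors),
    with signs fixed so the first vector keeps its original direction."""
    q, r = np.linalg.qr(m.T)
    return (q * np.sign(np.diag(r))).T

def dist2(a, b):
    """Sum of squared distances between corresponding basis vectors (rows)."""
    return float(np.sum((a - b) ** 2))

rng = np.random.default_rng(1)
perturbed = np.eye(3) + 1e-3 * rng.standard_normal((3, 3))   # delivered basis, one vector per row

by_svd = svd_orthonormalize(perturbed)
by_gs = gs_orthonormalize(perturbed)

print("SVD repair, distance to delivered vectors:", dist2(by_svd, perturbed))
print("G-S repair, distance to delivered vectors:", dist2(by_gs, perturbed))
# In this measure the SVD repair is never farther from the delivered vectors
# than any other orthonormal basis, the Gram-Schmidt repair included.
```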
But having a reasonable mathematical model of a set of errors and a way to minimize an estimated error is better than just plugging away at a method that takes none of those things into account. In particular, it makes sense that we would prefer a method whose results are not so dependent on the fact that we list axes in the sequence $(x,y,z)$ rather than $(z,x,y).$
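To illustrate that last point, here is a small sketch (again numpy, names mine) checking how each repair behaves when the axes are merely relabelled: the SVD repair commutes with the relabelling, while the Gram–Schmidt-style repair changes depending on which vector is listed first.

```python
import numpy as np

def svd_fix(m):
    u, _, vt = np.linalg.svd(m)          # closest orthogonal matrix in the Frobenius norm
    return u @ vt

def gs_fix(m):
    q, r = np.linalg.qr(m.T)             # Gram-Schmidt-style: the first row's direction is kept
    return (q * np.sign(np.diag(r))).T

rng = np.random.default_rng(2)
basis = np.eye(3) + 1e-3 * rng.standard_normal((3, 3))
perm = [2, 0, 1]                         # list the axes as (z, x, y) instead of (x, y, z)
undo = np.argsort(perm)                  # permutation that restores the original order

# SVD: repairing the relabelled basis and then undoing the relabelling gives
# the same vectors, so the repaired basis does not depend on the ordering.
print(np.allclose(svd_fix(basis[perm])[undo], svd_fix(basis)))   # True

# Gram-Schmidt: the repaired basis changes when a different vector goes first.
print(np.allclose(gs_fix(basis[perm])[undo], gs_fix(basis)))     # False
```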