Why does $A^TAx = A^Tb$ have infinitely many solutions algebraically when $A$ has dependent columns?
Solution 1:
Let $A$ be an $n \times m$ matrix.
Consider $$T_{A^T} : \mathbb R^n \to \mathbb R^m , \quad T_{A^T}(x)=A^Tx \\ T_{A} : \mathbb R^m \to \mathbb R^n , \quad T_{A}(x)=Ax \\ T_{A^TA} : \mathbb R^m \to \mathbb R^m , \quad T_{A^TA}(x)=A^TAx $$
Since $T_{A^TA}=T_{A^T}\circ T_A$, we have $$C(A^TA)=\operatorname{range}(T_{A^TA}) \subseteq \operatorname{range}(T_{A^T})= C(A^T)$$
If we can show that these two spaces have the same dimension we are done.
But this comes for free from the rank-nullity theorem and the fact that $$\ker(T_{A^TA})=\ker(T_A)$$
Indeed, the $\supseteq$ inclusion is obvious. For $\subseteq$, let $x \in \ker(T_{A^TA})$; then $$A^TAx=0 \Rightarrow x^TA^TAx=0 \Rightarrow (Ax)^T(Ax)=0 \Rightarrow (Ax)\cdot(Ax)=0 \Rightarrow \| Ax\|^2=0 \Rightarrow Ax=0$$
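The key fact $\ker(T_{A^TA})=\ker(T_A)$ is easy to check numerically. A quick NumPy sketch (the matrix `A` below is an arbitrary illustrative example with dependent columns, not from the question): $\operatorname{rank}(A^TA)=\operatorname{rank}(A)$, and a vector killed by $A$ is killed by $A^TA$.

```python
import numpy as np

# A 4x3 matrix whose third column is the sum of the first two,
# so the columns are linearly dependent (rank 2).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])

# rank(A^T A) equals rank(A); combined with ker(A) ⊆ ker(A^T A)
# and rank-nullity, this forces ker(A^T A) = ker(A).
print(np.linalg.matrix_rank(A))          # 2
print(np.linalg.matrix_rank(A.T @ A))    # 2

# A vector in ker(A): (1, 1, -1). Check it is killed by both maps.
z = np.array([1.0, 1.0, -1.0])
print(np.allclose(A @ z, 0))             # True
print(np.allclose(A.T @ A @ z, 0))       # True
```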
Solution 2:
Existence
First and foremost, we must have existence before we can talk about uniqueness. To have existence, we require that the data vector $b$ is not in the $\color{red}{null}$ space of the adjoint: $$ b\notin\color{red}{\mathcal{N} \left( \mathbf{A}^{*} \right)} $$ that is, $b$ has a nonzero component in the $\color{blue}{range}$ space $\color{blue}{\mathcal{R} \left( \mathbf{A} \right)}$.
Uniqueness
When the matrix $\mathbf{A}$ has dependent columns, the $\color{red}{null}$ space $\color{red}{\mathcal{N} \left( \mathbf{A} \right)}$ is nontrivial. Let $\color{red}{z}$ be any vector in the $\color{red}{null}$ space, and let $x_{LS}$ be a known least squares minimizer.
$$ \mathbf{A} \left( x_{LS} + \color{red}{z} \right) = \mathbf{A} x_{LS} + \mathbf{A} \color{red}{z} = \mathbf{A} x_{LS} + \mathbf{0} = \mathbf{A} x_{LS} $$
Under the action of $\mathbf{A}$, the solutions $\left( x_{LS} + \color{red}{z} \right)$ and $x_{LS}$ are equivalent.
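This equivalence is visible numerically. In the NumPy sketch below (`A`, `b`, and the null-space vector `z` are illustrative choices, not data from the question), adding a null-space vector to a minimizer leaves the residual $\lVert \mathbf{A}x - b\rVert$ unchanged:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])   # dependent columns: col3 = col1 + col2
b = np.array([1.0, 2.0, 0.0])

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # one least squares minimizer
z = np.array([1.0, 1.0, -1.0])                # z in N(A): A @ z = 0

r1 = np.linalg.norm(A @ x_ls - b)
r2 = np.linalg.norm(A @ (x_ls + z) - b)
print(np.isclose(r1, r2))                     # True: same residual
```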
Geometrically, the general least squares solution is an affine space, pictured in the original answer as a red, dashed line.
If the $\color{red}{null}$ space is trivial, $\color{red}{\mathcal{N} \left( \mathbf{A} \right)} = \mathbf{0}$, the least squares solution is the point $\color{blue}{\mathbf{A}^{+}b}$ in the $\color{blue}{range}$ space $\color{blue}{\mathcal{R} \left( \mathbf{A}^{*} \right)}$.
If the $\color{red}{null}$ space is not trivial, $\color{red}{\mathcal{N} \left( \mathbf{A} \right)} \ne \mathbf{0}$, the least squares solution is the affine space going through the point $\color{blue}{\mathbf{A}^{+}b}$ in the $\color{blue}{range}$ space $\color{blue}{\mathcal{R} \left( \mathbf{A}^{*} \right)}$ and extending through the $\color{red}{null}$ space.
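The orthogonality behind this picture can also be checked numerically (a NumPy sketch with an illustrative rank-deficient `A`): the point $\mathbf{A}^{+}b$ lies in $\mathcal{R}\left(\mathbf{A}^{*}\right)$, which is orthogonal to $\mathcal{N}\left(\mathbf{A}\right)$, so it is perpendicular to every null-space direction the affine line extends through.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])   # dependent columns; N(A) spanned by (1, 1, -1)
b = np.array([1.0, 2.0, 3.0])

x0 = np.linalg.pinv(A) @ b       # the point A^+ b
z = np.array([1.0, 1.0, -1.0])   # a null-space vector

# x0 lies in R(A^*), the orthogonal complement of N(A).
print(np.isclose(x0 @ z, 0.0))   # True
```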
Least squares solution
Given a matrix $\mathbf{A}\in\mathbb{C}^{m\times n}_{\rho}$ and a data vector $b\in\mathbb{C}^{m}$ such that
$$
b\notin\color{red}{\mathcal{N} \left( \mathbf{A}^{*} \right)},
$$
the least squares solution is defined as
$$
x_{LS} = \left\{
x\in\mathbb{C}^{n} \colon
\lVert
\mathbf{A} x - b
\rVert_{2}^{2}
\text{ is minimized}
\right\}
$$
The least squares solution is computed using
$$
x_{LS} =
\color{blue}{\mathbf{A}^{+} b} +
\color{red}{
\left(
\mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A}
\right) y}, \quad y \in \mathbb{C}^{n}
$$
In this form, it's clear that having a unique solution demands
$$
\mathbf{A}^{+} \mathbf{A} = \mathbf{I}_{n}
$$
which happens precisely when $\color{red}{\mathcal{N} \left( \mathbf{A} \right)} = \mathbf{0}$, that is, when $\mathbf{A}$ has full column rank. More details are in the links below.
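The two-term formula above can be exercised directly (a NumPy sketch; `A`, `b`, and `y` are illustrative choices): every member of the family $\color{blue}{\mathbf{A}^{+}b} + \color{red}{(\mathbf{I}_n - \mathbf{A}^{+}\mathbf{A})y}$ attains the same minimal residual, and $\mathbf{A}^{+}b$ is the minimum-norm member.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])   # rank 2: dependent columns
b = np.array([1.0, 0.0, 2.0, 1.0])

A_pinv = np.linalg.pinv(A)                   # Moore-Penrose pseudoinverse
P_null = np.eye(3) - A_pinv @ A              # projector onto N(A)

x0 = A_pinv @ b                              # minimum-norm least squares solution
y = np.array([1.0, 0.0, 0.0])                # an arbitrary y (here in R^3)
x1 = x0 + P_null @ y                         # another least squares solution

# Both minimize ||Ax - b||, and x0 has the smaller norm.
print(np.isclose(np.linalg.norm(A @ x0 - b),
                 np.linalg.norm(A @ x1 - b)))    # True
print(np.linalg.norm(x0) <= np.linalg.norm(x1))  # True
```

Since $P_{\text{null}}$ is a nonzero projector here, distinct choices of `y` sweep out the whole affine line of minimizers.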
Explore Stack Exchange:
Existence and uniqueness of least squares solutions: Query about the Moore Penrose pseudoinverse method
Derivation of the SVD solution and conditions on existence and uniqueness: Singular value decomposition proof
Subspace decomposition for least squares: Singular Value Decomposition
How the two null spaces affect the least squares solution: Pseudo-inverse of a matrix that is neither fat nor tall?, What forms does the Moore-Penrose inverse take under systems with full rank, full column rank, and full row rank?
How full column rank changes the inverse: How to find the singular value decomposition of $A^{T}A$ & $\left( A^{T}A \right)^{-1}$
How null spaces affect the pseudoinverse: generalized inverse of a matrix and convergence for singular matrix, When pseudo inverse and general inverse of a invertible square matrix will be equal or not equal?