What is an intuitive explanation of why elimination retains the solutions to the original system of equations?
Solution 1:
I will likely be proven wrong by another answer, but I don't think that you can get a geometrical explanation. Algebraically, the situation is very clear: you have a system $Ax=b$, and when you perform row reduction, you are multiplying on the left by an invertible (elementary) matrix. So the system becomes $EAx=Eb$, with the same solution. Multiplying by an invertible matrix does not have an easy geometrical interpretation, as far as I can tell.
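For a concrete instance, take the system $x - 2y = 1$, $3x + 2y = 11$ used in the other answers: the row operation "subtract $3$ times row $1$ from row $2$" is left-multiplication by the elementary matrix $E$, $$ \underbrace{\begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix}}_{E} \begin{pmatrix} 1 & -2 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 11 \end{pmatrix} \iff \begin{pmatrix} 1 & -2 \\ 0 & 8 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 8 \end{pmatrix}. $$ Since $E$ is invertible, multiplying by $E^{-1}$ recovers the original system, so the operation can neither gain nor lose solutions.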
Solution 2:
There is only one elimination rule: if I perform an operation on two copies of the same number, both copies change in the same way.
If $x-2y=1$ then $3x-6y=3$. I have changed nothing. I have merely noticed that if I multiply $1$ by $3$ I get $3$.
If I now subtract $3x-6y=3$ from $3x+2y=11$ I am just subtracting $3$ from $11$ which gives me $8y=8$. If I now divide $8$ by $8$ I get $y=1$.
In all cases I am just doing the same thing to the same number written in two different ways.
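The same rule finishes the job: substituting $y = 1$ into $x - 2y = 1$ gives $x - 2 = 1$, and adding $2$ to both copies of that number gives $$ x = 3. $$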
Solution 3:
Suppose someone gives you a pair of distinct lines in the plane, neither horizontal nor vertical, and intersecting at a point $O$. To find the Cartesian coordinates of $O$, you could rotate one line until it was horizontal, then rotate the other line until it was vertical. You could then read off the coordinates by checking where the rotated lines hit the coordinate axes.
Rotating a line to be horizontal means you've eliminated $x$ from its equation; rotating a line to be vertical means you've eliminated $y$. Perhaps the surprising thing is that elementary row operations effectively rotate lines.
To make this algebra-to-geometry dictionary more precise, write the original lines as \begin{align*} a_{1}x + b_{1}y + c_{1} &= 0, \\ a_{2}x + b_{2}y + c_{2} &= 0. \end{align*} View each left-hand side as the value of a function of two variables whose graph is a plane.
Multiplying an equation by a non-zero scalar "rotates" the corresponding plane in space about its zero set, but doesn't change the zero set.
Fixing one equation and subtracting a non-zero scalar multiple of the other amounts to intersecting planes, then projecting the line of intersection to the $(x, y)$-plane; as the scalar multiple changes, (the shadow of) the line of intersection rotates about $O$. If you tilt the planes just right, the projected line of intersection is parallel to a coordinate axis.
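On the concrete system from Solution 2 this looks as follows: starting from \begin{align*} x - 2y - 1 &= 0, \\ 3x + 2y - 11 &= 0, \end{align*} subtracting $3$ times the first equation from the second turns the second into $8y - 8 = 0$, the horizontal line $y = 1$: it has been rotated about the intersection point $O = (3, 1)$ until $x$ was eliminated. Adding twice this new second equation to the first then rotates the first line to the vertical line $x = 3$.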
Solution 4:
So even though the system seems to have changed a lot, in reality it didn't change that much, since the lines still intersect at the same point.
My question is: what is the intuition for why manipulating a system of equations in this way and combining them retains the original solution?
The reason that the solution space stays the same is algebraic in nature:
As shown by your images, each row defines an affine hyperplane (an affine plane of dimension $n-1$): $$ a_i^\top x = \beta_i $$ where the vector $a_i$ is a normal vector to the hyperplane.
Elementary scaling of a row does not change the set of points $x$ of the hyperplane, only the length of its normal vector, which stays normal, with the inhomogeneity rescaled to compensate: \begin{align} a_i^\top x = \beta_i \iff s\, a_i^\top x = s \beta_i \iff {a'}_i^\top x = {\beta'}_i. \end{align} Adding a multiple of one row to another gives $$ a_i^\top x = \beta_i \quad (*) \\ \implies\quad a_i^\top x + s\, a_j^\top x = (a_i + s\, a_j)^\top x = \beta_i + s \beta_j, \text{ i.e. } {a'}_i^\top x = {\beta'}_i \quad (**) $$ Standing alone, $(**)$ can be a new hyperplane, with a new set of points $x$. However, for an $x$ in the common intersection, $a_j^\top x = \beta_j$ still holds, so we can subtract $s$ times that equation and revert from $(**)$ to $(*)$. So $$ a_i^\top x = \beta_i \;\wedge\; a_j^\top x = \beta_j $$ has the same solutions as $$ (a_i + s\, a_j)^\top x = \beta_i + s \beta_j \;\wedge\; a_j^\top x = \beta_j. $$ The geometric consequence is that the hyperplanes $(*)$ and $(**)$ have in common those points $x$ for which $a_j^\top x = \beta_j$ also holds; these form an affine subspace of dimension at least $n-2$. For $n=2$ this means the line stays the same or rotates around the intersection point. For $n=3$ this means the plane stays the same or rotates around the intersection line.
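As a quick numerical check of this invariance, here is a minimal numpy sketch (the matrix, the rows $i, j$ and the scalar $s$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 3x3 system with a unique solution.
A = rng.normal(size=(3, 3))
b = rng.normal(size=3)
x = np.linalg.solve(A, b)

# Row operation (**): add s times row j to row i (with i != j).
s, i, j = 2.5, 0, 2
A2, b2 = A.copy(), b.copy()
A2[i] += s * A[j]   # a_i     <-  a_i + s * a_j
b2[i] += s * b[j]   # beta_i  <-  beta_i + s * beta_j

# The old solution still satisfies the new system ...
assert np.allclose(A2 @ x, b2)
# ... and the operation is reversible (subtract s times row j again),
# so the new system has not picked up any extra solutions either.
assert np.allclose(np.linalg.solve(A2, b2), x)
```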
OK, so we can do affine rotations without changing the solution space. However, we do not do this at random, but follow a strategy that lets us read the solutions off from the parameters $a_i$ and $\beta_i$:
The goal of elimination is to reach a row echelon form: row by row, dependencies on the variables are removed.
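As a minimal sketch of this forward pass on the running example (assuming nonzero pivots, so that no row swaps are needed):

```python
import numpy as np

def forward_eliminate(M):
    """Bring an augmented matrix [A | b] to row echelon form using only
    'add a multiple of one row to another'; assumes nonzero pivots."""
    M = M.astype(float).copy()
    n = M.shape[0]
    for k in range(n):
        for i in range(k + 1, n):
            # Subtract the right multiple of the pivot row to zero out x_k.
            M[i] -= (M[i, k] / M[k, k]) * M[k]
    return M

# The running example: x - 2y = 1 and 3x + 2y = 11.
M = np.array([[1, -2, 1],
              [3, 2, 11]])
print(forward_eliminate(M))   # [[1. -2.  1.]
                              #  [0.  8.  8.]]  ->  8y = 8
```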
In your image the $x$-dependency of your second line has been removed: it depends only on $y$, which, being the last variable, turns out to be constant. It is an affine rotation.
As we now have the $y$-dependency in the second row, we could remove it from the first row, keeping only the $x$-dependency there: $$ \left[ \begin{array}{rr|r} 1 & -2 & 1 \\ 0 & 8 & 8 \end{array} \right] \to \left[ \begin{array}{rr|r} 1 & -2 & 1 \\ 0 & 1 & 1 \end{array} \right] \to \left[ \begin{array}{rr|r} 1 & 0 & 3 \\ 0 & 1 & 1 \end{array} \right] $$ The matrix is now in reduced row echelon form.
We have aligned each line parallel to one of the coordinate axes. In higher dimensions a similar alignment happens. We can directly read the fixed coordinates of the solution from this form: here $x = 3$ and $y = 1$.
Further, we can make statements about the dimension of the solution space:
If the row echelon form has $k$ empty rows, the solution set is the intersection of only $n-k$ hyperplanes, so it is not zero-dimensional (a point, a unique solution) but a $k$-dimensional affine subspace (the nonzero rows of an echelon form are independent), provided the hyperplanes have a common intersection at all. If they do not, there is no solution.
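A minimal illustration in the smallest case ($n = 2$, $k = 1$): $$ \left[ \begin{array}{rr|r} 1 & -2 & 1 \\ 2 & -4 & 2 \end{array} \right] \to \left[ \begin{array}{rr|r} 1 & -2 & 1 \\ 0 & 0 & 0 \end{array} \right] $$ The second row becomes empty, so the solution set is the whole line $x - 2y = 1$, a $1$-dimensional affine subspace. Had the right-hand side of the second row been anything other than $2$, elimination would instead have produced a contradiction $0 = c$ with $c \neq 0$: no common intersection, no solution.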