How to Interpret Solving Systems of Linear Equations Geometrically in Terms of Linear Algebra
Once I learned some actual linear algebra, I realized I never really understood basic Gaussian elimination & solving systems of equations. I assumed I was missing some fundamental aspect of the subject that some book would eventually illuminate for me, or that things would just click, but they haven't. I also can't stand being offered something along the lines of the Kaplansky quote, "we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury", as a rationale for the apparent disconnect between the theory & application of linear algebra when I view things as I'll describe below.
Let's say I have this square system:
$ax + by = e$
$cx + dy = f$
I think there are four ways we can understand this system geometrically, & I have questions about all of them (note that nothing will be said about larger or non-square systems in this post).
01: VECTORS & LINEAR MAPS
If I want to understand this exclusively in terms of vectors & linear maps I can write this system as a linear combination:
$x(a,c) + y(b,d) = (e,f)$
$xT(\hat{e_{1}}) + yT(\hat{e_{2}}) = (e,f)$
$T(x\hat{e_{1}} + y\hat{e_{2}}) = (e,f)$
$T(\vec{v}) = \vec{z}$
Now we can see that solving this system of linear equations is equivalent to determining which vector in the domain of $T$ is mapped to the vector $(e,f)$. Furthermore, using the fact that a linear map on a finite-dimensional vector space is completely determined by its action on a basis, if we arrange things such that $T$ acts on the standard basis then we can use linearity to determine the scalar multiples $x$ & $y$.
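To make this concrete, here is a minimal numerical sketch of the picture (NumPy assumed, coefficients made up): the columns of the matrix of $T$ are $T(\hat{e_1})$ & $T(\hat{e_2})$, & solving the system amounts to finding the preimage of $(e,f)$ under $T$:

```python
import numpy as np

# hypothetical coefficients for ax + by = e, cx + dy = f
a, b, c, d = 2.0, 1.0, 1.0, 3.0
e, f = 5.0, 10.0

T = np.array([[a, b],
              [c, d]])      # matrix of T in the standard basis
z = np.array([e, f])

# the columns of T are the images of the standard basis vectors
T_e1, T_e2 = T[:, 0], T[:, 1]

v = np.linalg.solve(T, z)   # the vector v = (x, y) with T(v) = z
x, y = v

# check: z is the linear combination x*T(e1) + y*T(e2)
print(np.allclose(x * T_e1 + y * T_e2, z))   # True
```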
I think that's the general gist of what's going on (this is all correct so far, right?), & from a distance this is very geometric & conceptually intuitive. In the best case scenario (unique solution to the system) this is the image I think most people have.
The thing I don't like about this perspective is how divorced it is from all the computations I know of; as far as I can tell it basically has nothing to do with Gaussian or Gauss-Jordan elimination.
My first question is whether you can use this interpretation, i.e. linear maps, in a computational sense. It seems to me you have to revert to another interpretation I'll outline below, & I'm wondering whether the concepts really are as divorced as they appear or whether I'm missing something; maybe I just don't see how all of this is actually related to basic linear algebra. It also seems strange to me to whip out new vectors that, while they admittedly contain something from both equations, have no obvious geometric connection with the lines.
02: NORMAL VECTORS
This interpretation uses the fact that the vector $(a,b)$ is the normal vector to $ax + by = e$ (i.e. $(a,b)\cdot(x - x_0,y - y_0) = 0$ where $ax_0 + by_0 = e$), & is basically a geometric interpretation of (every step of) both Gaussian & Gauss-Jordan elimination, giving some soul & feeling to the algebraic computations. Here you're using the second most obvious vectors associated with the lines (the normals, the most obvious being the vectors parallel to the lines). Thus when you have
$ax + by = e$
$cx + dy = f$
& add a scalar multiple of one to the other, you get
$(a + \lambda c)x + (b + \lambda d)y = e + \lambda f$,
you can interpret this as nothing other than adding normal vectors to produce a new 'normal vector' $(a + \lambda c,b + \lambda d)$ (what it is 'normal' to I don't know, but I think it is just a convenient vector we use as a means to eliminate coefficients, as done next) & end up with:
$(a + \lambda c,b + \lambda d)\cdot(x - x_0,y - y_0) = 0$ s.t.
$(a + \lambda c)(x - x_0) + (b + \lambda d)(y - y_0) = 0$
$a(x - x_0) + \lambda c(x - x_0) + b(y - y_0) + \lambda d(y - y_0) = 0$
$ax + \lambda cx + by + \lambda dy - ax_0 - \lambda cx_0 - by_0 - \lambda dy_0 = 0$
$(a + \lambda c)x + (b + \lambda d)y = ax_0 + \lambda cx_0 + by_0 + \lambda dy_0 $
Thus as long as $(a,b)$ & $(c,d)$ are not linearly dependent you can't choose $\lambda$ such that the above becomes $(0,0)\cdot(x - x_0,y - y_0) = 0$. Now the standard route is to choose $\lambda$ so that you eliminate one of the variables & solve for the other; choosing, say, $\lambda = - \frac{a}{c}$ gives
$(a + \lambda c,b + \lambda d)\cdot(x - x_0,y - y_0) = 0$
$(a - \frac{a}{c} c,b - \frac{a}{c} d)\cdot(x - x_0,y - y_0) = 0$
$(0,b - \frac{ad}{c})\cdot(x - x_0,y - y_0) = 0$
$(b - \frac{ad}{c})(y - y_0) = 0$
$bc(y - y_0) - ad(y - y_0) = 0$
$bcy - bcy_0 - ady + ady_0 = 0$
$(ad - bc)y_0 = (ad - bc)y$
$y_0 = y$
which can also be done using:
$(a + \lambda c)x + (b + \lambda d)y = ax_0 + \lambda cx_0 + by_0 + \lambda dy_0 $
since you get
$(a - \frac{a}{c} c)x + (b - \frac{a}{c}d)y = ax_0 - \frac{a}{c}(cx_0) + by_0 - \frac{a}{c} dy_0 $
$(b - \frac{a}{c}d)y = (b - \frac{a}{c} d)y_0 $
$y = y_0 $
Similarly for finding $x = x_0$; however, we want to understand this geometrically.
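Here is a small numerical sketch of that elimination step (same made-up coefficients as before, NumPy assumed): adding $\lambda = -\frac{a}{c}$ times the second equation to the first produces a combined normal vector whose first component is zero, i.e. a vertical normal, & the resulting one-variable equation gives $y_0$:

```python
import numpy as np

# hypothetical coefficients for ax + by = e, cx + dy = f
a, b, c, d = 2.0, 1.0, 1.0, 3.0
e, f = 5.0, 10.0

n1 = np.array([a, b])        # normal vector of the first line
n2 = np.array([c, d])        # normal vector of the second line

lam = -a / c                 # choose lambda to kill the x-coefficient
n_new = n1 + lam * n2        # combined "normal" vector (0, b - ad/c)
rhs_new = e + lam * f        # combined right-hand side e + lambda*f

print(n_new)                 # first component is 0: the new normal is vertical

y0 = rhs_new / n_new[1]      # solve the one-variable equation for y
x0 = (e - b * y0) / a        # back-substitute into ax + by = e

# check against a direct solve
print(np.allclose([x0, y0], np.linalg.solve(np.array([[a, b], [c, d]]),
                                            np.array([e, f]))))  # True
```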
My second question is whether it is right to interpret the above as saying that we take $(x_0,y_0)$ to be the hypothetical point of intersection of the two lines, & in the situation where no $\lambda$ can be chosen such that the dot product contains a zero vector (i.e. if we can be sure the normal vectors are linearly independent) we know it uniquely exists. From then on we are doing nothing other than choosing $\lambda$ such that, say when we're solving for $y = y_0$, the vector $(a + \lambda c,b + \lambda d)$ points in the direction of the y axis, i.e. it is a vertical vector in the Cartesian plane, of the form $(0,y_0)$, pointing to the y component of the intersection of the two lines. Similarly for finding the $x_0$ term: we just use vector addition to eliminate a coefficient & then find $(x_0,0)$, & through finding both $(x_0,0)$ & $(0,y_0)$ we simultaneously find $(x_0,y_0)$. Unless I'm deluded, I'm pretty sure all of the above is a geometric way to understand every step of those furious computations with matrices, so I don't see how this can be wrong...
My third question is how any of this discussion relates to linear maps. It seems to me that interpreting a system of linear equations in terms of normal vectors is far superior to interpreting it in terms of linear maps, at least in the square $n \times n$ case. Am I missing something?
03: DETERMINANTS & LINEAR MAPS
Let $\Psi$ be an alternating bilinear form such that $\Psi(e_1,e_2) = 1$. For an operator $T$, the number $\lambda$ such that $\Psi(T(e_1),T(e_2)) = \lambda\Psi(e_1,e_2)$ is known as the determinant, i.e. $\Psi(T(e_1),T(e_2)) = \det(T)\Psi(e_1,e_2)$. Again this way of looking at things is very intuitive from a distance: the determinant of an operator is nothing but the number such that the area between $T(e_1)$ & $T(e_2)$, i.e. $\Psi(T(e_1),T(e_2))$, is a multiple of the area between $e_1$ & $e_2$, i.e. $\Psi(e_1,e_2)$ (disregarding signs). In fact we have no problem in more generally writing $\Psi(T(u),T(v)) = \det(T)\Psi(u,v)$ for arbitrary vectors $u$ & $v$.
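A quick numerical check of this (NumPy assumed): in two dimensions one can take $\Psi(u,v)$ to be the signed area $u_1 v_2 - u_2 v_1$, which is an alternating bilinear form with $\Psi(e_1,e_2) = 1$, & then $\Psi(T(u),T(v)) = \det(T)\Psi(u,v)$ can be verified directly:

```python
import numpy as np

def psi(u, v):
    # alternating bilinear form with psi(e1, e2) = 1: the signed area of (u, v)
    return u[0] * v[1] - u[1] * v[0]

# hypothetical operator T (same made-up coefficient matrix as above)
T = np.array([[2.0, 1.0],
              [1.0, 3.0]])

u = np.array([1.5, -0.3])    # arbitrary test vectors
v = np.array([0.7,  2.2])

# psi(T u, T v) = det(T) * psi(u, v)
print(np.isclose(psi(T @ u, T @ v), np.linalg.det(T) * psi(u, v)))  # True
```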
Note that $\Psi$ has nothing to do with normal vectors here; it's exploiting the properties of the first way of looking at this system (in terms of matrices we're dealing with the determinant as a linear function of the columns, basically). The reason I bring this topic up here is to find out how to relate these concepts to the geometry of the situation. Again we are introducing seemingly arbitrary vectors $T(e_1)$ & $T(e_2)$ that don't relate to the geometry of the lines (though of course the vectors contain algebraic information).
With that said, my fourth question comes from the solutions determined via Cramer's rule. If you use this notation, $\Psi(T(e_1),T(e_2)) = \det(T)\Psi(e_1,e_2)$, you see that $\Psi(\vec{z},T(e_2)) = \Psi(xT(e_1) + yT(e_2),T(e_2)) = x\Psi(T(e_1),T(e_2))$ implies $x = \frac{\Psi(\vec{z},T(e_2))}{\Psi(T(e_1),T(e_2))}$ (a small numerical check is sketched after the list below). This term simply must have some fascinating interpretation... I would love to know what it means to say that the $x$ component of the point of intersection of two lines is the ratio of
- the area between the vector whose components are the right-hand sides of both equations (I can't see a nice way to talk about or interpret this) & the vector $T(e_2)$ (whatever this vector is supposed to be interpreted as)
- to the area contained within $T(e_1)$ & $T(e_2)$.
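Here is the small numerical check promised above (NumPy assumed, same made-up coefficients): with $\Psi$ the signed-area form, $x = \frac{\Psi(\vec z, T(e_2))}{\Psi(T(e_1),T(e_2))}$ & likewise $y = \frac{\Psi(T(e_1), \vec z)}{\Psi(T(e_1),T(e_2))}$, which is exactly Cramer's rule:

```python
import numpy as np

def psi(u, v):
    # signed area spanned by u and v (alternating bilinear form, psi(e1,e2)=1)
    return u[0] * v[1] - u[1] * v[0]

a, b, c, d = 2.0, 1.0, 1.0, 3.0   # hypothetical coefficients
e, f = 5.0, 10.0

T = np.array([[a, b],
              [c, d]])
z = np.array([e, f])
T_e1, T_e2 = T[:, 0], T[:, 1]

x = psi(z, T_e2) / psi(T_e1, T_e2)   # Cramer's rule for x
y = psi(T_e1, z) / psi(T_e1, T_e2)   # Cramer's rule for y

print(np.allclose([x, y], np.linalg.solve(T, z)))   # True
```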
My fifth question is almost the same as the above, except it modifies the interpretation of the last item, "the area contained within $T(e_1)$ & $T(e_2)$". If we exploit the fact that for matrices $\det(T) = \det(T^t)$, we can interpret the determinant in a whole new manner intimately related to the geometry of the lines: we can now interpret the determinant as the (signed) area between the normal vectors to the lines (which immediately gives meaning to the situations of either a zero or non-zero determinant; a one-line check is sketched after the list below). To restate the question, I would love to know what it means to say that the $x$ component of the point of intersection of two lines is the ratio of
- the area between the vector whose components are the right-hand sides of both equations & the vector $T(e_2)$ (whatever this vector is supposed to be interpreted as)
- to the area contained within the normal vectors to the two lines.
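And here is the one-line check mentioned above (NumPy assumed): the fact that lets us swap columns for rows is just $\det(T) = \det(T^t)$, so the area spanned by the columns $T(e_1)$ & $T(e_2)$ equals the area spanned by the rows, i.e. the normal vectors $(a,b)$ & $(c,d)$:

```python
import numpy as np

# rows of T are the normal vectors (a,b) & (c,d); columns are T(e1) & T(e2)
T = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# det(T) = det(T^t): column area and row (normal-vector) area agree
print(np.isclose(np.linalg.det(T), np.linalg.det(T.T)))   # True
```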
My sixth question is whether I'm right to make all these distinctions. I don't know whether I should be going so far as to delineate two separate interpretations of the denominator in the solution given by Cramer's rule & ask for two different interpretations, but it really seems like you have to be able to think about this in two different ways: one extremely geometric on every level (normal vectors), the other geometric only at the start. I am just not sure; I think you have no intuitive geometric interpretation in terms of linear maps, you have to use these almost arbitrary vectors $T(e_1)$ divorced from the geometry of the lines, whereas when you do it in terms of normal vectors you get something nice.
04: LINEAR FUNCTIONALS & LINEAR MAPS
My seventh & final question is about the relationship of linear functionals to solving systems of linear equations. Given the system:
$ax + by = e$
$cx + dy = f$
i.e. $xT(\hat{e_{1}}) + yT(\hat{e_{2}}) = (e,f)$
we ask how linear functionals interact with this setup. By introducing $\psi_1(xe_1 + ye_2) = e$ & $\psi_2(xe_1 + ye_2) = f$ we see
$\psi_1(xe_1 + ye_2) = x\psi_1(e_1) + y\psi_1(e_2) = ax + by = e$
$\psi_2(xe_1 + ye_2) = x\psi_2(e_1) + y\psi_2(e_2) = cx + dy = f$
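Concretely (a sketch, NumPy assumed, same made-up coefficients): $\psi_1$ & $\psi_2$ are just the functionals "dot with the normal vector", & the solution $(x,y)$ is the unique vector they send to $e$ & $f$ respectively:

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 1.0, 3.0   # hypothetical coefficients
e, f = 5.0, 10.0

psi1 = lambda v: np.dot([a, b], v)   # the functional v -> ax + by
psi2 = lambda v: np.dot([c, d], v)   # the functional v -> cx + dy

# the solution vector of the system
v = np.linalg.solve(np.array([[a, b], [c, d]]), np.array([e, f]))

print(np.isclose(psi1(v), e), np.isclose(psi2(v), f))   # True True
```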
I really don't know how to interpret this functional picture or fit it into the general scheme of things. It seems to be saying that a linear functional maps the solution vector to a line, & that the action of a linear functional on a basis produces the coefficients of the normal vectors (i.e. in some way you're mapping the solution of the system to the normal vectors), but I don't know what you're supposed to do with this & would appreciate any help on how to interpret it in light of everything I've asked.
I know it's a long post, but the questions are, in my mind, all tied together, so I sincerely appreciate any help.
Solution 1:
Your first question is hard to answer without going through the others, so I'll come back to it.
Your second question:
The "normal" vectors are vectors perpendicular to the lines in question. These are often used because, in an N-dimensional vector space, an equation of the form $\sum_i^N c_i x_i = d$ describes an N-1 dimensional object--in 3d, it describes a plane; in 4d, it describes a 3d hypersurface, and so on. Normal vectors--the vectors orthogonal to these objects--generalize well to any case in arbitrary dimensions.
In essence, when you add the equations of two lines (talking about the 2d case), you get another line, which is described by the new normal vector.
Otherwise, I think your understanding is correct.
For your third question: As we've seen, it's not possible to characterize an arbitrary line just from its normal vector. Rather, you must have some third degree of freedom to characterize a line's offset from the origin. This is the justification for homogeneous coordinates and projective geometry, which I'll touch on in just a moment.
For your fourth question: You use an "alternating bilinear form" $\Psi$. This is a natural time to talk about exterior algebra and wedge products. The wedge product of two vectors is antisymmetric: $e_1 \wedge e_2 = - e_2 \wedge e_1$. A linear operator $\underline T$ obeys $\underline T(e_1) \wedge \underline T(e_2) = \lambda \, e_1 \wedge e_2$ in two dimensions (the left-hand side is also taken as the definition of $\underline T(e_1 \wedge e_2)$, which generalizes to higher dimensions). Anyway, the notation is different, but we're talking about the same math.
One way to interpret Cramer's rule is in the projective geometry I spoke of. As I said, lines with an arbitrary offset from the origin require three parameters to describe: a point that the line goes through (two) and a direction (one). The natural way to work with this is in a three-dimensional space. Write the line $ax + by = c$ as $ax + by - c = 0$ and let the third axis carry the extra coordinate: lifting the points of the line to $(x, y, 1)$, the vector $a e_1 + b e_2 - c e_3$ is normal to the plane through the origin containing the lifted line. Take another such vector, $u e_1 + v e_2 - w e_3$ for a second line $ux + vy = w$, and take the cross product.
$$(a e_1 + b e_2 - c e_3) \times (u e_1 + v e_2 - w e_3) = (cv - bw)\, e_1 + (aw - cu)\, e_2 + (av - bu)\, e_3$$
Note that this is a homogeneous representation: any overall nonzero factor can be rescaled away. Dividing through by the coefficient of $e_3$ generates exactly the terms of Cramer's rule, and the geometric interpretation is simple: we found the common line of the two planes (planes which, in this space, represent lines in the original space), whose direction must of course be perpendicular to both planes' normal vectors. Where this common line pierces the plane $e_3 = 1$ is the ordinary point of intersection.
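A minimal sketch of that recipe (NumPy assumed, with two hypothetical lines): lift each line $ax + by = c$ to the vector $(a, b, -c)$, take the cross product, and divide by the $e_3$ component to land back on the plane $e_3 = 1$, which recovers the ordinary intersection point:

```python
import numpy as np

# two hypothetical lines: 2x + 1y = 5 and 1x + 3y = 10
l1 = np.array([2.0, 1.0, -5.0])    # (a, b, -c) for ax + by = c
l2 = np.array([1.0, 3.0, -10.0])   # (u, v, -w) for ux + vy = w

p = np.cross(l1, l2)               # homogeneous coordinates of the intersection
point = p[:2] / p[2]               # rescale so the e3 component is 1

print(point)                       # the intersection of the two lines
print(np.allclose(point, np.linalg.solve(np.array([[2.0, 1.0], [1.0, 3.0]]),
                                         np.array([5.0, 10.0]))))   # True
```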
There is a way to extend this to arbitrary dimensions (ones in which the cross product doesn't exist), but you're on the right track, talking about alternating bilinear forms.
I'm afraid I'm not really able to follow from this point (or even to answer your original question), but I think I can probe at your difficulty as follows: the idea of a system of linear equations being a linear map is, indeed, quite arbitrary (you can choose the ordering of the equations as they correspond to components however you like and add and subtract equations at will). It doesn't have a neat geometric interpretation, as far as I can see. Homogeneous coordinates and projective geometry, on the other hand, give very neat and clean geometric interpretations, something you can intuitively understand (or at least that I can).
I won't call this a complete answer, but perhaps delving into projective geometry and homogeneous coordinates will give you further insights into the problem of finding the intersections between lines (or the common lines between planes, etc.). It's the only method I use anymore. In particular, I highly recommend the geometric (Clifford) algebra approach to this stuff. Doran and Lasenby or Dorst, Fontijne, and Mann give excellent descriptions of projective geometry (and conformal geometry) using that formalism.