Intuition for Formal Definition of Linear Independence

I learned about this a long time ago but it never really clicked, which led me to these questions:

  1. How the formal definition (at the bottom) works. My rough intuition is that linear independence means the vectors are independent and don't affect each other, but I don't follow the formal definition. I would like a deep understanding of it in terms of these linear combination equations: I don't see how a linear combination constructed in a certain way can tell you whether the vectors are independent or not.
  2. Why the linear combination is set equal to $\vec{0}$. I don't see how setting it to zero helps determine independence.
  3. Why the $a_i$ are required to be not all zero in one case. It seems arbitrary.

From Wikipedia:

A subset $S=\{{\vec {v}}_{1},{\vec {v}}_{2},\dots ,{\vec {v}}_{n}\}$ of a vector space $V$ is linearly dependent if there exist a finite number of distinct vectors ${\vec {v}}_{1},{\vec {v}}_{2},\dots ,{\vec {v}}_{k}$ in $S$ and scalars $a_{1},a_{2},\dots ,a_{k}$, not all zero, such that

$$a_{1}{\vec {v}}_{1}+a_{2}{\vec {v}}_{2}+\cdots +a_{k}{\vec {v}}_{k}={\vec {0}}$$

where ${\vec {0}}$ denotes the zero vector.

The vectors in a set $T=\{{\vec {v}}_{1},{\vec {v}}_{2},\dots ,{\vec {v}}_{n}\}$ are linearly independent if the equation

$$a_{1}{\vec {v}}_{1}+a_{2}{\vec {v}}_{2}+\cdots +a_{n}{\vec {v}}_{n}={\vec {0}}$$

can only be satisfied by $a_{i}=0$ for $i=1,\dots ,n$.

So my understanding is: there are two subsets $S$ and $T$ of $V$; in one the coefficients are not all zero, in the other they are all zero, and in one case the vectors are linearly dependent while in the other they are not. But I don't understand why; that's as far as my understanding goes. I'm not sure why the equations were constructed like this in the first place.


Imagine you have a collection of arrows pointing in various directions. If they're linearly dependent, then you can stretch, shrink, and reverse (but not rotate) them in such a way that if you lay them head-to-tail then they form a closed loop. For example, if you have three arrows that happen to all lie in the same plane (linearly dependent), then you can form a triangle out of them, but you can't if one of them sticks out of the plane formed by the other two (linearly independent).
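
For a concrete instance of this picture, take three coplanar vectors in $\mathbb{R}^3$, say $\vec{u} = (1, 0, 0)$, $\vec{v} = (0, 1, 0)$, and $\vec{w} = (1, 1, 0)$. Then

$$\vec{u} + \vec{v} - \vec{w} = \vec{0},$$

so laying $\vec{u}$, then $\vec{v}$, then the reversed $\vec{w}$ head-to-tail traces a closed triangle. Swap $\vec{w}$ for $(1, 1, 1)$, which sticks out of the plane of the other two, and the loop can no longer close: in $c_1\vec{u} + c_2\vec{v} + c_3\vec{w} = \vec{0}$, the third coordinate forces $c_3 = 0$, and then $c_1 = c_2 = 0$.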


Before we grapple with linear independence, it might be best to figure out what linear dependence means first.

When I think of the phrase "linear dependence" with regards to a set of vectors, what comes to mind for me is that, in some sense, one of those vectors "depends" on the other vectors in the set. The mathematical formalization for this dependence is:

A set of nonzero vectors $\{\mathbf{v}_k \}_{k=1}^n$ in a vector space $V$ over a field $F$ is called linearly dependent when there exists$^\dagger$ a $j$ so that one can write $\displaystyle \mathbf{v}_j = \sum_{k \neq j} c_k \mathbf{v}_k$, where $c_k \in F$.

So we say $\mathbf{v}_j$ "depends" on the other vectors: for any $d \in \mathbb{R}$, one can arrive at the point $d\mathbf{v}_j$ simply by travelling some distance in each of the other directions$^\ddagger$ $\mathbf{v}_{k \neq j}$. In other words, the span of the entire set of vectors is the same as the span of that same set of vectors but with $\mathbf{v}_j$ excluded. Note that, because $\mathbf{v}_j$ is nonzero, we must have $c_k \neq 0$ for at least one $k$ (otherwise the sum defining $\mathbf{v}_j$ would be $0$).

Rewriting the above as $\mathbf{v}_j - \displaystyle \sum_{k \neq j} c_k \mathbf{v}_k = 0$, we can now deduce that a set of $n$ vectors is linearly dependent $\iff$ we can find a set of constants $\{c_k\}_{k=1}^n$, not all zero, so that $\displaystyle \sum_{k=1}^n c_k \mathbf{v}_k = 0$. (For the converse direction, pick any $j$ with $c_j \neq 0$ and divide through by $c_j$ to solve for $\mathbf{v}_j$ in terms of the rest.) Negating this statement gives our definition of linear independence (a worked instance of both follows the definition):

A set of vectors $\{\mathbf{v}_k \}_{k=1}^n$ is called linearly independent $\iff$ we cannot find a set of constants $\{c_k\}_{k=1}^n$, not all zero, so that $\displaystyle \sum_{k=1}^n c_k \mathbf{v}_k = 0$. That is to say, $c_k = 0$ for all $k$ is the only solution to this equation.
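
To see both definitions in action, use the vectors from the second footnote$^\ddagger$ below: $\mathbf{v}_1 = \left[1 \atop 0 \right]$, $\mathbf{v}_2 = \left[0 \atop 1\right]$, $\mathbf{v}_3 = \left[1 \atop 1\right]$. Since $\mathbf{v}_3 = \mathbf{v}_1 + \mathbf{v}_2$, rearranging gives

$$\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 = 0,$$

so the constants $(c_1, c_2, c_3) = (1, 1, -1)$ are not all zero and $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly dependent. For the pair $\{\mathbf{v}_1, \mathbf{v}_2\}$ alone, however,

$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 = \left[c_1 \atop c_2\right] = \left[0 \atop 0\right]$$

forces $c_1 = c_2 = 0$, so the only solution is the trivial one and the pair is linearly independent.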


$^\dagger$ This $j$ is never unique: if $\mathbf{v}_j = \displaystyle \sum_{k \neq j} c_k \mathbf{v}_k$ with $c_m \neq 0$, one can just as well solve for $\mathbf{v}_m$ in terms of the other vectors.


$^\ddagger$ To give a concrete example, let $\displaystyle \mathbf{v}_1 = \left[1 \atop 0 \right], \mathbf{v}_2 = \left[0 \atop 1\right]$, and $\displaystyle \mathbf{v}_3 = \left[1 \atop 1\right]$ in $\mathbb{R}^2$. See that $\mathbf{v}_3$ depends on $\mathbf{v}_1$ and $\mathbf{v}_2$ since you can get to the point $d \mathbf{v}_3$ for any $d \in \mathbb{R}$ by first travelling $d$ units in the $\mathbf{v}_1$ direction and then $d$ units in the $\mathbf{v}_2$ direction.
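
In symbols, this reads $d\,\mathbf{v}_3 = d\,\mathbf{v}_1 + d\,\mathbf{v}_2$ for every $d \in \mathbb{R}$; taking $d = 1$ is exactly the form $\mathbf{v}_j = \displaystyle \sum_{k \neq j} c_k \mathbf{v}_k$ with $j = 3$ and $c_1 = c_2 = 1$.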