Definition of Affine Independence in Brondsted's Convex Polytopes?

At one point in the book (An Introduction to Convex Polytopes, by Arne Brondsted) a definition of affine independence is given as follows:

An $n$-family $(x_{1},\dots,x_{n})$ of points from $\mathbb{R}^d$ is said to be affinely independent if a linear combination $\lambda_{1} x_{1} + \dots + \lambda_{n} x_{n}$ with $\lambda_{1} + \dots + \lambda_{n} = 0$ can only equal the zero vector when $\lambda_{1}=\dots=\lambda_{n}=0$.
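The definition can be checked mechanically. The two conditions $\sum \lambda_i x_i = 0$ and $\sum \lambda_i = 0$ form one homogeneous linear system whose matrix has the points as columns with an extra row of ones appended; the points are affinely independent exactly when this system has only the trivial solution, i.e. when that $(d+1) \times n$ matrix has rank $n$. A minimal NumPy sketch, assuming NumPy's default floating-point rank tolerance is acceptable; the function name `is_affinely_independent` is hypothetical:

```python
import numpy as np

def is_affinely_independent(points):
    """Check the definition for a list of n points in R^d.

    The conditions sum(l_i * x_i) = 0 and sum(l_i) = 0 form a homogeneous
    system A @ l = 0, where each column of A is a point with a 1 appended.
    Only the trivial solution exists iff rank(A) == n.
    """
    X = np.asarray(points, dtype=float).T          # shape (d, n): points as columns
    A = np.vstack([X, np.ones(X.shape[1])])        # shape (d + 1, n)
    return np.linalg.matrix_rank(A) == X.shape[1]

print(is_affinely_independent([(1, 0), (0, 1)]))              # True
print(is_affinely_independent([(1, 0), (0, 1), (0.5, 0.5)]))  # False
```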

It is my hunch that affine independence is analogous to linear independence, in that a set of vectors is (affinely/linearly) independent if none of the vectors is an (affine/linear) combination of the others. If this is the case, then what does the condition $\lambda_{1} + \dots + \lambda_{n} = 0$ have to do with anything? Shouldn't it be that the linear combination $\lambda_{1}x_{1} + \dots + \lambda_{n}x_{n}$, with $\lambda_{1} + \dots + \lambda_{n} = 1$, can only equal the zero vector when $\lambda_{1}=\dots=\lambda_{n}=0$?


Solution 1:

No, there is no mistake there.

Consider the set of points:

$$ F = \left\{x = \lambda_1 x_1 + \dots + \lambda_n x_n \in \mathbb{R}^d \ \vert \ \lambda_1 + \dots +\lambda_n = 0 \right\} \ . $$

This set of points is a linear subspace of $\mathbb{R}^d$, as you can easily check. If you solve the equation $\lambda_1 + \dots +\lambda_n = 0$ for $\lambda_1$, you find that the vectors of $F$ can be written as

$$ x = -(\lambda_2 + \dots + \lambda_n)x_1 + \lambda_2x_2 + \dots + \lambda_n x_n = \lambda_2(x_2 - x_1) + \dots + \lambda_n (x_n - x_1) \ . $$

That is,

$$ F = \mathrm{span}\left\{ \overrightarrow{x_1x_2}, \dots , \overrightarrow{x_1x_n}\right\} \ . $$
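The rewriting above is easy to confirm numerically: for any coefficients with $\lambda_1 + \dots + \lambda_n = 0$, the combination $\sum \lambda_i x_i$ coincides with $\lambda_2(x_2 - x_1) + \dots + \lambda_n(x_n - x_1)$. A small NumPy sketch; the random sample points and coefficients are illustrative only, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3
points = rng.normal(size=(n, d))          # rows are x_1, ..., x_n

lam = rng.normal(size=n)
lam[0] = -lam[1:].sum()                   # solve lambda_1 + ... + lambda_n = 0 for lambda_1

lhs = lam @ points                        # lambda_1 x_1 + ... + lambda_n x_n
diffs = points[1:] - points[0]            # rows are x_i - x_1 for i = 2, ..., n
rhs = lam[1:] @ diffs                     # lambda_2 (x_2 - x_1) + ... + lambda_n (x_n - x_1)

print(np.allclose(lhs, rhs))
```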

(You could have done the same with any $x_i$ instead of $x_1$ too.)

Now, the following two statements are equivalent:

  1. Points $x_1, \dots , x_n$ are affinely independent.
  2. Vectors $\overrightarrow{x_1x_2}, \dots , \overrightarrow{x_1x_n} $ are linearly independent.

$\mathbf{(1) \Longrightarrow (2)}$. Let

$$ \mu_2 \overrightarrow{x_1x_2} + \dots + \mu_n \overrightarrow{x_1x_n} = 0 $$

We have to show that this implies $\mu_2 = \dots = \mu_n = 0$. Indeed,

$$ 0 =\mu_2 \overrightarrow{x_1x_2} + \dots + \mu_n \overrightarrow{x_1x_n} = -(\mu_2 + \dots + \mu_n)x_1 + \mu_2 x_2 + \dots + \mu_n x_n \ . $$

In this expression, the sum of all coefficients is $0$. Since we are assuming $(1)$, this implies $\mu_2 = \dots = \mu_n = 0$.

$\mathbf{(2) \Longrightarrow (1)}$. Let

$$ \lambda_1 x_1 + \dots + \lambda_n x_n = 0 \qquad \text{and} \qquad \lambda_1 + \dots + \lambda_n = 0 \ . $$

We have to show that this implies $\lambda_1 = \dots = \lambda_n = 0$. Indeed, solve the second equation for $\lambda_1$ again and you have

$$ 0 = \lambda_1 x_1 + \dots + \lambda_n x_n = - (\lambda_2 + \dots + \lambda_n) x_1 + \lambda_2 x_2 + \dots + \lambda_n x_n = \lambda_2 \overrightarrow{x_1x_2} + \dots + \lambda_n \overrightarrow{x_1x_n} \ . $$

Since we are assuming $(2)$, this implies $\lambda_2 = \dots = \lambda_n = 0$ and, since $\lambda_1 + \dots + \lambda_n = 0$, we have $\lambda_1 = 0$ too.
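The equivalence of (1) and (2) can also be tested on examples: the matrix of the points with a row of ones appended (which encodes the definition) has rank $n$ exactly when the difference vectors $\overrightarrow{x_1x_i}$ have rank $n-1$. A sketch, assuming NumPy's default rank tolerance; the helper names and the sample point sets are hypothetical:

```python
import numpy as np

def affine_rank_test(points):
    # Statement (1): only the trivial solution to sum(l_i x_i) = 0, sum(l_i) = 0.
    P = np.asarray(points, dtype=float)
    A = np.vstack([P.T, np.ones(len(P))])
    return np.linalg.matrix_rank(A) == len(P)

def difference_rank_test(points):
    # Statement (2): x_2 - x_1, ..., x_n - x_1 are linearly independent.
    P = np.asarray(points, dtype=float)
    D = P[1:] - P[0]
    return np.linalg.matrix_rank(D) == len(P) - 1

examples = [
    [(0, 0), (1, 0), (0, 1)],            # a triangle
    [(0, 0), (1, 1), (2, 2)],            # three collinear points
    [(0, 0), (1, 0), (0, 1), (1, 1)],    # four points in the plane
]
for pts in examples:
    print(affine_rank_test(pts), difference_rank_test(pts))
```

The two tests agree on every example, as the proof above predicts.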

So far so good. Now, let's finish with another trivial remark about a geometrical interpretation of this linear subspace $F$ and that condition $\lambda_1 + \dots + \lambda_n = 0$. Consider the set of points

$$ V = \left\{x = \lambda_1 x_1 + \dots + \lambda_n x_n \in \mathbb{R}^d \ \vert \ \lambda_1 + \dots +\lambda_n = 1 \right\} \ . $$

This set is an affine subspace. Indeed,

$$ V = x_1 + F \ . $$

(You should check this equality and understand that you could put any $x_i$ in the place of $x_1$.)
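For the equality $V = x_1 + F$, note that if $\lambda_1 + \dots + \lambda_n = 1$ then $x - x_1 = \sum_i \lambda_i x_i - (\sum_i \lambda_i) x_1 = \lambda_2(x_2 - x_1) + \dots + \lambda_n(x_n - x_1)$, which lies in $F$. A numerical sanity check with arbitrary sample data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 3
points = rng.normal(size=(n, d))          # rows are x_1, ..., x_n

lam = rng.normal(size=n)
lam[0] = 1.0 - lam[1:].sum()              # enforce lambda_1 + ... + lambda_n = 1

x = lam @ points                          # a point of V
diffs = points[1:] - points[0]            # spanning vectors of F
offset = lam[1:] @ diffs                  # the element of F with x = x_1 + offset

print(np.allclose(x, points[0] + offset))
```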

You can say that $V$ is parallel to the subspace $F$: indeed, $V$ "is" just $F$ translated by $x_1$.

So what? What's so special about $V$? Well, on the one hand, $V$ contains all the points $x_1 , \dots , x_n$ (exercise: check it!). On the other hand, it is the smallest affine subspace that contains them, in the sense that if $W \subset \mathbb{R}^d$ is another affine subspace containing all $x_i$, then $V \subset W$.

Indeed, in general, if you have an affine subspace $W = p + G$ and two points in it $x, y \in W$, then $\overrightarrow{xy} \in G$. So, if $x_1, \dots , x_n \in W$, then $G$ must contain all $\overrightarrow{x_1x_i}$. Hence, $F \subset G$. So $V = x_1 + F \subset x_1 + G = W$.

Summing up: the condition that annoys you, $\lambda_1 + \dots + \lambda_n = 0$, defines the linear subspace $F$, and this is what makes $V = x_1 + F$ the smallest affine subspace which contains all the points $x_1, \dots , x_n$.

EDIT. I forgot. Perhaps it would be a good exercise to redo everything we have seen here with some specific examples. For instance, take:

  1. $x_1 = (1,0), x_2 = (0,1)$ in $\mathbb{R}^2$.
  2. $x_1 = (1,0,0), x_2 = (0,1,0), x_3 = (0,0,1)$ in $\mathbb{R}^3$.
  3. $x_1 = (1,0), x_2 = (0,1), x_3 = (1/2, 1/2)$ in $\mathbb{R}^2$.
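One way to check your answers to these examples is to compute the rank of the difference vectors $\overrightarrow{x_1x_i}$: it equals $\dim F = \dim V$, and the points are affinely independent exactly when it equals $n - 1$. A sketch, assuming NumPy; the function name `affine_hull_dimension` is hypothetical:

```python
import numpy as np

def affine_hull_dimension(points):
    # dim V = dim F = rank of the vectors x_2 - x_1, ..., x_n - x_1
    P = np.asarray(points, dtype=float)
    return int(np.linalg.matrix_rank(P[1:] - P[0]))

examples = {
    "1": [(1, 0), (0, 1)],
    "2": [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
    "3": [(1, 0), (0, 1), (0.5, 0.5)],
}
for label, pts in examples.items():
    dim = affine_hull_dimension(pts)
    # print: example label, dim V, and whether the points are affinely independent
    print(label, dim, dim == len(pts) - 1)
```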

Solution 2:

I cannot resist the temptation to explain another reason (maybe "the" reason) for this condition that, apparently, seemed so mysterious: namely, $\lambda_1 + \dots + \lambda_n = 0$.

In my previous answer, we have already shown two things:

  1. With this condition, the points $x_1, \dots , x_n$ are affinely independent if and only if the vectors $\overrightarrow{x_1x_2 }, \dots ,\overrightarrow{x_1x_n} $ are linearly independent.
  2. Also, this condition makes the set $V = \left\{x = \lambda_1 x_1 + \dots + \lambda_n x_n \in \mathbb{R}^d \ \vert \ \lambda_1 + \dots + \lambda_n = 1 \right\}$ the smallest affine subspace containing all the points $x_i$, since $V = x_1 + F$.

But with just these two ideas in mind, one could legitimately ask: why do we need to define "affine independence" with that weird condition $\lambda_1 + \dots +\lambda_n = 0$ inside? Wouldn't the usual notion of "linear independence" be enough, since, properly written, it's equivalent?

Indeed, one of the main reasons in mathematics for making a new definition is that it should save you time, not waste it. So, let's look at one natural next step in affine geometry where the notion of affine independence saves you time: the notion of barycentric coordinates.

So, assume your points $x_1, \dots , x_n$ are affinely independent. By definition, each point $x \in V$ can be written as

$$ x = \lambda_1 x_1 + \dots + \lambda_n x_n \ \qquad \text{with} \qquad \lambda_1 + \dots + \lambda_n = 1 \ . $$

But there is more, because we have the following

Lemma. For each $x \in V$, these $(\lambda_1 , \dots , \lambda_n) \in \mathbb{R}^n$ are unique.

Proof. Assume we had

$$ x = \lambda_1 x_1 + \dots + \lambda_n x_n \ \qquad \text{with} \qquad \lambda_1 + \dots + \lambda_n = 1 $$

and also

$$ x = \lambda'_1 x_1 + \dots + \lambda'_n x_n \ \qquad \text{with} \qquad \lambda'_1 + \dots + \lambda'_n = 1 \ . $$

Then, subtracting the second pair of equations from the first, we would get

$$ 0 = (\lambda_1- \lambda'_1) x_1 + \dots + (\lambda_n -\lambda'_n) x_n \ \qquad \text{with} \qquad (\lambda_1-\lambda'_1) + \dots + (\lambda_n-\lambda'_n) = 0 \ . $$

But our points $x_1, \dots , x_n$ are affinely independent, so every coefficient $\lambda_i - \lambda'_i$ must vanish. Hence, $\lambda_1 = \lambda'_1, \dots , \lambda_n = \lambda'_n$.
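The independence hypothesis really is needed here. With example 3 from the previous answer, $x_1 = (1,0)$, $x_2 = (0,1)$, $x_3 = (1/2, 1/2)$, which are affinely dependent, the point $(1/2, 1/2)$ has two different coefficient vectors summing to $1$, and their difference is exactly a nontrivial combination with coefficients summing to $0$ that gives the zero vector. A small NumPy check of this arithmetic:

```python
import numpy as np

# Affinely dependent points in R^2: rows are x_1, x_2, x_3.
points = np.array([(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)])

lam_a = np.array([0.0, 0.0, 1.0])    # x = x_3
lam_b = np.array([0.5, 0.5, 0.0])    # x = (x_1 + x_2) / 2

for lam in (lam_a, lam_b):
    # same point, coefficients summing to 1 in both cases
    print(lam @ points, lam.sum())

# The difference sums to 0 and gives the zero vector: affine dependence.
print((lam_a - lam_b) @ points, (lam_a - lam_b).sum())
```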

So, these $(\lambda_1 , \dots , \lambda_n)$ determine, and are determined by, the point $x$, much in the same way as the coordinates of a vector with respect to a basis determine, and are determined by, the vector. Indeed, these $(\lambda_1 , \dots , \lambda_n)$ are called the barycentric coordinates of the point $x$ with respect to the affine frame $x_1, \dots , x_n$ of $V$.

In particular, you can take $V = \mathbb{R}^d$; then you need $d+1$ affinely independent points to get an affine frame there.
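Computing barycentric coordinates amounts to solving the linear system $\sum \lambda_i x_i = x$, $\sum \lambda_i = 1$. When $V = \mathbb{R}^d$ and the frame has $d+1$ affinely independent points, this system is square and, by the lemma, has exactly one solution. A minimal NumPy sketch; the function name `barycentric` and the sample frame are illustrative assumptions, and the function expects exactly $d+1$ points in $\mathbb{R}^d$:

```python
import numpy as np

def barycentric(frame, x):
    """Barycentric coordinates of x in R^d w.r.t. a frame of d + 1 points.

    Solves sum(lam_i * x_i) = x together with sum(lam_i) = 1. The square
    matrix below is invertible exactly when the frame points are affinely
    independent, which is what makes the coordinates unique.
    """
    P = np.asarray(frame, dtype=float)         # shape (d + 1, d), rows x_1..x_{d+1}
    A = np.vstack([P.T, np.ones(len(P))])      # shape (d + 1, d + 1)
    b = np.append(np.asarray(x, dtype=float), 1.0)
    return np.linalg.solve(A, b)

frame = [(0, 0), (1, 0), (0, 1)]               # an affine frame of R^2
lam = barycentric(frame, (0.25, 0.5))
print(lam, lam.sum())
```

Here $(0.25, 0.5) = 0.25\,(0,0) + 0.25\,(1,0) + 0.5\,(0,1)$, so the coordinates are $(0.25, 0.25, 0.5)$. If the frame points are affinely dependent (for instance collinear in the plane), the matrix is singular and `np.linalg.solve` typically raises `numpy.linalg.LinAlgError`.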

It's good to have some intuition about how these barycentric coordinates work. For this, I suggest the following

Exercises.

  • Take $V = \mathbb{R}^1$. Then an affine frame is made out of two different points, $p, q \in \mathbb{R}$, and every point $x \in \mathbb{R}$ can be written uniquely as

$$ x = \lambda_1 p + \lambda_2 q \qquad \text{with} \qquad \lambda_1 + \lambda_2 = 1 \ . $$

In this particular situation it's customary to write this simply as

$$ x = t p + (1-t) q, \qquad t \in \mathbb{R} \ . $$

So, the exercise is: assume, for instance, $p < q$ and place the points $x \in \mathbb{R}$ (to the left of $p$, between $p$ and $q$, to the right of $q$) according to their barycentric coordinates $(t, 1-t)$, i.e., according to the values of $t$.

  • Take $V = \mathbb{R}^2$. Then an affine frame is made out of three points $p, q, r \in \mathbb{R}^2$ not lying on the same straight line, and every point $x \in \mathbb{R}^2$ can be written uniquely as

$$ x = \lambda_1 p + \lambda_2 q + \lambda_3 r \qquad \text{with} \qquad \lambda_1 + \lambda_2 +\lambda_3 = 1 \ . $$

And the exercise is: place the points $x \in \mathbb{R}^2$ (inside the triangle determined by $p, q, r$, outside it, or lying on one of its edges) according to the values of their barycentric coordinates $(\lambda_1, \lambda_2, \lambda_3)$.
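To experiment with these exercises, the coordinates can be computed directly. In the one-dimensional case, solving $x = tp + (1-t)q$ gives $t = (x-q)/(p-q)$; in the plane, $(\lambda_1, \lambda_2, \lambda_3)$ solves a $3 \times 3$ linear system. A sketch with arbitrary sample values for $p, q, r$ and $x$ (illustrative assumptions, not from the text; the function names are hypothetical); comparing the signs of the printed coordinates is a good starting point:

```python
import numpy as np

def t_coordinate(p, q, x):
    # Solve x = t*p + (1 - t)*q for t (requires p != q).
    return (x - q) / (p - q)

def triangle_coordinates(p, q, r, x):
    # Solve lam1*p + lam2*q + lam3*r = x together with lam1 + lam2 + lam3 = 1.
    A = np.array([[p[0], q[0], r[0]],
                  [p[1], q[1], r[1]],
                  [1.0, 1.0, 1.0]])
    b = np.array([x[0], x[1], 1.0])
    return np.linalg.solve(A, b)

# One dimension, with p < q.
p, q = 2.0, 6.0
for x in (0.0, 2.0, 4.0, 8.0):
    t = t_coordinate(p, q, x)
    print(f"x = {x}: (t, 1 - t) = ({t}, {1 - t})")

# Two dimensions, with a sample triangle P, Q, R.
P, Q, R = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
for x in ((4 / 3, 4 / 3), (2.0, 2.0), (3.0, 3.0)):
    print(x, np.round(triangle_coordinates(P, Q, R, x), 4))
```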