"An affine space is nothing more than a vector space whose origin we try to forget about, by adding translations to the linear maps."
Solution 1:
The definition of vector spaces is stated as it is because it allows one to readily define linear combinations. Linear transformations $\mathbf v \mapsto T(\mathbf v)$ are the structure-preserving maps, or homomorphisms, of vector spaces, in the sense that they transform linear combinations into other linear combinations. If $T : V \to W$ is structure-preserving, then for all vectors $\mathbf v_1,\mathbf v_2 \in V$ and all scalars $\lambda,\mu$, $$T(\lambda\mathbf v_1+\mu\mathbf v_2) = \lambda T(\mathbf v_1) +\mu T(\mathbf v_2). $$ In other words, computing the linear combination of the vectors $\mathbf v_1,\mathbf v_2$ and transforming the result yields the same outcome as transforming the vectors first and then computing the linear combination of those outputs.
From this, you can quickly show that linear maps "fix the origin", i.e. they must send the zero vector in the input space to the zero vector in the target space: taking $\lambda = \mu = 0$ in the identity above gives $T(\mathbf 0) = \mathbf 0$. For this reason (among others) the vector $\mathbf 0$ is privileged among its colleagues in $V$. This means that linear transformations enjoy a certain "rigidity": in particular, endomorphisms of a space $V$ (linear maps $V \to V$) can stretch, squeeze, or permute a collection of non-zero vectors in $V$, but they can never displace the zero vector.
An affine space, on the other hand, is defined as a set $A$ together with a supporting vector space $V$ and an operation that links the two pieces as follows: for all $a,b \in A$, there is a vector $\mathbf v \in V$ playing the role of their difference, $$b-a = \mathbf v \in V. $$ Among other things, the definition requires that the difference $a-a$ be the special vector $\mathbf 0$. Equivalently, along with this notion of "vector difference" between affine points, there is an inverse notion of "sum" according to which one may write $b = a + \mathbf v$.
We make two observations:
- This definition of the sum $+ : A \times V \to A$ (the asymmetry of which is reminiscent of scalar multiplication in vector spaces) makes it so that we may visualize the application of a vector $\mathbf v \in V$ to an individual element $a \in A$ as "translating" $a$ to some $b = a + \mathbf v$ along the vector $\mathbf v$ – and as we expect, acting with $\mathbf 0$ on $a$ results in $a$ itself.
- The privileged vector $\mathbf 0 \in V$, when applied to $a \in A$, has the meaning of "no translation". Notice that there is no equally privileged element in the set $A$ itself: every element $a \in A$ has the same rights (to be translated by vectors) as any other element. This is what is meant by "forgetting the origin".
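To make these asymmetric operations concrete, here is a minimal Python sketch in the plane. The `Point` wrapper and its methods are purely illustrative choices (not a standard library type); points and vectors are both stored as NumPy arrays, but only the operations $A \times A \to V$ and $A \times V \to A$ are exposed.

```python
# A minimal sketch of the point/vector distinction: points are wrapped in a
# (hypothetical) Point class so that only "point - point = vector" and
# "point + vector = point" are available.
import numpy as np

class Point:
    def __init__(self, coords):
        self.coords = np.asarray(coords, dtype=float)

    def __sub__(self, other):          # b - a : Point x Point -> vector
        return self.coords - other.coords

    def __add__(self, v):              # a + v : Point x vector -> Point
        return Point(self.coords + np.asarray(v, dtype=float))

a = Point([1.0, 2.0])
b = Point([4.0, 0.0])
v = b - a                              # the vector with b = a + v
assert np.allclose((a + v).coords, b.coords)
assert np.allclose(a - a, np.zeros(2))  # a - a is the zero vector
```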
What are the structure-preserving maps of affine spaces (affine homomorphisms)? Just as the relevant feature of vector spaces is the ability to take linear combinations of elements, the relevant feature of affine spaces is the ability to translate their elements along vectors; therefore, we expect affine homomorphisms to preserve translations along vectors, besides preserving linear combinations of translation vectors. In other words, applying some vector $\mathbf v$ to $a \in A$ and subsequently applying a structure-preserving map $f : A \to B$ should yield the same result as first computing $f(a)$ itself and then translating that by a vector $\mathbf w$ taken from the vector space underlying $B$ – a vector which should be related to the original $\mathbf v$ through some given linear transformation $\mathbf w = T(\mathbf v)$. In symbols, $$f(a + \mathbf v) = f(a) + T(\mathbf v). $$
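For a concrete instance, the following sketch checks this defining identity numerically in $\mathbb R^2$; the matrix chosen for $T$ and the offset vector are arbitrary and carry no special meaning.

```python
# Numerical sanity check of f(a + v) = f(a) + T(v) for an affine map of R^2,
# with an arbitrarily chosen linear part T and translation offset t.
import numpy as np

T = np.array([[2.0, 1.0],
              [0.0, 3.0]])             # linear part, an element of End(V)
t = np.array([5.0, -1.0])              # a fixed offset

def f(a):
    """An affine map of R^2, written as a |-> T a + t."""
    return T @ a + t

a = np.array([1.0, 1.0])
v = np.array([0.5, -2.0])
assert np.allclose(f(a + v), f(a) + T @ v)
```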
Let us restrict ourselves to endomorphisms, as we did before (so $B = A$). We can imagine "turning off" the linear part of an affine endomorphism by requiring $T: V \to V$ to be the identity, $T(\mathbf v) = \mathbf v$. In this situation, setting $b = a + \mathbf v$, you can see that $f(b) - f(a) = T(b-a) = b-a$, and thus $f(a)-a = f(b) - b$. As a consequence, the map $f$ acts as a pure translation of all points in $A$ by the vector $\mathbf t := f(b)-b$, which does not depend on $b$; there are as many maps of this kind as there are vectors in $V$, which is why $V$ is often referred to as the translation space of the affine space $A$. Symbolically, $$ V \ni \mathbf t \overset\sim\mapsto (p_\mathbf t : b \mapsto p_\mathbf t(b) = b + \mathbf t). $$
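As a quick numerical illustration (again with an arbitrary offset vector), an affine map whose linear part is the identity displaces every point by the same vector:

```python
# With the linear part switched off (T = identity), the displacement
# f(b) - b is the same vector for every point b.
import numpy as np

t = np.array([3.0, -2.0])

def f(b):
    return np.eye(2) @ b + t           # affine map with T = identity

points = [np.array([0.0, 0.0]), np.array([1.0, 5.0]), np.array([-7.0, 2.0])]
assert all(np.allclose(f(b) - b, t) for b in points)
```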
Now suppose we turn $T$ back on, and instead switch off the translational component of $f$, to which end it is sufficient to require that $f(a) = a$ for some $a \in A$. Then, for all $b \in A$, $$f(b) = f(a+(b-a)) = a + T(b-a),$$ and thus $f$ is completely determined by $T$; in this case, we say that $f$ is purely linear. For a given fixed point $a$, the collection of purely linear affine transformations is isomorphic to the set of linear transformations of $V$: $$\operatorname{End}(V) \ni T \overset{\sim_a}\mapsto (\ell_T^{(a)} : b \mapsto a + T(b-a)). $$ At this point it is clear that any affine transformation of $A$ can be rendered as the successive application of a purely linear and a purely translational transformation: $$\begin{split} f(b) &= f(a) + T(b-a) \\ &= a + (f(a) - a) + T(b-a) \\ &= (a + T(b-a)) + (f(a) - a) \\ &= p_{f(a)-a}\left(\ell_T^{(a)}(b) \right). \end{split}$$ Any $f \in \operatorname{End}(A)$ is therefore completely determined by a linear transformation $T \in \operatorname{End}(V)$ and a translation vector $\mathbf t \in V$.
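The decomposition can likewise be checked numerically; the matrix, offset and base point below are arbitrary choices made only for the sake of the example.

```python
# Check of the decomposition f = p_{f(a)-a} o l_T^{(a)}: an arbitrary affine
# map f, split at an arbitrary base point a into a purely linear map followed
# by a pure translation.
import numpy as np

T = np.array([[0.0, -1.0],
              [1.0,  0.0]])            # linear part (a rotation, say)
t = np.array([2.0, 1.0])

def f(b):
    return T @ b + t                   # a generic affine endomorphism of R^2

a = np.array([3.0, -1.0])              # arbitrary base point

def linear_part(b):                    # l_T^{(a)} : b |-> a + T(b - a)
    return a + T @ (b - a)

def translation_part(b):               # p_{f(a)-a} : b |-> b + (f(a) - a)
    return b + (f(a) - a)

b = np.array([0.5, 4.0])
assert np.allclose(f(b), translation_part(linear_part(b)))
```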
We finally get what Berger was saying: to understand the whole ensemble of affine endomorphisms of $A$ you need knowledge of both the collection of vector space endomorphisms of $V$ and the collection of all pure translations of $A$, which turns out to be isomorphic to $V$. (It should not come as a surprise that relaxing our original structure by "forgetting the origin" has led us to less rigid structure-preserving transformations.)
Addendum. As a side note, I should remark that the kind of intuition I've developed here is very useful when studying the symmetries of affine spaces. Just as the symmetries of a vector space $V$ are encoded in the linear group $\operatorname{Aut}(V)$ of linear automorphisms of $V$, i.e. the invertible linear endomorphisms, the symmetries of an affine space $A$ are encoded in the affine group $\operatorname{Aut}(A)$ of affine automorphisms, i.e. the invertible affine endomorphisms. Every affine automorphism is made up of a general translation (no restriction beyond the endomorphism case is needed here, because translations are automatically invertible: the pure translation $p_{\mathbf t}$ is undone by $p_{-\mathbf t}$) and a linear automorphism, so that one may decompose the group $\operatorname{Aut}(A)$ as $$\operatorname{Aut}(A) \simeq V \rtimes \operatorname{Aut}(V), $$ where $\rtimes$ denotes the semidirect product of groups. The group operation is $$(\mathbf t,T) \circ (\mathbf s, S) = (\mathbf t+ T\mathbf s, TS). $$
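As a sanity check of this group law, the following sketch verifies that composing two affine maps of $\mathbb R^2$ pointwise agrees with the map built from the pair $(\mathbf t + T\mathbf s, TS)$; all specific matrices and vectors are again arbitrary examples.

```python
# Check of the semidirect product law (t, T) o (s, S) = (t + T s, T S):
# composing the two affine maps should agree with the map of the composed pair.
import numpy as np

def affine(t, T):
    """Return the affine map b |-> T b + t determined by the pair (t, T)."""
    return lambda b: T @ b + t

T = np.array([[1.0, 2.0], [0.0, 1.0]]);  t = np.array([1.0, -1.0])
S = np.array([[0.0, 1.0], [3.0, 0.0]]);  s = np.array([2.0,  5.0])

composed_pair = affine(t + T @ s, T @ S)        # the pair (t + T s, T S)
b = np.array([4.0, -2.0])
assert np.allclose(affine(t, T)(affine(s, S)(b)), composed_pair(b))
```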
This semidirect decomposition is extremely relevant to certain areas of physics (like special relativity).