What is an affine connection, and what is the intuition behind it?
There is a lot to be said on the subject, but the least technical point of view (in my opinion) is the following:
Consider first the situation in $\mathbb{R}^n$. Let $X,Y \colon \mathbb{R}^n \rightarrow \mathbb{R}^n$ be vector fields. To define the directional derivative of the vector field $X$ in the direction of the vector field $Y$ at a point $p \in \mathbb{R}^n$, we can mimic the usual definition of the directional derivative:
$$ (\nabla_Y X)(p) := \lim_{t \to 0} \frac{X(p + tY(p)) - X(p)}{t}. $$
The result $(\nabla_Y X)$ is a vector field on $\mathbb{R}^n$. You can check that the operation $\nabla$ defined as above satisfies the following two properties:
- $\nabla_{fY}(X) = f\nabla_Y X$.
- $\nabla_Y(fX) = (Yf)X + f\nabla_YX$.
Here, $X,Y \colon \mathbb{R}^n \rightarrow \mathbb{R}^n$ are vector fields and $f \colon \mathbb{R}^n \rightarrow \mathbb{R}$ is a scalar function. The function $Yf$ (at a point $p$) is the directional derivative of $f$ at $p$ in the direction $Y(p)$.
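The two properties can also be checked numerically. The following is only a rough sketch: the fields `X`, `Y`, the function `f`, and the point `p` are arbitrary illustrative choices (not from the discussion above), and the limit is approximated by a central finite difference, so the two sides agree only up to a small numerical error.

```python
# Illustrative fields on R^2 (arbitrary choices, not taken from the text).
def X(p):
    x, y = p
    return [x * x * y, x + y * y]

def Y(p):
    x, y = p
    return [1.0, x]

def f(p):
    x, y = p
    return x + 2.0 * y

def nabla(Y, X, p, h=1e-6):
    """Approximate (nabla_Y X)(p), the derivative of X at p in the direction Y(p)."""
    v = Y(p)
    fwd = X([a + h * b for a, b in zip(p, v)])
    bwd = X([a - h * b for a, b in zip(p, v)])
    return [(a - b) / (2 * h) for a, b in zip(fwd, bwd)]

def Yf(Y, f, p, h=1e-6):
    """Approximate the directional derivative of the scalar f at p in the direction Y(p)."""
    v = Y(p)
    fwd = f([a + h * b for a, b in zip(p, v)])
    bwd = f([a - h * b for a, b in zip(p, v)])
    return (fwd - bwd) / (2 * h)

p = [1.0, 2.0]
fY = lambda q: [f(q) * c for c in Y(q)]  # the rescaled field fY
fX = lambda q: [f(q) * c for c in X(q)]  # the rescaled field fX

# First property: nabla_{fY} X = f nabla_Y X
lhs1 = nabla(fY, X, p)
rhs1 = [f(p) * c for c in nabla(Y, X, p)]

# Second property: nabla_Y (fX) = (Yf) X + f nabla_Y X
lhs2 = nabla(Y, fX, p)
rhs2 = [Yf(Y, f, p) * a + f(p) * b for a, b in zip(X(p), nabla(Y, X, p))]

print(lhs1, rhs1)
print(lhs2, rhs2)
```

Note that the first property only involves the value of $f$ at $p$, while the second picks up the extra term $(Yf)X$ because $f$ is being differentiated together with $X$.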
Now let us try and mimic the above construction on a general manifold. Given vector fields $X,Y \in \mathfrak{X}(M)$, we try to use the same formula and define
$$ (\nabla_Y X)(p) := \lim_{t \to 0} \frac{X(p + tY(p)) - X(p)}{t}. $$
However, we see that there are two problems. First, the expression $X(p + tY(p))$ is not defined because we don't have a way of adding a point $p \in M$ to a tangent vector $tY(p) \in T_pM$. This is not so bad because we can actually replace the expression $p + tY(p)$ with any curve "which goes in the direction $Y(p)$", such as the flow $\varphi_t^Y(p)$. The more serious problem is that we need to subtract the tangent vector $X(p) \in T_pM$ from the tangent vector $X(\varphi_t^Y(p)) \in T_{\varphi_t^Y(p)}M$, and those are two tangent vectors that belong to different vector spaces. In general, without any extra data, we have no way of identifying tangent spaces at different points of $M$.
To summarize, we can differentiate vector fields along vector fields without any problem on $\mathbb{R}^n$, but we run into problems when we try to do it on a general manifold. But $\mathbb{R}^n$ is also a manifold, so what makes it special? It is not only a manifold but also a vector space and an affine space, so we can add vectors to points and identify tangent spaces at different points using translations. This is something we don't have on a general manifold.
The definition of an affine connection is meant to supply the manifold $M$ "externally" with an operation $\nabla \colon \mathfrak{X}(M) \times \mathfrak{X}(M) \rightarrow \mathfrak{X}(M)$ which satisfies the two properties listed above and so allows us to differentiate vector fields along vector fields. That is, instead of defining the directional derivative of a vector field along a vector field, we require that somebody hands us a mechanism $\nabla$ which satisfies the properties that the familiar derivative satisfied on $\mathbb{R}^n$, and then we think of it as a directional derivative.
Obviously, this raises quite a lot of questions. Does such a mechanism always exist? (Yes.) Is it unique? (No.) Is there a natural choice of such a differentiation mechanism? (Yes, under certain circumstances.) Can we use this mechanism to recover the ability to identify tangent vectors at different points, which was necessary to define the regular directional derivative in $\mathbb{R}^n$? (Yes, at least along curves. This leads to the notion of parallel transport.) I refer you to the extensive Wikipedia article on the covariant derivative (which is pretty much another name for an affine connection) for further details.
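To make the non-uniqueness a bit more concrete, here is a sketch of what such a $\nabla$ looks like in a local chart $(x^1, \dots, x^n)$. It assumes, in addition to the two properties, that $\nabla$ is additive in each argument (as the derivative on $\mathbb{R}^n$ is); the functions $\Gamma^k_{ij}$ are the standard notation for the coefficients and do not appear in the discussion above. Writing $X = X^j \partial_j$, $Y = Y^i \partial_i$ and defining $\Gamma^k_{ij}$ by $\nabla_{\partial_i} \partial_j = \Gamma^k_{ij} \partial_k$, the two properties give

$$ \nabla_Y X = Y^i \nabla_{\partial_i} (X^j \partial_j) = Y^i \left( \frac{\partial X^k}{\partial x^i} + \Gamma^k_{ij} X^j \right) \partial_k. $$

On the chart, any choice of smooth functions $\Gamma^k_{ij}$ defines an operation satisfying both properties, which is one way to see why $\nabla$ is far from unique. For the derivative on $\mathbb{R}^n$ in the standard coordinates, all $\Gamma^k_{ij}$ vanish and only the first term survives.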
As usual in differential geometry, the intuition comes from mechanics. Suppose you are in a car whose position at time $t$ is $P(t)$. In your car there is a compass, which shows you the magnetic vector field, say $\vec M$. Note that this vector field is defined globally on the earth, but what you see is $\vec M_{P(t)}$. From inside the car you see the direction of the compass change over time, and you can compute ${d\over dt} \vec M_{P(t)}$. It turns out that this vector depends only on the velocity ${\vec V} = {d\over dt} P(t)$ you have at the instant $t$. It is written either ${D\over dt} \vec M_{P(t)}$ or $\nabla_{\vec V}{\vec M}$. To prove this, you can compute in coordinates and check that this derivative is nothing but the orthogonal projection of the usual derivative onto the tangent plane. Doing this carefully, you will "rediscover" the Christoffel symbols and find all the properties of the affine connection, which is nothing but the operator that enables you to compute this derivative, called the "covariant" derivative.
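The projection recipe can be tried numerically. The sketch below makes several assumptions that are not in the description above: the earth is modeled as the unit sphere in $\mathbb{R}^3$, the car drives along a circle of latitude, and the field `M` is the unit vector pointing east rather than an actual magnetic field. The ambient derivative is approximated by a central finite difference and then projected onto the tangent plane.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def P(t, c):
    """Hypothetical path of the car: the latitude circle at colatitude c on the unit sphere."""
    return [math.sin(c) * math.cos(t), math.sin(c) * math.sin(t), math.cos(c)]

def M(p):
    """Illustrative tangent field: the unit vector pointing east at p (undefined at the poles)."""
    r = math.hypot(p[0], p[1])
    return [-p[1] / r, p[0] / r, 0.0]

def covariant_derivative(t, c, h=1e-5):
    """Project the ambient derivative d/dt M(P(t)) onto the tangent plane at P(t)."""
    fwd, bwd = M(P(t + h, c)), M(P(t - h, c))
    ambient = [(a - b) / (2 * h) for a, b in zip(fwd, bwd)]
    n = P(t, c)  # on the unit sphere, the unit normal at p is p itself
    k = dot(ambient, n)
    return [a - k * b for a, b in zip(ambient, n)]

equator = covariant_derivative(0.3, math.pi / 2)  # driving along the equator
mid = covariant_derivative(0.3, math.pi / 3)      # driving along a smaller latitude circle

print(norm(equator), norm(mid))
```

Along the equator the projected derivative is numerically zero: the ambient derivative of the east-pointing field is purely normal to the sphere there, so nothing survives the projection. Along the smaller circle a tangential part remains, which is the change the driver would actually observe.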