Understanding the notion of a connection and covariant derivative

I have been reading Nakahara's book "Geometry, Topology & Physics" with the aim of teaching myself some differential geometry. Unfortunately I've gotten a little stuck on the notion of a connection and how it relates to the covariant derivative.

As I understand it, a connection $\nabla :\mathcal{X}(M)\times\mathcal{X}(M)\rightarrow\mathcal{X}(M)$, where $\mathcal{X}(M)$ is the set of tangent vector fields on a manifold $M$, is defined such that, given two vector fields $X,V\in\mathcal{X}(M)$, $\nabla :(X,V)\mapsto\nabla_{X}V$. The connection enables one to "connect" neighbouring tangent spaces so that one can meaningfully compare vectors in the two tangent spaces.

What confuses me is that Nakahara states that this is in some sense the correct generalisation of a directional derivative and that we identify the quantity $\nabla_{X}V$ with the covariant derivative, but what makes this a derivative of a vector field? In what sense is the connection enabling one to compare the vector field at two different points on the manifold (surely required in order to define its derivative), when the mapping is from the (Cartesian product of) the set of tangent vector fields to itself? I thought that the connection $\nabla$ "connected" two neighbouring tangent spaces through the notion of parallel transport, in which one transports a vector field along a chosen curve, $\gamma :(a,b)\rightarrow M$, in the manifold connecting the two tangent spaces.

Given this, what does the quantity $\nabla_{e_{\mu}}e_{\nu}\equiv\nabla_{\mu}e_{\nu}=\Gamma_{\mu\nu}^{\lambda}e_{\lambda}$ represent? ($e_{\mu}$ and $e_{\nu}$ are coordinate basis vectors in a given tangent space $T_{p}M$ at a point $p\in M$) I get that since $e_{\mu},e_{\nu}\in T_{p}M$, then $\nabla_{\mu}e_{\nu}\in T_{p}M$ and so can be expanded in terms of the coordinate basis of $T_{p}M$, but I don't really understand what it represents?!

Apologies for the long-windedness of this post but I've really confused myself over this notion and really want to clear up my understanding.

In what sense is the connection enabling one to compare the vector field at two different points on the manifold [...], when the mapping is from the (Cartesian product of) the set of tangent vector fields to itself? I thought that the connection $\nabla$ "connected" two neighbouring tangent spaces through the notion of parallel transport [...]

Seeing a connection only as a mapping $\nabla: \mathcal{X}(M)\times\mathcal{X}(M)\rightarrow\mathcal{X}(M)$ is too restrictive. A connection is often also viewed as a map $Y\mapsto\nabla Y\in\Gamma(TM\otimes TM^*)$, which highlights its role as a derivative. The important point, however, is that $\nabla$ is $C^\infty(M)$-linear in the first argument, which implies that the value $\nabla_X Y|_p$ depends only on $X_p$, in the sense that $$ X_p=Z_p \Rightarrow \nabla_X Y|_p = \nabla_Z Y|_p. $$ Hence, for every $v\in TM_p$, $\nabla_vY$ is well-defined. This leads directly to the definition of parallel vector fields and parallel transport (as I think you already know).
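To see concretely why only $X_p$ matters, one can sketch the standard computation in a chart around $p$, using the notation $\nabla_{e_\mu}e_\nu=\Gamma^\lambda_{\mu\nu}e_\lambda$ from the question. Writing $X=X^\mu e_\mu$ and $Y=Y^\nu e_\nu$, and using $C^\infty(M)$-linearity in the first slot together with the Leibniz rule in the second,
$$ \nabla_X Y = X^\mu\,\nabla_{e_\mu}\!\left(Y^\nu e_\nu\right) = X^\mu\left(\frac{\partial Y^\lambda}{\partial x^\mu} + \Gamma^\lambda_{\mu\nu}\,Y^\nu\right)e_\lambda . $$
At $p$ only the numbers $X^\mu(p)$ appear, whereas $Y$ enters through its first derivatives, i.e. through its values along any curve through $p$ with velocity $X_p$. This is why $\nabla_v Y$ makes sense for a single vector $v\in TM_p$, while $Y$ must be known near $p$.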

Conversely, given parallel transport maps $\Gamma(\gamma)^t_s: TM_{\gamma(s)}\rightarrow TM_{\gamma(t)}$, one can recover the connection via $$ \nabla_X Y|_p = \frac{d}{dt}\bigg|_{t=0}\Gamma(\gamma)_t^0Y_{\gamma(t)} \quad(\gamma \text{ is an integral curve of }X \text{ with } \gamma(0)=p). $$ This is exactly the generalisation of directional derivatives, in the sense that we bring the values of $Y$ along $\gamma$ back to $p$ by parallel transport and then differentiate in the direction of $X_p$. In Euclidean space this indeed reduces to the directional derivative: using the identity chart, every vector field can be written as $Y_p=(p,V(p))$ for some $V:\mathbb R^n\rightarrow \mathbb R^n$, and parallel transport is simply given by $$ \Gamma(\gamma)_s^t (\gamma(s),v)=(\gamma(t),v). $$ Hence, in Euclidean space we find $$ \frac{d}{dt}\bigg|_{t=0}\Gamma(\gamma)_t^0Y_{\gamma(t)} = \frac{d}{dt}\bigg|_{t=0}(p,V(\gamma(t))) = (p,DV(p)\cdot\gamma'(0)), $$ which is exactly the directional derivative of $V$ in the direction $v=\gamma'(0)$.
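The Euclidean computation above can be checked numerically. The following is a minimal sketch under stated assumptions: the field `V`, its Jacobian `jacobian_V`, and the curve `curve` are hypothetical examples chosen only for illustration. Since Euclidean parallel transport leaves the vector part unchanged, the transported derivative reduces to $\frac{d}{dt}V(\gamma(t))$ at $t=0$, approximated here by a central difference and compared with $DV(p)\cdot\gamma'(0)$. The curve deliberately bends, to illustrate that only $\gamma(0)=p$ and $\gamma'(0)$ matter.

```python
import math

def V(x, y):
    # Hypothetical example field V: R^2 -> R^2, chosen only for illustration
    return (x * y, math.sin(x) + y ** 2)

def jacobian_V(x, y):
    # Analytic Jacobian DV of the example field V above
    return ((y, x), (math.cos(x), 2.0 * y))

def curve(t, p, v):
    # Hypothetical curve with curve(0) = p and curve'(0) = v; the t**2 terms
    # make it bend, to show that only the initial velocity matters
    return (p[0] + t * v[0] + t ** 2, p[1] + t * v[1] - 0.5 * t ** 2)

def transported_derivative(p, v, h=1e-5):
    # Euclidean parallel transport back to p leaves the vector part unchanged,
    # so d/dt|_{t=0} Gamma(curve)^0_t Y_{curve(t)} reduces to d/dt|_{t=0} V(curve(t)),
    # approximated here by a central difference
    forward = V(*curve(h, p, v))
    backward = V(*curve(-h, p, v))
    return tuple((f - b) / (2.0 * h) for f, b in zip(forward, backward))

def directional_derivative(p, v):
    # The ordinary directional derivative DV(p) . v
    J = jacobian_V(*p)
    return tuple(J[i][0] * v[0] + J[i][1] * v[1] for i in range(2))

p, v = (0.3, -1.2), (2.0, 0.5)
print(transported_derivative(p, v))
print(directional_derivative(p, v))
```

The two printed pairs should agree up to the finite-difference error, which is of order $h^2$.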

Back to the original question: I think it is hard to see how a connection "connects neighbouring tangent spaces" from the axioms alone. Keep in mind, however, that the contemporary formalism has gone through many layers of abstraction since its beginnings and has been reduced to its core, the axioms (for a survey see also Wikipedia). To get the whole picture, it is essential to explore all possible interpretations and consequences of the definition, since they often led to the definition in the first place. In my opinion, the connection is defined as it is with the picture in mind that it is an infinitesimal version of parallel transport. Starting from this point, properties such as the Leibniz rule are a consequence. Conversely, having a differential operator $\nabla$ satisfying linearity, the Leibniz rule and so on is fully equivalent to having parallel transport in the first place. In modern mathematics these properties are therefore taken as the defining properties (axioms) of a connection, mainly because they are easier to handle and easier to generalise to arbitrary vector bundles.
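For reference, the axioms usually meant here are the standard ones: for $X,Y,Z\in\mathcal{X}(M)$ and $f,g\in C^\infty(M)$,
$$ \begin{aligned} \nabla_{fX+gZ}\,Y &= f\,\nabla_X Y + g\,\nabla_Z Y,\\ \nabla_X (Y+Z) &= \nabla_X Y + \nabla_X Z,\\ \nabla_X (fY) &= X(f)\,Y + f\,\nabla_X Y. \end{aligned} $$
In the Euclidean example, where $\nabla_v Y$ is $DV(p)\cdot v$, the last line is just the ordinary product rule $D(fV)\cdot v = (Df\cdot v)\,V + f\,DV\cdot v$.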

Given this, what does the quantity $\nabla_{e_\mu}e_\nu=\Gamma^\lambda_{\mu\nu}e_\lambda$ represent? [...]

As you wrote, the connection coefficients / Christoffel symbols $\Gamma^\lambda_{\mu\nu}$ are the components of the connection in a local frame and are needed for explicit computations. I think on this level you can't get much meaning out of these coefficients. However, they reappear in a nicer way if you restate everything in the Cartan formalism and study Cartan and/or principal connections. The Wikipedia article on connection forms tries to give an introduction to this approach.
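To attach at least one concrete picture to $\nabla_{e_\mu}e_\nu=\Gamma^\lambda_{\mu\nu}e_\lambda$, here is a small sketch for an example not taken from the answer: the flat connection of $\mathbb R^2$ written in polar coordinates $(r,\theta)$. There, parallel transport is the trivial Euclidean one, so $\nabla_{e_\mu}e_\nu$ is just the ordinary derivative of the basis vector $e_\nu$ in the direction $e_\mu$. The script approximates that derivative by finite differences and expands the result back in $\{e_r,e_\theta\}$; the function names and the step size are my own choices. Even though the space is flat, some coefficients are non-zero, because the coordinate basis vectors themselves rotate and stretch from point to point, so the $\Gamma$'s depend on the chosen frame as well as on $\nabla$.

```python
import math

def basis(r, theta):
    # Coordinate basis e_r = d(x, y)/dr and e_theta = d(x, y)/dtheta in Cartesian
    # components, for x = r cos(theta), y = r sin(theta)
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    return (e_r, e_theta)

def christoffel(r, theta, h=1e-6):
    # Returns G with nabla_{e_mu} e_nu = sum_lam G[lam][mu][nu] e_lam
    # (index 0 = r, 1 = theta), where nabla is the flat connection of R^2,
    # i.e. componentwise differentiation in Cartesian coordinates
    e = basis(r, theta)
    # Inverse of the 2x2 matrix whose columns are e_r and e_theta (determinant = r)
    a, c = e[0]
    b, d = e[1]
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    for mu in range(2):
        step = (h, 0.0) if mu == 0 else (0.0, h)
        e_plus = basis(r + step[0], theta + step[1])
        e_minus = basis(r - step[0], theta - step[1])
        for nu in range(2):
            # Central difference of e_nu along coordinate mu, i.e. nabla_{e_mu} e_nu
            dv = [(e_plus[nu][k] - e_minus[nu][k]) / (2.0 * h) for k in range(2)]
            # Expand dv in the coordinate basis: its components are inv @ dv
            for lam in range(2):
                G[lam][mu][nu] = inv[lam][0] * dv[0] + inv[lam][1] * dv[1]
    return G

G = christoffel(2.0, 0.7)
print("Gamma^r_{theta theta} =", G[0][1][1])   # analytically -r
print("Gamma^theta_{r theta} =", G[1][0][1])   # analytically 1/r
```

Analytically $\Gamma^r_{\theta\theta}=-r$ and $\Gamma^\theta_{r\theta}=\Gamma^\theta_{\theta r}=1/r$, with all other coefficients zero, so at $r=2$ the printed values should be close to $-2$ and $0.5$.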

Nakahara also gives an introduction to connections on principal bundles and their relation to gauge theory later on in his book. In my opinion, this chapter is a bit short and could be more detailed, especially towards the end. But it is a good start.
