Metric independent definition of the derivative
In the Wikipedia article on the exterior derivative it says:
"[The exterior derivative] allows for a natural, metric-independent generalization of Stokes' theorem, Gauss's theorem, and Green's theorem from vector calculus. "
I'm confused about the "metric-independent" part. Don't we necessarily need a metric to define an $\epsilon, \delta$-limit when defining the most basic of derivatives, the one for $(\Bbb R \to \Bbb R)$? Since multivariate derivatives (differential, parametric derivative, Jacobian) are built on vectors/matrices of partial derivatives (which are just a recycling of the real derivative), doesn't that imply that the exterior derivative should also depend on the choice of a metric? Is there some more fundamental (e.g., topological) construction of the derivative that I'm missing? Am I misunderstanding the exterior derivative itself?
I'm thinking that maybe this idea of "metric-independence" simply means "no matter the metric $d$, so long as we define a fixed $d$-derivative based on that metric, the integral theorems of calculus that rely on this definition of the $d$-derivative will always express the generalized Stokes' theorem in the same way, respective to their $d$-derivative". Is this the case? If so, how does this metric-independent expression of the GST arise (if that's a complex question, feel free to leave it out, though some intuition is very welcome)?
You are right that one uses notions of "distance" in the definition of differentiation. However the point is that many different metrics will give you the same notion of differentiation, so once you have defined differentiation, you can throw away the metric.
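To make this concrete (a standard illustration, added here as a sketch): take any two norms $\|\cdot\|$ and $\|\cdot\|'$ on $\mathbb{R}^n$. They are equivalent, so there are constants $c, C>0$ with $c\|h\|\le\|h\|'\le C\|h\|$ for all $h$. For a candidate derivative $L$ of $f$ at $x$ this gives $$\frac{1}{C}\,\frac{|f(x+h)-f(x)-L(h)|}{\|h\|}\;\le\;\frac{|f(x+h)-f(x)-L(h)|}{\|h\|'}\;\le\;\frac{1}{c}\,\frac{|f(x+h)-f(x)-L(h)|}{\|h\|},$$ so one quotient tends to $0$ as $h\to 0$ exactly when the other does. Both norms therefore single out the same linear map $L$ as the derivative, even though they measure distances differently.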
Thus if you have a smooth function $f$ on some neighborhood $U\subseteq\mathbb{R}^n$, its derivative $df$ gives a linear map $df\colon\mathbb{R}^n\to \mathbb{R}$ (or equivalently a covector) at each point. Given a vector in $\mathbb{R}^n$, $df$ will tell you how fast $f$ is changing along that vector, at the point.
If $V\subseteq\mathbb{R}^n$ and $h:V\to U$ is a diffeomorphism (a smooth bijection with smooth inverse), then $V$ gives us a different parameterisation of the points in $U$. The map $h$ need not preserve the metric. However, the function $f$ may be defined on $V$ as the composite $f\circ h$, written $fh$ for short.
Thus we can calculate the derivative of $f$ with respect to the new co-ordinate system: $$d(fh)=df\circ dh,$$ by the chain rule. That is $df$ and $d(fh)$ are the same, as long as we identify the domains via the linear isomorphism $dh$.
The moral of this is that you can differentiate a function, and switch to a completely different co-ordinate system, which does not preserve the metric, and get the "same" derivative.
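The chain-rule identity $d(fh)=df\circ dh$ can be checked numerically. The following is a minimal sketch, not part of the original argument: the test function `f`, the choice of polar coordinates for `h`, and the helper names `gradient`, `d_composite` and `df_after_dh` are all assumptions made for illustration, and derivatives are approximated by central differences.

```python
import math

def f(x, y):
    # Arbitrary smooth test function on U (an assumption for illustration).
    return x * x * y + math.sin(y)

def h(r, t):
    # Polar-to-Cartesian map; it does not preserve Euclidean distances.
    return (r * math.cos(t), r * math.sin(t))

def gradient(g, p, eps=1e-6):
    # Central-difference approximation to the partial derivatives of g at p.
    out = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        out.append((g(*hi) - g(*lo)) / (2 * eps))
    return out

def d_composite(r, t):
    # d(fh): differentiate the composite directly in the (r, theta) coordinates.
    return gradient(lambda a, b: f(*h(a, b)), (r, t))

def df_after_dh(r, t):
    # df composed with dh: components of df at h(r, t),
    # contracted with the Jacobian matrix of h.
    fx, fy = gradient(f, h(r, t))
    dh = [[math.cos(t), -r * math.sin(t)],
          [math.sin(t),  r * math.cos(t)]]
    return [fx * dh[0][0] + fy * dh[1][0],
            fx * dh[0][1] + fy * dh[1][1]]

lhs = d_composite(1.3, 0.7)
rhs = df_after_dh(1.3, 0.7)
print(lhs, rhs)
```

The two printed lists should agree up to finite-difference error, although the polar parameterisation distorts lengths and angles.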
This is not true for all notions of derivative. Try repeating the above trick with the derivative of a vector field, and you will not get the same derivative when you switch co-ordinate systems - you will get an additional term. To define the derivative of a vector field in a co-ordinate independent way, you need to take into account the metric. Different metrics will give you different derivatives.
In general, the exterior derivative (which generalises the notions of Grad, Div and Curl: $\nabla, \nabla\cdot,\nabla\wedge$) is well defined under change of co-ordinates, regardless of whether the metric is preserved. Therefore these derivatives are well defined without a metric. We then have Stokes' theorem: $$\int_X dw=\int_{\partial X} w$$
Warning: When regarded as an exterior derivative, $\nabla$ takes functions to $1$-forms (not vector fields), $\nabla\cdot$ takes $(n-1)$-forms to $n$-forms (not vector fields to scalar fields), and $\nabla\wedge$ takes $(n-2)$-forms to $(n-1)$-forms (not vector fields to vector fields).
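As a small worked instance (added for illustration; the names $P$ and $Q$ are not from the discussion above): let $X\subseteq\mathbb{R}^2$ be a region with boundary $\partial X$ and take the $1$-form $w=P\,dx+Q\,dy$. Then $$dw=dP\wedge dx+dQ\wedge dy=\left(\frac{\partial Q}{\partial x}-\frac{\partial P}{\partial y}\right)dx\wedge dy,$$ and the formula $\int_X dw=\int_{\partial X} w$ becomes Green's theorem: $$\int_X\left(\frac{\partial Q}{\partial x}-\frac{\partial P}{\partial y}\right)dx\wedge dy=\int_{\partial X}P\,dx+Q\,dy.$$ Only partial derivatives and the antisymmetry of $\wedge$ were used, so no lengths or angles enter the computation.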
However to define other derivatives, like the derivative of a vector field, one needs a metric (or at least a connection).
Hopefully this conveys the idea of what is going on. Any text on differential geometry will elaborate the details.
EDIT: (What goes wrong trying to differentiate a vector field, without a metric.)
To appreciate how remarkable it is that the exterior derivative is well defined without a metric, it is worth trying to naively differentiate a vector field and see what goes wrong. The fact this does not happen with the exterior derivative is due to a lot of cancellation!
Let $v_j\frac{\partial}{\partial x_j}$ (summation convention throughout) be a vector field in one coordinate system, so it can be written as $v_j\frac{\partial y_k}{\partial x_j}\frac{\partial}{\partial y_k}$ in another coordinate system. If we differentiate this in the original co-ordinate system we get: $$ \frac{\partial v_j}{\partial x_i}\frac{\partial}{\partial x_j}dx_i$$ In the new coordinate system this translates to:\begin{eqnarray*}&\,\,& \frac{\partial v_j}{\partial x_i}\left(\frac{\partial y_k}{\partial x_j}\frac{\partial}{\partial y_k}\right)\left(\frac{\partial x_i}{\partial y_l}dy_l\right)\\&=&\frac{\partial v_j}{\partial y_l}\frac{\partial y_k}{\partial x_j}\frac{\partial }{\partial y_k}dy_l\end{eqnarray*}
If we differentiate in the new coordinate system we get: \begin{eqnarray*}&\,\,& \frac{\partial }{\partial y_l}\left(v_j\frac{\partial y_k}{\partial x_j}\right)\frac{\partial}{\partial y_k}dy_l\\&=& \frac{\partial v_j}{\partial y_l}\frac{\partial y_k}{\partial x_j}\frac{\partial }{\partial y_k}dy_l + v_j\frac{\partial }{\partial y_l}\left(\frac{\partial y_k}{\partial x_j}\right)\frac{\partial }{\partial y_k}dy_l\\&=& \frac{\partial v_j}{\partial y_l}\frac{\partial y_k}{\partial x_j}\frac{\partial }{\partial y_k}dy_l - \frac{\partial y_k}{\partial x_j}\frac{\partial^2 x_j}{\partial y_l \partial y_m}\frac{\partial }{\partial y_k}dy_l dy_m(v) \end{eqnarray*}
So the difference is a symmetric bilinear map on the tangent space: $$\frac{\partial y_k}{\partial x_j}\frac{\partial^2 x_j}{\partial y_l \partial y_m}\frac{\partial }{\partial y_k}dy_l dy_m$$ Notice this is linear in $v$ - that is it does not involve any derivatives of the $v_j$.
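A concrete case may help (a sketch chosen for illustration, with Cartesian coordinates $(x,y)$ as the original system and polar coordinates $(r,\theta)$ as the new one). Take the constant vector field $v=\frac{\partial}{\partial x}$. Its components in Cartesian coordinates are constant, so the naive derivative there is $0$. In polar coordinates, $$\frac{\partial}{\partial x}=\cos\theta\,\frac{\partial}{\partial r}-\frac{\sin\theta}{r}\,\frac{\partial}{\partial\theta},$$ and naively differentiating these components gives $$-\sin\theta\,\frac{\partial}{\partial r}\,d\theta+\left(\frac{\sin\theta}{r^2}\,dr-\frac{\cos\theta}{r}\,d\theta\right)\frac{\partial}{\partial\theta},$$ which is not zero. The whole discrepancy comes from the change of coordinates, since no derivative of the Cartesian components of $v$ contributes.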
On the other hand, if we had a metric we could add to the naive derivative another symmetric bilinear map on the tangent space, coming from the metric (its coefficients are the Christoffel symbols of the Levi-Civita connection). Then when we change coordinate systems, the difference between the symmetric bilinear maps that we added in the two coordinate systems exactly cancels the difference between the naive derivatives.