Appearance of Formal Derivative in Algebra
Rather than being a deep surprise, the formal derivative is somewhat closer to being a triviality that is easy to overlook!
For example, the Maclaurin series for a polynomial:
$$ f(x) = \sum_{i=0}^\infty f_i\: x^i $$
is simply writing out $f(x)$ in the usual fashion as the sum of its terms. It's also clear that every polynomial has a finite Taylor series around any point:
$$ f(x) = \sum_{i=0}^\infty c_i \: (x-a)^i$$
and we can consider all points at once by letting $a$ be a variable. Then the coefficients are now functions of $a$:
$$ f(x) = \sum_{i=0}^\infty c_i(a) \: (x-a)^i$$
in fact, it's easy to work out that they are polynomials in $a$. Going back to the original example, it's now clear that those $a$ such that $f(x)$ has a double root at $a$ are precisely the roots of $\gcd(c_0(x), c_1(x))$.
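For a concrete check (an example of my own, not from the original question): take $f(x) = (x-1)^2(x-2)$ and expand around $a$ by substituting $x = a + t$:
$$ f(a+t) = (a+t-1)^2(a+t-2) = \underbrace{(a-1)^2(a-2)}_{c_0(a)} + \underbrace{(a-1)(3a-5)}_{c_1(a)}\, t + \cdots $$
so $\gcd(c_0, c_1) = a - 1$, and its only root $a = 1$ is exactly the double root of $f$.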
Of course, we know that $c_0(a) = f(a)$, $c_1(a) = f'(a)$, and so forth (in general $c_i(a) = f^{(i)}(a)/i!$); the derivatives of $f$ can effectively be defined by this equation. In fact, a common algebraic definition of the formal derivative is that it is the unique polynomial such that
$$ f(x + \epsilon) = f(x) + \epsilon f'(x) \pmod{\epsilon^2} $$
where we consider $\epsilon$ a variable. The ring $F[\epsilon] / \epsilon^2$, incidentally, is called the ring of dual numbers.
This should look an awful lot like asymptotic notation for the first-order behavior of $f$ at $0$....
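If one wants to compute with this definition, the dual numbers are easy to implement. Here is a minimal sketch in Haskell (the names `Dual` and `derivAt` are my own, chosen for illustration): carrying the coefficient of $\epsilon$ through the arithmetic, with $\epsilon^2 = 0$, automatically produces $f'(x)$.

```haskell
-- The ring F[e]/e^2 of dual numbers: a value together with the
-- coefficient of e, where e^2 = 0 is built into multiplication.
data Dual a = Dual a a deriving Show   -- Dual x x' represents x + x' * e

instance Num a => Num (Dual a) where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')
  Dual x x' * Dual y y' = Dual (x * y) (x * y' + x' * y)  -- the e^2 term is dropped
  negate (Dual x x')    = Dual (negate x) (negate x')
  fromInteger n         = Dual (fromInteger n) 0
  abs    _ = error "abs is not needed for polynomials"
  signum _ = error "signum is not needed for polynomials"

-- Evaluate f at x + e; the e-coefficient of the result is f'(x).
derivAt :: Num a => (Dual a -> Dual a) -> a -> a
derivAt f x = let Dual _ d = f (Dual x 1) in d
```

For instance, `derivAt (\y -> y*y*y + 2*y + 1) 5` evaluates to `77`, which is indeed $3\cdot 5^2 + 2$.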
A posteriori we know that this point of view turns out to be extremely fruitful (not just in analysis, but even purely algebraically), and pushed to even greater extremes -- e.g. formal power series rings. And it generalizes to rings other than $F[x]$ where we consider things modulo powers of an ideal, which in turn leads to things like the $p$-adic numbers.
But without this knowledge, one might look at all I've said above, and think that all I've done is take a simple thing and make things complicated.
One might even be so bold as to argue that the derivative is actually more of an algebraic idea that has been fruitfully generalized to the study of analytic objects, rather than the other way around.
There was an abstract-algebra tag, so this idea by Conor McBride might be of interest to you.
To put it briefly, you can find a meaning for the formal derivative in the discrete parts of mathematics.
Introduction.
Let $A$, $B$, $C$, etc. be some types; let $A\times B$ be the product type and $A + B$ the coproduct. Moreover, let $B^A$ be the type of functions from $A$ to $B$. (To gain some intuition, you can think of $A$, $B$, etc., as subsets of some universe $U$; $A\times B$ would then be the Cartesian product and $A + B$ would be the disjoint union.)
Observe that $|A+B| = |A| + |B|$, $|A\times B| = |A|\cdot|B|$, and $|B^A| = |B|^{|A|}$. Denote as
- $\mathbf{0}$, the uninhabited type (something like an empty set),
- $\mathbf{1}$, a type with one element (something like a singleton),
- $\mathbf{2}$, a type with two elements,
- etc.
Then $|A\times A| = |A^\mathbf{2}|$, and of course there is an isomorphism between pairs of values of type $A$ and functions $\mathbf{2} \to A$ (indeed, applying such a function to the first element of $\mathbf{2}$ gives the projection $\pi_1$). Many more identities hold, e.g. the distributive law
$$(A+B)\times C \equiv A\times C + B \times C,$$
and the well-known law of exponentiation becomes the Currying operation
$$(A^B)^C \equiv A^{B\times C}.$$
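If it helps to see these identities as programs, here is a small Haskell sketch (the function names are mine) giving explicit witnesses for these two laws:

```haskell
-- (A + B) x C  ≅  A x C + B x C
distrib :: (Either a b, c) -> Either (a, c) (b, c)
distrib (Left  a, c) = Left  (a, c)
distrib (Right b, c) = Right (b, c)

undistrib :: Either (a, c) (b, c) -> (Either a b, c)
undistrib (Left  (a, c)) = (Left  a, c)
undistrib (Right (b, c)) = (Right b, c)

-- (A^B)^C  ≅  A^(B x C), i.e. currying: a function of a pair
-- is the same thing as a function returning a function.
toCurried :: ((b, c) -> a) -> (c -> (b -> a))
toCurried f c b = f (b, c)

fromCurried :: (c -> (b -> a)) -> ((b, c) -> a)
fromCurried g (b, c) = g c b
```

Each pair of functions is mutually inverse, which is what the isomorphism sign $\equiv$ means here.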
Data structures.
Data structures are functions on types. An example of a non-trivial data structure is the list. It can be defined recursively as
$$L[X] = \mathbf{1} + X \times L[X], \tag{1}$$
that is, the list is an empty list or it contains the head (which is of the parameter type) and the tail (which is again, a list). Solving $(1)$ for $L[X]$ we get that
$L[X] \equiv \frac{\mathbf{1}}{\mathbf{1}-X} = \mathbf{1} + X + X^2 + X^3 + \ldots$.
In other words, the list is isomorphic to a type that is a singleton, or a 1-tuple, or a 2-tuple, etc. Please note that my description is rather informal; it would take ages to introduce proper definitions.
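In Haskell notation, equation $(1)$ is literally the declaration of the ordinary cons-list (a sketch, nothing more):

```haskell
-- L[X] = 1 + X * L[X]: a list is either empty, or a head paired with a tail.
data List x = Nil              -- the "1" summand
            | Cons x (List x)  -- the "X * L[X]" summand
```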
Formal derivative.
Now, the surprise: the formal derivative is similar to removing an element from the data structure in question (which is represented by some function).
For example, $(A^\mathbf{3})' \equiv \mathbf{3}\times A^\mathbf{2}$, i.e. if we remove one $A$ from a 3-tuple we get a 2-tuple together with an indicator of where the element was taken from (there were three possible places). For the list,
$$L'[A] \equiv \left(\frac{\mathbf{1}}{\mathbf{1}-A}\right)' \equiv \left(\frac{\mathbf{1}}{\mathbf{1}-A}\right)^2 \equiv L[A]\times L[A],$$
in other words, a list with one element removed becomes two lists: the elements before the removed one, and the rest, after it. The usual laws also work, e.g. the chain rule
$$\left(F[G[X]]\right)' \equiv F'[G[X]]\times G'[X],$$ that is, if we want to remove an $X$ from a nested structure $F[G[X]]$, then we first need to remove some $G[X]$ from $F$, and then from this $G[X]$ remove the $X$.
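For the list case this can be made very concrete. Below is a small Haskell sketch (the names `ListHole`, `removeAt`, and `plug` are mine): a one-hole context of a list is a pair of lists, exactly the $L[A]\times L[A]$ above.

```haskell
-- L'[A] ≅ L[A] x L[A]: the elements before the hole and the elements after it.
data ListHole a = ListHole [a] [a] deriving Show

-- Remove the element at position n, returning it together with its context
-- (Nothing if the index is out of range).
removeAt :: Int -> [a] -> Maybe (a, ListHole a)
removeAt _ []     = Nothing
removeAt 0 (x:xs) = Just (x, ListHole [] xs)
removeAt n (x:xs) = do
  (y, ListHole before after) <- removeAt (n - 1) xs
  return (y, ListHole (x : before) after)

-- Plug an element back into the hole, recovering a list.
plug :: a -> ListHole a -> [a]
plug x (ListHole before after) = before ++ [x] ++ after
```

For example, `removeAt 2 [1,2,3,4]` gives `Just (3, ListHole [1,2] [4])`, and `plug 3 (ListHole [1,2] [4])` gives back `[1,2,3,4]`.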
Finally, the Taylor expansion also works; however, I will leave out the interpretation so as not to spoil all the fun ;-)
$$F[X+Y] \equiv F[X] + F'[X]\times Y + F''[X]\times \frac{Y^2}{2!} + F'''[X]\times\frac{Y^3}{3!} + \ldots$$
I hope I made you curious ;-)
The formal derivative in a polynomial ring is just one example of a more general concept called a derivation. More explicitly, if $R, S$ are rings with $S$ an $R$-algebra, and $d : R \rightarrow S$ is a map which satisfies:
1) $d(a + b) = d(a) + d(b)$
2) $d(ab) = a d(b) + b d(a)$
Then $d$ is called a derivation. The formal derivative on a polynomial ring is just one example, but there are many derivations that pop up all over the place in algebra and geometry.
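To make the two axioms concrete, here is a small Haskell sketch (names are mine) of the formal derivative on integer-coefficient polynomials, together with a check of both axioms on a pair of sample polynomials:

```haskell
-- Polynomials over the integers as coefficient lists, lowest degree first.
type Poly = [Integer]

addP :: Poly -> Poly -> Poly
addP (a:as) (b:bs) = (a + b) : addP as bs
addP as     []     = as
addP []     bs     = bs

mulP :: Poly -> Poly -> Poly
mulP []     _  = []
mulP (a:as) bs = addP (map (a *) bs) (0 : mulP as bs)

-- The formal derivative: d(sum a_i x^i) = sum i * a_i x^(i-1).
derivP :: Poly -> Poly
derivP p = zipWith (*) [1..] (drop 1 p)

-- Sample polynomials: f = 1 + 3x^2, g = 2 + x.
f, g :: Poly
f = [1, 0, 3]
g = [2, 1]

additivity, leibniz :: Bool
additivity = derivP (addP f g) == addP (derivP f) (derivP g)      -- axiom 1)
leibniz    = derivP (mulP f g) == addP (mulP f (derivP g))
                                       (mulP (derivP f) g)        -- axiom 2)
```

Both `additivity` and `leibniz` evaluate to `True` here; of course a couple of examples is not a proof, but it shows the axioms in action.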
In the general context of algebra, there is a big theory called differential Galois theory, which studies differential field extensions (extensions of fields which carry a derivation, such as $\mathbb{C}(x)$).
In differential geometry, a smooth manifold $M$ has a derivation on its algebra of real-valued smooth functions $C^\infty(M)$, which is basically the usual derivative. The target of this derivation is not the set of functions itself, but the set of differential 1-forms $\Lambda^1(M)$.
These are just a couple of examples, but derivations are important because of their algebraic properties, and because they show up naturally in many branches of mathematics. I can't speak for algebraic geometry, but I believe derivations play an important role in that theory as well.
One other place where it comes up is in the theory of formal languages. I believe it was Janusz Brzozowski who first noticed that regular expressions (in particular) have a useful derivative.
If you know anything about regular expressions, then the following will make sense; otherwise it won't. Nonetheless, here it is: regular expressions form an idempotent semiring with an additional operation called the Kleene star.
Intuitively, a regular expression represents a set of words over some alphabet. $0$ is the empty set, $1$ is a singleton set containing the zero-length word, multiplication is word concatenation, and addition is set union. The Kleene star represents a certain kind of iteration. Intuitively:
$$A^* = 1 + A + A^2 + A^3 + \cdots$$
It's possible to define this in terms of simple equality axioms, but we're not going to do that here.
So we have an idempotent semiring (idempotent because $A + A = A$), plus the Kleene star. The Kleene star plays the role of $e^x$ (if this seems weird, think of the power series of $e^x$, plus the fact that addition is idempotent). The letters in the underlying alphabet behave like variables. In particular, we can define evaluation at zero:
$$a(0) = 0$$ $$(AB)(0) = A(0) B(0)$$ $$(A+B)(0) = A(0) + B(0)$$ $$A^*(0) = 1$$
Given a regular expression $E$, $E(0)$ is either $0$ or $1$. It is $1$ if the empty string is a member of $E$, and $0$ otherwise.
We can also define a derivative:
$$\frac{\partial a}{\partial a} = 1$$ $$\frac{\partial b}{\partial a} = 0$$ $$\frac{\partial (A+B)}{\partial a} = \frac{\partial A}{\partial a} + \frac{\partial B}{\partial a}$$ $$\frac{\partial AB}{\partial a} = A(0) \frac{\partial B}{\partial a} + \frac{\partial A}{\partial a} B$$ $$\frac{\partial A^*}{\partial a} = \frac{\partial A}{\partial a} A^*$$
The only odd rule here is the one for multiplication. It's almost like the familiar product rule; the difference is due to the fact that concatenation is non-commutative.
What the derivative intuitively means is that $\frac{\partial E}{\partial a}$ is the set of strings in $E$ which start with the symbol $a$, but with that $a$ removed. So $a \frac{\partial E}{\partial a}$ is the set of strings in $E$ which start with $a$.
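All of the above fits in a few lines of Haskell. The sketch below (constructor and function names are my own) implements the evaluation-at-zero and derivative rules verbatim; matching a word is then just differentiating letter by letter and evaluating at zero at the end, which is also the DFA construction mentioned a little further down.

```haskell
data Regex
  = Zero               -- 0: the empty set of words
  | One                -- 1: just the zero-length word
  | Sym Char           -- a single letter
  | Alt  Regex Regex   -- A + B  (union)
  | Cat  Regex Regex   -- A B    (concatenation)
  | Star Regex         -- A*     (Kleene star)
  deriving Show

-- Evaluation at zero: True if the empty word belongs to the language.
atZero :: Regex -> Bool
atZero Zero      = False
atZero One       = True
atZero (Sym _)   = False
atZero (Alt a b) = atZero a || atZero b
atZero (Cat a b) = atZero a && atZero b
atZero (Star _)  = True

-- The derivative with respect to a letter, following the rules above.
deriv :: Char -> Regex -> Regex
deriv _ Zero      = Zero
deriv _ One       = Zero
deriv c (Sym a)   = if a == c then One else Zero
deriv c (Alt a b) = Alt (deriv c a) (deriv c b)
deriv c (Cat a b)
  | atZero a      = Alt (Cat (deriv c a) b) (deriv c b)  -- A(0) = 1
  | otherwise     = Cat (deriv c a) b                    -- A(0) = 0
deriv c (Star a)  = Cat (deriv c a) (Star a)

-- A word is in the language iff differentiating by each of its letters
-- in turn leaves an expression that contains the empty word.
matches :: Regex -> String -> Bool
matches e = atZero . foldl (flip deriv) e
```

For instance, with the expression $ab^*$, `matches (Cat (Sym 'a') (Star (Sym 'b'))) "abb"` is `True`, while `matches (Cat (Sym 'a') (Star (Sym 'b'))) "ba"` is `False`.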
Thinking about it for a moment, if $\{a, \ldots, z\}$ is the alphabet, then:
$$E = E(0) + a \frac{\partial E}{\partial a} + b \frac{\partial E}{\partial b} + \cdots + z \frac{\partial E}{\partial z}$$
This is Taylor's theorem, only for regular languages. Moreover, it is also a rule for creating DFAs directly from regular expressions! $E(0)$ is $1$ if and only if the initial state is a final state, and the other terms are the transitions.
One remarkable thing about this is that the familiar regular expression operators (plus some less familiar ones, such as set intersection and set difference) are completely determined by their derivatives, plus their evaluation at zero. This is what we'd expect from the fundamental theorem of calculus, but it's interesting to see it here too.
Incidentally, this theory scales up to context-free and recursive languages too, but you need a bit more machinery for that which I won't go into here.