Solution 1:

When I taught Number Theory I needed to speak of the derivative of a polynomial (over $\mathbb Z$ and $\mathbb Z/p \mathbb Z$). Instead of taking the derivative from $\mathbb R$ and restricting it to $\mathbb Z$, I used the following approach, which works for polynomials only (but it works in any polynomial ring).

Let $P(X)$ be a polynomial and $a$ a point. Then by the division theorem we have

$$P(X)=(X-a)Q(X) + R \,,$$

where $R$ is a constant. We define

$$P'(a):= Q(a) \,. \quad (*)$$

It is important, though, to point out that $Q(X) \neq P'(X)$ in general, since different points $a$ give different $Q$'s.
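For the computationally minded, here is a minimal sketch of $(*)$ in Python (the coefficient-list representation and the function names are my own choices, not part of the answer): dividing by the monic factor $(X-a)$ is just synthetic division, and evaluating the quotient at $a$ gives $P'(a)$.

```python
# Polynomials are coefficient lists [c0, c1, ..., cn] for c0 + c1*X + ... + cn*X^n.

def divide_by_linear(p, a):
    """Synthetic division of P(X) by (X - a): returns (Q, R) with P = (X - a)Q + R."""
    q = []
    acc = 0
    for c in reversed(p):           # Horner-style pass from the leading coefficient
        acc = acc * a + c
        q.append(acc)
    r = q.pop()                     # the final accumulator is the remainder R = P(a)
    return list(reversed(q)), r

def derivative_at(p, a):
    """P'(a) := Q(a), following the definition (*)."""
    q, _ = divide_by_linear(p, a)
    return sum(c * a**i for i, c in enumerate(q))

# Example: P(X) = X^3 - 2X + 1, so P'(2) = 3*2^2 - 2 = 10.
print(derivative_at([1, -2, 0, 1], 2))   # -> 10
```

Since only ring operations are used, the same code works verbatim over $\mathbb Z$, and reducing mod $p$ gives the derivative over $\mathbb Z/p\mathbb Z$.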

The following Lemma is an immediate consequence of $(*)$:

Lemma

1) $(P_1 \pm P_2)' = P_1' \pm P_2'$,

2) $(aP)' = aP'$ for a constant $a$,

3) $(a)' = 0$ for a constant $a$,

4) $(X^n)' = nX^{n-1}$.

Thus, one gets the general formula for the derivative of a polynomial.
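As a quick sanity check (continuing the sketch above, and purely illustrative), one can verify numerically that the definition $(*)$ reproduces the familiar coefficient formula at random points:

```python
# Check that derivative_at (defined in the sketch above) matches the
# formula sum_i i*c_i*a^(i-1) obtained from the Lemma, on random polynomials.
import random

for _ in range(100):
    p = [random.randint(-5, 5) for _ in range(6)]   # random degree <= 5 polynomial
    a = random.randint(-5, 5)
    formula = sum(i * c * a**(i - 1) for i, c in enumerate(p) if i >= 1)
    assert derivative_at(p, a) == formula
```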

The product rule can also be proved relatively easily, and then one can actually prove that

$$P(X)=P(a) + P'(a)(X-a) + \frac{P''(a)}{2!}(X-a)^2 + \cdots + \frac{P^{(n)}(a)}{n!}(X-a)^n \,,$$

where $n$ is the degree of the polynomial.

It also follows from here that $a$ is a root of $P(X)$ of multiplicity $k$ if and only if $P(a)=P'(a)=\cdots=P^{(k-1)}(a)=0$ and $P^{(k)}(a) \neq 0$.
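Both the Taylor formula and the multiplicity criterion can be seen in action by repeatedly dividing by $(X-a)$: the successive remainders are exactly $P(a), P'(a), P''(a)/2!, \ldots$ A short sketch (reusing divide_by_linear from the sketch above; the function name is mine):

```python
def taylor_coefficients(p, a):
    """Successive remainders on division by (X - a), i.e. [P(a), P'(a), P''(a)/2!, ...]."""
    coeffs = []
    while p:
        p, r = divide_by_linear(p, a)   # from the earlier sketch
        coeffs.append(r)
    return coeffs

# P(X) = (X - 1)^2 (X + 1) = X^3 - X^2 - X + 1 has a double root at a = 1,
# so the first two coefficients vanish and the third does not:
print(taylor_coefficients([1, -1, -1, 1], 1))   # -> [0, 0, 2, 1]
```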

This is a purely algebraic approach; it works nicely for polynomials over any commutative ring, and it can probably be extended to rational functions easily enough, but not much more generally.

Note that $R=P(a)$; thus for all $X \neq a$ we have $Q(X)=\frac{P(X)-P(a)}{X-a}$, so over $\mathbb R$ this definition is equivalent to the standard one.

Also, note that $P''(a) \neq Q'(a)$ in $(*)$. Actually, differentiating $P(X)=(X-a)Q(X)+R$ twice with the product rule gives $P''(a) = 2Q'(a)$.

Solution 2:

Definition:
Given a function $x(t)$, consider any point $P=(a,x(a))$ on its graph. Let the function $\ell(t)$ be a line passing through $P$. We say that $\ell$ cuts through $x$ at $P$ if there exists some real number $d>0$ such that the graph of $\ell$ is on one side of the graph of $x$ for all $a-d < t < a$, and is on the other side for all $a < t < a+d$.

Definition (Marsden):
A line $\ell$ through $P$ is said to be the line tangent to $x$ at $P$ if all lines through $P$ with slopes less than that of $\ell$ cut through $x$ in one direction, while all lines through $P$ with slopes greater than that of $\ell$ cut through it in the opposite direction.

Definition:
The derivative of a function is the slope of its tangent line at a given point.
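To make these definitions concrete, here is a rough numerical experiment (my own illustration, not part of Marsden's text): it tests, on a small grid, whether a line of slope $m$ through $(a, x(a))$ cuts through $x$, and in which direction.

```python
def cuts_through(x, a, m, d=1e-3, n=200):
    """Crude grid test: +1/-1 for the two crossing directions, 0 if no cut detected."""
    line = lambda t: x(a) + m * (t - a)
    ts_left  = [a - d * i / n for i in range(1, n + 1)]
    ts_right = [a + d * i / n for i in range(1, n + 1)]
    left  = [x(t) - line(t) for t in ts_left]
    right = [x(t) - line(t) for t in ts_right]
    if all(v < 0 for v in left) and all(v > 0 for v in right):
        return +1                 # curve passes from below the line to above it
    if all(v > 0 for v in left) and all(v < 0 for v in right):
        return -1                 # curve passes from above the line to below it
    return 0

# For x(t) = t^3 at a = 1 the tangent slope is 3: smaller slopes cut through
# one way, larger slopes the other, exactly as in Marsden's definition.
print(cuts_through(lambda t: t**3, 1.0, 2.9))   # -> +1
print(cuts_through(lambda t: t**3, 1.0, 3.1))   # -> -1
```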

Theorem (Livshits):
The derivative of $t^k$ is $kt^{k-1}$, for $k=1, 2, 3, \ldots$

It suffices to prove that the derivative equals $kt^{k-1}$ at $t=0$ and at $t=1$; the value at any other point then follows by rescaling $t$. The result at $t=0$ holds for even $k$ by symmetry, and for odd $k$ by direct application of the definition.

It remains to prove the result at $t=1$. The proposed tangent line at $(1,1)$ has the equation $\ell(t)=k(t-1)+1$, so what we need to prove is that the polynomial $t^k-[k(t-1)+1]$ is greater than or equal to zero throughout some region around $t=1$. We will prove that it is $\ge 0$ for $t \ge 0$.

Suppose that $\ell$ meets the graph of $t^k$ at a point $(t,t^k)$ with $t \neq 1$. Since the slope of $\ell$ is $k$, we must have \begin{equation*} \frac{t^k-1}{t-1} = k. \end{equation*} The left-hand side equals $Q(t)=\sum_{j=0}^{k-1}t^j$. For which $t \ge 0$ does $Q(t)=k$ hold? Clearly $t=1$ is a solution, since then there are $k$ terms, each equal to $1$. For $t>1$, every non-constant term is greater than $1$, so there is no solution. For $0 \le t < 1$, every non-constant term is nonnegative and less than $1$, so again there is no solution. Since $t^k-\ell(t)=(t-1)\bigl(Q(t)-k\bigr)$, and we have just seen that $Q(t)-k$ has the same sign as $t-1$ for $t \ge 0$, the difference $t^k-\ell(t)$ is nonnegative for all $t \ge 0$. This completes the proof.
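As a quick numerical sanity check of the inequality just proved (illustrative only):

```python
# Verify t^k - [k(t-1) + 1] >= 0 on a grid of t in [0, 3] for several k
# (a tiny tolerance absorbs floating-point rounding at the touch point t = 1).
for k in range(1, 8):
    assert all(t**k - (k * (t - 1) + 1) >= -1e-12
               for t in (i * 0.01 for i in range(301)))
```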

References

  • Jerrold Marsden and Alan Weinstein, Calculus Unlimited
  • Michael Livshits, Calculus (was http://mathfoolery.org/calculus.html)

Solution 3:

Start with polynomials only. Given a polynomial $p(x) = \sum_{i=0}^n a_ix^i$ and a point $x_0$, convince yourself that there is another polynomial $\tilde{p}(x) = \sum_{i=0}^n b_ix^i$ such that $$ p(x) = \tilde{p}(x-x_0). $$

Now observe what happens if you evaluate $\tilde{p}(x-x_0)$ for an $x$ that lies close to $x_0$: the higher powers $(x-x_0)^i$, $i \ge 2$, become negligible compared with $(x-x_0)$ itself, so $\tilde{p}(x-x_0)$ won't differ very much from $b_0 + b_1(x-x_0)$. In other words, $b_0 + b_1(x-x_0)$ is a good approximation of $p$ as long as we don't stray too far from $x_0$. Now we just have to actually find $b_0$ and $b_1$.

$b_0$ is obviously just $p(x_0)$, so what remains is to find $b_1$, i.e. the coefficient of $x$ in $p(x+x_0)$. Once you realize that expanding $(x+c)^k$ produces $k$ copies of the term $xc^{k-1}$ and that no other term contains exactly one $x$, it is clear that $$ b_1 = a_1 + 2a_2x_0 + 3a_3x_0^2 + 4a_4x_0^3 + \ldots $$

The first-order approximation of $p$ around $x_0$ is thus $p(x_0) + (x-x_0)\sum_{i=1}^n ia_ix_0^{i-1}$, which makes it obvious that the slope of $p$ at $x_0$ is $$ p'(x_0) = \sum_{i=1}^n ia_ix_0^{i-1}. $$
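A minimal sketch of this computation (the coefficient-list representation and the function name shift are my own): expand $p(x+x_0)$ with the binomial theorem and read off $b_0$ and $b_1$.

```python
from math import comb

def shift(p, x0):
    """Coefficients b of p~(u) = p(u + x0), where p = [a0, a1, ..., an]."""
    b = [0] * len(p)
    for k, a_k in enumerate(p):
        for j in range(k + 1):       # (u + x0)^k = sum_j C(k, j) * u^j * x0^(k - j)
            b[j] += a_k * comb(k, j) * x0 ** (k - j)
    return b

# p(x) = 2 + 3x + 5x^2 at x0 = 4: b0 = p(4) = 94 and b1 = 3 + 2*5*4 = 43.
b = shift([2, 3, 5], 4)
print(b[0], b[1])   # -> 94 43
```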

The key ingredient (and the replacement for explicit limits) is the idea that for small values $\epsilon$, the powers $\epsilon^2$ and higher are sufficiently close to zero to be ignored.