Defining the derivative without limits
Solution 1:
When I taught Number Theory I needed to speak of the derivative of a polynomial (over $\mathbb Z$ and $\mathbb Z/p \mathbb Z$). Instead of taking the derivative from $\mathbb R$ and restricting it to $\mathbb Z$, I used the following approach, which works for polynomials only (but it works in any polynomial ring).
Let $P(X)$ be a polynomial and $a$ a point. Then by the division theorem we have
$$P(X)=(X-a)Q(X)+R\,,$$
where $R$ is a constant. We define
$$P'(a):= Q(a) \,. \quad (*)$$
It is important, though, to point out that $Q(X) \neq P'(X)$ in general, since different points $a$ give different $Q$'s.
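To make this concrete, here is a minimal sympy sketch of definition $(*)$ (the helper name `alg_deriv` is my own, not from the text): divide $P$ by $X-a$ and evaluate the quotient at $a$.

```python
# A minimal sketch of definition (*); `alg_deriv` is an illustrative name.
import sympy as sp

X = sp.symbols('X')

def alg_deriv(P, a):
    """P'(a) computed algebraically: write P = (X - a)Q + R, return Q(a)."""
    Q, R = sp.div(P, X - a, X)   # division with remainder by (X - a)
    return Q.subs(X, a)          # definition (*): P'(a) := Q(a)

P = X**3 - 2*X + 5
print(alg_deriv(P, 2))           # 10
print(sp.diff(P, X).subs(X, 2))  # 10, matching the usual derivative
```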
The following Lemma is an immediate consequence of $(*)$:
Lemma
1) $(P_1 \pm P_2)' =P_1' \pm P_2'$,
2) $(cP)'=cP'$ for a constant $c$,
3) $c'=0$ for a constant $c$,
4) $(X^n)'=n X^{n-1}$.
Thus, one gets the general formula for the derivative of a polynomial.
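For instance, item 4 can be checked mechanically from $(*)$: dividing $X^n$ by $X-a$ gives the quotient $X^{n-1}+aX^{n-2}+\cdots+a^{n-1}$, whose value at $X=a$ is $na^{n-1}$. A short sympy check of this (my own sketch):

```python
# Sketch: lemma item 4, (X^n)' = n X^{n-1}, straight from definition (*).
import sympy as sp

X, a = sp.symbols('X a')
for n in range(1, 6):
    Q, R = sp.div(X**n, X - a, X)  # X^n = (X - a) Q + R
    assert sp.expand(Q.subs(X, a) - n*a**(n-1)) == 0
print("(X^n)'(a) = n a^(n-1) verified for n = 1..5")
```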
The product rule can also be proved relatively easily, and then one can actually prove that
$$P(X)=P(a) + P'(a)(X-a)+ \frac{P''(a)}{2!}(X-a)^2+\cdots+ \frac{P^{(n)}(a)}{n!}(X-a)^n \,,$$
where $n$ is the degree of the polynomial.
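This expansion can also be generated by repeated division: dividing $P$ by $(X-a)$, then dividing the quotient by $(X-a)$, and so on, the successive remainders are exactly the coefficients $P^{(k)}(a)/k!$. A small sympy sketch of this (my own illustration):

```python
# Sketch: building P(X) = sum_k [P^(k)(a)/k!] (X - a)^k by repeatedly
# dividing by (X - a); the successive remainders are the coefficients.
import sympy as sp

X = sp.symbols('X')
P = X**4 - 3*X**2 + X - 1
a = 2

coeffs, current = [], P
while current != 0:
    current, r = sp.div(current, X - a, X)
    coeffs.append(r)              # next coefficient P^(k)(a)/k!

rebuilt = sum(c * (X - a)**k for k, c in enumerate(coeffs))
assert sp.expand(rebuilt - P) == 0
print(coeffs)  # [5, 21, 21, 8, 1] = [P(2), P'(2), P''(2)/2!, ...]
```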
It also follows from this expansion that $a$ is a root of $P(X)$ of multiplicity $k$ if and only if $P(a)=P'(a)=\cdots=P^{(k-1)}(a)=0$ and $P^{(k)}(a) \neq 0$.
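A quick sympy spot check of the multiplicity criterion (the example polynomial is my own), using $P(X)=(X-2)^3(X+1)$, which has a root of multiplicity $3$ at $a=2$:

```python
# Sketch: the multiplicity criterion on P(X) = (X - 2)^3 (X + 1).
import sympy as sp

X = sp.symbols('X')
P = sp.expand((X - 2)**3 * (X + 1))
print([P.diff(X, k).subs(X, 2) for k in range(4)])
# [0, 0, 0, 18]: P, P', P'' vanish at a = 2 while P''' does not
```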
This is a purely algebraic approach: it works nicely for polynomials over any commutative ring, and it can probably be extended easily to rational functions, but not much further.
Note that $R=P(a)$; thus for all $x \neq a$ we have $Q(x)=\frac{P(x)-P(a)}{x-a}$, so this definition is equivalent to the standard definition over $\mathbb R$.
Also, note that $P''(a) \neq Q'(a)$ in $(*)$. Actually, differentiating $P(X)=(X-a)Q(X)+P(a)$ twice with the product rule gives $P''(a) = 2Q'(a)$; for example, $P(X)=X^2$ at $a=0$ has $Q(X)=X$, so $P''(0)=2=2Q'(0)$.
Solution 2:
Definition:
Given a function $x(t)$, consider any point $P=(a,x(a))$ on its graph.
Let the function $\ell(t)$ be a line passing through $P$.
We say that $\ell$ cuts through $x$ at $P$ if
there exists some real number $d>0$ such that
the graph of $\ell$ is on one side of the graph of $x$ for
all $a-d < t < a$, and is on the other side for all $a < t < a+d$.
Definition (Marsden):
A line $\ell$ through $P$ is said to be
the line tangent to $x$ at $P$ if all
lines through $P$ with slopes less than that of $\ell$ cut through $x$
in one direction, while all lines with slopes greater than that of $\ell$ cut through
it in the opposite direction.
Definition:
The derivative of a function is the slope of its tangent line at a given point.
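As a numerical sanity check of these definitions (my own sketch, not from the references), take $x(t)=t^2$ at $P=(1,1)$: lines of slope less than $2$ should cut through the parabola in one direction, lines of slope greater than $2$ in the other, while the slope-$2$ line stays on one side.

```python
# Numerical sketch: which lines through P = (1, 1) cut through x(t) = t^2.
def side(m, t):
    """Sign of x(t) - ell(t), where ell(t) = m*(t - 1) + 1."""
    gap = t**2 - (m*(t - 1) + 1)
    return '+' if gap > 0 else ('-' if gap < 0 else '0')

d = 0.01
for m in (1.5, 2.0, 2.5):
    print(m, side(m, 1 - d), side(m, 1 + d))
# 1.5 - +   (slope < 2: cuts through in one direction)
# 2.0 + +   (slope = 2: stays on one side, the tangent line)
# 2.5 + -   (slope > 2: cuts through in the opposite direction)
```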
Theorem (Livshits):
The derivative of $t^k$ is $kt^{k-1}$, for $k=1, 2, 3, \ldots$
It suffices to prove the result at $t=0$ and at $t=1$; the value at any other point then follows by rescaling, since $(ct)^k = c^k t^k$. The result at $t=0$ holds for even $k$ by symmetry, and for odd $k$ by direct application of the definition.
It remains to prove the result at $t=1$. The proposed tangent line at $(1,1)$ has the equation $\ell(t)=k(t-1)+1$, so what we need to prove is that the polynomial $t^k-[k(t-1)+1]$ is greater than or equal to zero throughout some region around $t=1$. We will prove that it is $\ge 0$ for $t \ge 0$.
Suppose that $\ell$ crosses the graph of $t^k$ at a point $(t,t^k)$ with $t \neq 1$. Since the slope of $\ell$ is $k$, we must have \begin{equation*} \frac{t^k-1}{t-1} = k. \end{equation*} The left-hand side is the geometric sum $Q(t)=\sum_{j=0}^{k-1}t^j$. When can $Q(t)=k$? Clearly $t=1$ is a solution, since there are $k$ terms, each equal to $1$. For $t>1$, all the terms except the constant one are greater than $1$, so there can't be any solution. For $0 \le t < 1$, all the terms except the constant one are nonnegative and less than $1$, so again there can't be any solution. In fact, $t^k-\ell(t)=(t-1)\bigl(Q(t)-k\bigr)$, and we have just shown that both factors are negative for $0 \le t < 1$ and both are positive for $t > 1$, so the product is nonnegative for all $t \ge 0$. This completes the proof.
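A quick numerical spot check of the inequality just proved (my own sketch, not part of Livshits's argument), sampling $t^k - [k(t-1)+1]$ on a grid:

```python
# Spot check: t^k - [k(t - 1) + 1] >= 0 on a grid of t in [0, 3].
# Exact rational arithmetic avoids float noise (for k = 1 the gap is 0).
from fractions import Fraction

for k in range(1, 6):
    grid = [Fraction(i, 100) for i in range(301)]   # t = 0, 0.01, ..., 3
    assert min(t**k - (k*(t - 1) + 1) for t in grid) >= 0
print("nonnegative at every sampled t in [0, 3], for k = 1..5")
```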
References
- Jerrold Marsden and Alan Weinstein, Calculus Unlimited
- Michael Livshits, Calculus (was http://mathfoolery.org/calculus.html)
Solution 3:
Start with polynomials only. Given a polynomial $p(x) = \sum_{i=0}^n a_ix^i$ and a point $x_0$, assure yourself that there is another polynomial $\tilde{p}(x) = \sum_{i=0}^n b_ix^i$ such that $$ p(x) = \tilde{p}(x-x_0). $$
Now observe what happens if you evaluate $\tilde{p}(x-x_0)$ for an $x$ that lies close to $x_0$: the powers $(x-x_0)^i$ for $i \ge 2$ become very small very quickly, so $\tilde{p}(x-x_0)$ won't differ very much from $b_0 + b_1(x-x_0)$. In other words, $b_0 + b_1(x-x_0)$ is a good approximation of $p$ as long as we don't stray too far from $x_0$. Now, we just have to actually find $b_0$ and $b_1$.
$b_0$ is obviously just $p(x_0)$, so what remains is to find $b_1$, i.e. the coefficient of $x$ in $p(x+x_0)$. Once you realize that expanding $(x+c)^k$ produces the term $xc^{k-1}$ exactly $k$ times, and that no other term contains exactly one factor of $x$, it is clear that $$ b_1 = a_1 + 2a_2x_0 + 3a_3x_0^2 + 4a_4x_0^3 + \ldots $$
The first-order approximation of $p$ around $x_0$ is thus $p(x_0) + (x-x_0)\sum_{i=1}^n ia_ix_0^{i-1}$, which makes it obvious that the slope of $p$ at $x_0$ is $$ p'(x_0) = \sum_{i=1}^n ia_ix_0^{i-1}. $$
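A short sympy sketch of this computation (the example polynomial and variable names are mine): shift $p$ by $x_0$, read off $b_0$ and $b_1$, and compare with the formula above.

```python
# Sketch: b0 and b1 from the shifted polynomial p(x + x0).
import sympy as sp

x, x0 = sp.symbols('x x0')
a = [5, 1, -3, 0, 1]                       # p(x) = 5 + x - 3x^2 + x^4
p = sum(ai * x**i for i, ai in enumerate(a))

shifted = sp.expand(p.subs(x, x + x0))     # p(x + x0) = b0 + b1*x + ...
b0, b1 = shifted.coeff(x, 0), shifted.coeff(x, 1)

assert sp.expand(b0 - p.subs(x, x0)) == 0  # b0 = p(x0)
assert sp.expand(b1 - sum(i*ai*x0**(i-1) for i, ai in enumerate(a) if i)) == 0
print(b1)                                  # 4*x0**3 - 6*x0 + 1
```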
The key ingredient (and the replacement for explicit limits) is the idea that for small values of $\epsilon$, $\epsilon^2$ and higher powers are sufficiently close to zero to be ignored.
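One way to make this exact rather than approximate (my own addition, not part of the original answer) is to compute with pairs $(v,d)$ standing for $v+d\epsilon$ with $\epsilon^2 = 0$ by definition, so that discarding $\epsilon^2$ is built into the arithmetic:

```python
# Sketch: pairs (v, d) represent v + d*eps with eps^2 = 0, so evaluating
# p(x0 + eps) yields p(x0) in the value slot and p'(x0) in the eps slot.
class Dual:
    def __init__(self, v, d=0):
        self.v, self.d = v, d
    def __add__(self, other):
        return Dual(self.v + other.v, self.d + other.d)
    def __mul__(self, other):
        # (v1 + d1*eps)(v2 + d2*eps) = v1*v2 + (v1*d2 + d1*v2)*eps
        return Dual(self.v * other.v, self.v * other.d + self.d * other.v)

def p(x):  # the same example polynomial: p(x) = 5 + x - 3x^2 + x^4
    return Dual(5) + x + Dual(-3)*x*x + x*x*x*x

r = p(Dual(2, 1))   # evaluate at x0 = 2 with eps-coefficient 1
print(r.v, r.d)     # 11 21, i.e. p(2) = 11 and p'(2) = 21
```

The $\epsilon$-slot reproduces exactly the coefficient $b_1$ from the shift argument above.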