Why do differentiation rules work? What's the intuition behind them? (Not asking for proofs)
Solution 1:
The key intuition, first of all, is that the product of two tiny differences is negligible. You can intuit this just by doing computations:
$$3.000001 \cdot 2.0001 = 6.0003020001$$
If we are doing any sort of rounding of hand computations, we'd likely round away that $0.0000000001$ part. If you were doing computations to eight significant digits, a value $v$ is really a value in a range roughly of $v\left(1 \pm 10^{-8}\right)$, and the error when you multiply $v_1$ by $v_2$ is almost entirely $2\cdot 10^{-8}|v_1v_2|$, the sum of the two first-order contributions. The other part of the error, of order $10^{-16}|v_1v_2|$, is so tiny you'd probably ignore it.
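A quick way to see how lopsided the split is: the sketch below breaks the change in a product into its first-order part and the product of the two tiny differences. The helper name `product_error_parts` is made up for this illustration; the numbers are the ones from the multiplication above.

```python
def product_error_parts(a, da, b, db):
    """Split (a + da) * (b + db) - a * b into its two pieces."""
    first_order = a * db + b * da   # the part that matters
    second_order = da * db          # product of two tiny differences
    return first_order, second_order

# 3.000001 = 3 + 1e-6 and 2.0001 = 2 + 1e-4
first, second = product_error_parts(3.0, 1e-6, 2.0, 1e-4)
print("first-order part: ", first)
print("second-order part:", second)
print("ratio:            ", second / first)
```

The second-order part is smaller than the first-order part by a factor of more than a million here, and the gap widens as the differences shrink.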
Case: $f(x)=x^2$
Now, consider a square with corners $(0,0), (0,x), (x,0), (x,x)$. Grow $x$ a little bit, and you see that the area grows in proportion to the length of two of the edges, plus a tiny little square. That tiny square is negligible.
This is a little harder to visualize for $x^n$, but it actually works the same way when $n$ is a positive integer, by considering an $n$-dimensional hypercube.
This geometric reason is also why the circumference of a circle is equal to the derivative of its area – if you increase the radius a little, the area is increased by approximately that "little" times the circumference. So the derivative of $\pi r^2$ is the circumference of the circle, $2\pi r$.
It's also a way to understand the product rule. (Or, indeed, FOIL.)
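For the product rule, the same picture is a rectangle with sides $u$ and $v$ instead of a square. A sketch of the bookkeeping, where $\Delta u$ and $\Delta v$ are the small changes in the two sides (symbols introduced here only for illustration):
$$(u+\Delta u)(v+\Delta v) - uv = u\,\Delta v + v\,\Delta u + \Delta u\,\Delta v$$
The first two terms are the thin strips along the two edges, and the last is the tiny corner rectangle, which is a product of two small differences and so negligible. Dropping it leaves the shape of the product rule, $(uv)' = u\,v' + v\,u'$.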
Case: The chain rule
The chain rule is better seen by considering an odd-shaped tub. Let's say that when the volume of the water in the tub is $v$, the tub is filled to depth $h(v)$. Then assume that we have a hose that, between time $0$ and time $t$, has delivered a volume $v(t)$ of water.
At time $t$, what is the rate that the height of the water is increasing?
Well, we know that when the current volume is $v$, then the rate at which the height is increasing is $h'(v)$ times the rate the volume is increasing. And the rate the volume is increasing is $v'(t)$. So the rate the height is increasing is $h'(v(t)) \cdot v'(t)$.
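The tub argument can be checked numerically. The sketch below assumes a made-up tub shape `h` and a made-up hose `v` (neither comes from the answer, any smooth choices would do) and compares the rate of change of the height measured directly against $h'(v(t))\cdot v'(t)$, using a central difference as a stand-in for the derivative.

```python
import math

def h(v):
    # Hypothetical tub: depth as a function of water volume.
    return math.sqrt(v)

def v(t):
    # Hypothetical hose: volume delivered by time t.
    return 2.0 * t + 0.5 * t * t

def slope(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def height(t):
    # Depth of the water at time t.
    return h(v(t))

t = 3.0
direct = slope(height, t)                 # rate of height change, measured directly
chained = slope(h, v(t)) * slope(v, t)    # h'(v(t)) * v'(t)
print(direct, chained)
```

The two numbers should agree to many decimal places, up to the small error of the finite-difference estimate.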
Case: Inverse function
This is the one case where it is obvious from the graph. When you flip the coordinates of a Cartesian plane, a line of slope $m \neq 0$ gets sent to a line of slope $1/m$. So if $f$ and $g$ are inverse functions, then the slope of $f$ at $(x,f(x))$ is the reciprocal of the slope of $g$ at $(f(x),x)=(f(x),g(f(x)))$. So $g'(f(x))=1/f'(x)$.
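As a small sanity check of $g'(f(x))=1/f'(x)$, the sketch below uses the pair $f=\exp$, $g=\log$ (chosen here only as a convenient example of inverse functions) and a central-difference estimate of the slopes.

```python
import math

def slope(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 0.7
f_slope = slope(math.exp, x)             # slope of f at (x, f(x))
g_slope = slope(math.log, math.exp(x))   # slope of g at (f(x), x)
print(f_slope, g_slope, f_slope * g_slope)
```

The product of the two slopes should come out very close to $1$.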
$x^2$ revisited
Another way of dealing with $f(x)=x^2$ is thinking again of area, but thinking of it in terms of units. If we have a square whose side is $x$ centimeters, and we change the side by a small amount, $\Delta x$ centimeters, then the area is $x^2\,\mathrm{cm}^2$ and it changes by approximately $f(x+\Delta x)-f(x)\approx f'(x)\Delta x$ square centimeters.
On the other hand, if we measure the square in meters, it has side length $x/100$ meters and area $(x/100)^2$ square meters. The change in the side length is $(\Delta x)/100$ meters. So the expected area change is $f'(x/100)\cdot (\Delta x)/100$ square meters. But this change in area should be the same whichever unit we use, so $$f'(x)\Delta x = f'(x/100)\cdot\frac{\Delta x}{100}\cdot \left(100^2\, \text{cm}^2/\text{m}^2\right) = 100 f'(x/100)\,\Delta x$$
More generally, then, we see that $f'(ax)=af'(x)$ when $f(x)=x^2$ by changing units from centimeters to a unit that is $1/a$ centimeters.
So we see that $f'(x)=f'(1)\,x$ is linear, although this doesn't explain why $f'(1)=2$.
If you do the same for $f(x)=x^n$, with units $\mu$ and another unit $\rho$ where $a\rho = \mu$, then you get that the change in volume when changing the side by $\Delta x\,\mu$ is $f'(x)\Delta x\,\mu^n$. It is also $f'(ax)\cdot a(\Delta x)\,\rho^n$. Since $\mu/\rho = a$, this means $f'(ax) =a^{n-1}f'(x)$.
Again, we still don't know why $f'(1)=n$, but we know $f'(x)=f'(1)x^{n-1}$.
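The scaling relation $f'(ax)=a^{n-1}f'(x)$ is easy to test numerically. The sketch below picks arbitrary illustrative values for $n$, $a$ and $x$ and estimates both sides with a central difference; the helper names are invented for this example.

```python
def slope(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def power(n):
    """Return the function x -> x**n."""
    return lambda x: x ** n

n, a, x = 5, 3.0, 1.2        # arbitrary illustrative values
f = power(n)
lhs = slope(f, a * x)                 # f'(ax)
rhs = a ** (n - 1) * slope(f, x)      # a^(n-1) * f'(x)
print(lhs, rhs)
```

Both sides should agree up to finite-difference error. As in the argument above, this confirms the shape $f'(x)=f'(1)x^{n-1}$ without saying anything about the value of $f'(1)$.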
Solution 2:
For the first hundred years or so, before people formalized differentiation and integration by using limits, the general intuition behind taking the derivative of $f(x)$ was, "Let's add a tiny increment to $x$ and see how much $f(x)$ changes."
The "tiny increment" was called $o$ (lower-case letter O), at least by some people.
For $f(x) = x^2$, for example, you could show that $$f(x + o) = (x + o)^2 = x^2 + 2xo + o^2 = f(x) + 2xo + o^2.$$ So the amount of "change" in $f(x)$ is $2xo + o^2$, which is $2x + o$ times the amount by which you changed $x$. And then the mathematicians would say that only the $2x$ part of $2x + o$ matters, since $o$ is "vanishingly" small.
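The same arithmetic can be replayed with exact fractions, so that no rounding hides the leftover $o$. This is only an illustration in modern terms, not how it was done historically; the function name `change_per_increment` is invented here.

```python
from fractions import Fraction

def change_per_increment(x, o):
    """(f(x + o) - f(x)) / o for f(x) = x**2, computed exactly."""
    return ((x + o) ** 2 - x ** 2) / o

x = Fraction(3)
for k in (1, 3, 6):
    o = Fraction(1, 10 ** k)   # a smaller and smaller increment
    ratio = change_per_increment(x, o)
    print(f"o = {o}: ratio = {ratio}, ratio - 2x = {ratio - 2 * x}")
```

For every increment the ratio is exactly $2x + o$, so the gap between the ratio and $2x$ is the increment itself and shrinks along with it.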
I think for most of the differentiation rules developed back then (which may be all you'll see in the table of derivatives in an elementary calculus book), the intuition was to do the arithmetic. What they did not do was to encumber that arithmetic with all the extra mechanisms needed to establish a limit, as the standard-analysis approach does today.
On the other hand, the arithmetic usually went hand-in-hand with practical problems (usually in what we would consider physics or engineering) that people wanted to solve. People also tended to make a connection between arithmetic and geometry, so linking the function $f(x) = x^2$ to the area of a square of side $x$ would have been an obvious thing to do (and the visualization in Thomas Andrews's answer would have worked very well, I think).
For example, visualize a particle running along a circular track at a constant speed. In fact, make the circular track be the circle given by $x^2 + y^2 = 1$ in the Cartesian plane. (Putting everything into Cartesian coordinates was all the rage when calculus was young.) You can then see (by symmetry, or by other arguments) that the direction the particle is going is always perpendicular to the direction in which the particle lies from the center of the circle at that moment. So if the angle to the particle at that instant is $\theta$, the $y$-coordinate of the particle is $\sin\theta$, but the velocity vector is pointing in a direction $\frac\pi2$ radians "ahead" of $\theta$. If we let $\theta$ increase at the rate of $1$ radian per unit of time, the magnitude of the velocity is $1$, so the $y$-coordinate of the velocity is $\sin\left(\theta + \frac\pi2\right) = \cos\theta$, which is the derivative of $\sin\theta$ when $\theta$ is measured in radians.
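The conclusion, that $\sin\theta$ changes at the rate $\sin\left(\theta+\frac\pi2\right)=\cos\theta$, can be checked at a few sample angles. The sketch below uses a central difference as a numerical stand-in for the derivative; the angles are arbitrary.

```python
import math

def slope(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

for theta in (0.0, 0.5, 2.0):
    measured = slope(math.sin, theta)           # rate of change of sin(theta)
    quarter_turn = math.sin(theta + math.pi / 2)  # sine a quarter turn "ahead"
    print(theta, measured, quarter_turn, math.cos(theta))
```

At each angle the three printed values after $\theta$ should match to many decimal places.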