A proof of the Isoperimetric Inequality - how does it work?

My attempt at an answer has two parts.

First, I think the geometric mean thing slightly obscures what's going on and contributes to your bafflement about why "the $r$'s magically fall out of the equation in the last step". A more geometric view of this step is the following: the curve has "width" $w=2r$ along the chosen direction, so $A = w\overline{h} = 2r\overline{h}$, where $\overline{h}$ is the "average height" of the curve (measured perpendicular to the chosen direction). Thus we have

\[A + \pi r^2 \le lr \;\Leftrightarrow\; 2r\overline{h} + \pi r^2 \le lr \;\Leftrightarrow\; 2\overline{h} + \pi r \le l \;\Leftrightarrow\; 2\overline{h} + \frac{\pi}{2} w \le l\;.\]

Now the cancellation of the $r$'s seems more natural, and the inequality gives an upper bound on the "average height" that can be achieved for a given "width" $w$ and length $l$. This bound implies the isoperimetric inequality:

\[l^2 \ge (2\overline{h} + \frac{\pi}{2} w)^2=(2\overline{h} - \frac{\pi}{2} w)^2 + 4\pi w\overline{h} \ge 4\pi w\overline{h}=4\pi A\;.\]
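A numerical sketch of the last two displays, assuming convex polygons as test curves and taking the chosen direction to be the $x$-axis. The helper names `shoelace_area`, `perimeter`, `width_x`, `check` and `ellipse` are hypothetical, introduced only for this illustration. For each shape it computes $w$, $\overline{h} = A/w$ and $l$, then evaluates every term of the chain.

```python
import math

def shoelace_area(pts):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def perimeter(pts):
    """Length l of the closed polygon."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))

def width_x(pts):
    """Width w of the polygon along the x-direction (the 'chosen direction')."""
    xs = [x for x, _ in pts]
    return max(xs) - min(xs)

def check(pts):
    """Terms of l^2 >= (2h + pi w/2)^2 = (2h - pi w/2)^2 + 4 pi w h >= 4 pi A."""
    A, l, w = shoelace_area(pts), perimeter(pts), width_x(pts)
    h_bar = A / w                      # average height, so that A = w * h_bar
    bound = 2 * h_bar + math.pi / 2 * w
    middle = (2 * h_bar - math.pi / 2 * w) ** 2 + 4 * math.pi * w * h_bar
    return {"A": A, "l": l, "w": w, "h_bar": h_bar, "bound": bound, "middle": middle}

def ellipse(a, b, n=2000):
    """Inscribed n-gon approximating the ellipse with semi-axes a (along x) and b."""
    return [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
            for k in range(n)]

shapes = {
    "circle": ellipse(1.0, 1.0),
    "ellipse": ellipse(2.0, 0.5),
    "triangle": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "rectangle": [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)],
}

for name, pts in shapes.items():
    c = check(pts)
    print(f"{name:9s}  2h+(pi/2)w = {c['bound']:.5f} <= l = {c['l']:.5f};  "
          f"l^2 = {c['l'] ** 2:.4f} >= 4piA = {4 * math.pi * c['A']:.4f}")
```

For the polygonal circle the two sides of $2\overline{h} + \frac{\pi}{2} w \le l$ nearly coincide (the small gap comes from using an inscribed polygon), while for the other shapes the gap is clearly strict.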

(There may be a connection here to the inequality by Bonnesen cited in Christian Blatter's comment.)
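As a quick sanity check of the bound $2\overline{h} + \frac{\pi}{2} w \le l$ (an illustration, not part of the proof itself), take an ellipse with semi-axis $a$ along the chosen direction and semi-axis $b$ perpendicular to it. Then $w = 2a$, $A = \pi ab$ and $\overline{h} = A/w = \pi b/2$, so the bound reads

\[2\overline{h} + \frac{\pi}{2} w = \pi b + \pi a = \pi(a+b) \le l\;,\]

which is the familiar lower bound $\pi(a+b)$ for the perimeter of an ellipse, with equality exactly when $a = b$, i.e. for the circle.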

Second, regarding the proof as a whole, it seems useful to think of it as a piece of clever bookkeeping that transforms the difficult global optimization problem posed by the isoperimetric inequality (how to enclose the greatest possible area with a curve of given length) into a trivial local optimization problem. In a sense, what makes the problem difficult is that the area you can enclose with a curve element (say, with respect to the origin) depends both on where you are and on the direction in which you move; but the direction in which you move in turn determines where you will end up, and hence how much area you will be able to enclose later.
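To make the "where you are" versus "which direction you move" point concrete, here is a small sketch assuming a polygonal curve (`area_element` and `enclosed_area` are hypothetical helper names). The area swept relative to the origin by a step $dp$ taken from the point $p$ is $\frac12(x\,dy - y\,dx) = \frac12|p|\,|dp|\sin\varphi$, where $\varphi$ is the angle from $p$ to $dp$, so it depends on both the position and the direction of the step; summing these elements around a closed curve gives the enclosed area.

```python
def area_element(p, dp):
    """Signed area swept relative to the origin by a step dp taken from point p:
    (x*dy - y*dx)/2, i.e. |p| |dp| sin(phi) / 2 with phi the angle from p to dp."""
    (x, y), (dx, dy) = p, dp
    return 0.5 * (x * dy - y * dx)

def enclosed_area(pts):
    """Sum of the area elements along a closed, counterclockwise polygon."""
    n = len(pts)
    total = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
        total += area_element((x0, y0), (x1 - x0, y1 - y0))
    return total

step = (0.0, 0.1)
print(area_element((1.0, 0.0), step))        # 0.05: tangential step at distance 1
print(area_element((2.0, 0.0), step))        # 0.1: same step, twice as far out
print(area_element((1.0, 0.0), (0.1, 0.0)))  # 0.0: same point, radial step
print(enclosed_area([(-1, -1), (1, -1), (1, 1), (-1, 1)]))  # 4.0: square of side 2
```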

The proof decouples this by adding to the area element a suitable penalty with two crucial properties: it exactly cancels the "where you are" part of how much area you can enclose, and, because it is itself the area element of a circle, it automatically integrates to a constant ($\pi r^2$). There are then no variable lengths left in the integrand, only the angle between the tangent vector of the curve and the tangent vector at the corresponding point of the circle, and it is obvious that the integral is maximized by always choosing the tangent vector of the curve parallel to the tangent vector of the circle, which necessarily results in a circle.
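A numerical sketch of this bookkeeping, assuming the setup of the classical proof behind $A + \pi r^2 \le lr$ (the curve lies between the lines $x = \pm r$, and each curve point $(x, y)$ is matched with the circle point $(x, \bar y)$ having the same $x$-coordinate). As the test curve it uses an ellipse $x = a\cos t$, $y = b\sin t$, so that $r = a$ and the matching circle point is $(a\cos t, a\sin t)$; the function name `decoupled_integrals` is hypothetical. The integrand $x y' - \bar y x'$ (curve area element plus circle area element) is bounded pointwise by $r\,|\gamma'(t)|$, and their ratio is the cosine of the angle between the two tangent vectors.

```python
import math

def decoupled_integrals(a, b, n=20000):
    """Ellipse x = a cos t, y = b sin t lies between the lines x = -a and x = a, so r = a.
    The circle point with the same x (and the same sign of y) is (a cos t, a sin t).
    Returns (integral of x*y' - ybar*x', r * l, smallest cos(theta) along the curve)."""
    r = a
    dt = 2 * math.pi / n
    total = 0.0        # approximates A + pi r^2
    length = 0.0       # approximates l
    min_cos = 1.0
    for k in range(n):          # rectangle rule, very accurate for periodic integrands
        t = k * dt
        x, ybar = a * math.cos(t), r * math.sin(t)
        dx, dy = -a * math.sin(t), b * math.cos(t)   # curve tangent gamma'(t)
        speed = math.hypot(dx, dy)
        integrand = x * dy - ybar * dx               # curve area element + circle area element
        # Cauchy-Schwarz: (x, ybar) . (dy, -dx) <= |(x, ybar)| * |gamma'| = r * speed,
        # and the ratio is the cosine of the angle between the two tangent vectors.
        min_cos = min(min_cos, integrand / (r * speed))
        total += integrand * dt
        length += speed * dt
    return total, r * length, min_cos

for a, b in [(1.0, 1.0), (2.0, 1.0)]:
    total, rl, min_cos = decoupled_integrals(a, b)
    print(f"a={a}, b={b}: A + pi r^2 = {total:.6f} (exact {math.pi * (a * b + a * a):.6f}), "
          f"r l = {rl:.6f}, min cos(theta) = {min_cos:.4f}")
```

For the circle ($a = b$) the cosine is $1$ everywhere and $A + \pi r^2 = lr$; for $a \ne b$ the tangents are not parallel everywhere and the inequality is strict.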