Intuition on the curl formula
Units, curl, and circulation
First it is convenient to conceive of a vector field as the velocity field of a flow, $${\bf F} =(F_1,F_2,F_3)= \left({dx \over dt},\, {dy \over dt},\, {dz \over dt}\right)\,,$$ which has the dimensions of length/time. The curl then becomes $${\rm curl}\; {\bf F} = \left( {\partial \over \partial y} {dz \over dt} - {\partial \over \partial z} {dy \over dt},\, {\partial \over \partial z} {dx \over dt} - {\partial \over \partial x} {dz \over dt},\, {\partial \over \partial x} {dy \over dt} - {\partial \over \partial y} {dx \over dt} \right)$$ which has the dimensions of 1/time, the same as angular velocity. The so-called circulation (around the boundary $\partial S$ of an oriented surface $S$) is given by $$\hbox{circulation} = \int_{\partial S} {\bf F} \cdot dr$$ which has the dimensions of area/time and is a bit difficult to understand intuitively. (If ${\bf F}$ represents a force, the integral represents the net work done around $\partial S$, but in that case, the curl has the odd dimensions of mass/time${}^2$, which I am at a loss to explain.)
From Stokes' to circulation/area
One way to approach the idea of the curl is through Stokes' theorem, which says the circulation of a vector field around the boundary of a surface is equal to the flux of the curl across the surface: $$\int_{\partial S} {\bf F} \cdot dr = \iint_S {\rm curl}\; {\bf F} \cdot {\bf n}\;dS$$ where ${\bf n}$ is the unit surface normal. By the mean value theorem for integrals $$\iint_S {\rm curl}\; {\bf F} \cdot {\bf n} \;dS= {\rm curl}\; {\bf F} \cdot {\bf n}\iint_S dS = ({\rm curl}\; {\bf F} \cdot {\bf n})\;A(S) $$ where ${\rm curl}\; {\bf F} \cdot {\bf n}$ is evaluated at some point on $S$. And so $${\rm curl}\; {\bf F} \cdot {\bf n} = {\hbox{circulation} \over A(S)}$$ If $S$ is a small square normal to ${\bf n}$, the circulation will be maximized when ${\bf n}$ points in the direction of ${\rm curl}\; {\bf F}$.
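As a rough numerical illustration of circulation/area, here is a minimal Python sketch. It assumes a hypothetical rigid-rotation field ${\bf F} = {\bf w} \times {\bf r}$ with ${\bf w} = (1,2,2)$, chosen only because its curl is the constant vector $2{\bf w}$, and it integrates ${\bf F}\cdot dr$ around a small circle with unit normal ${\bf n}$. The function names, the radius `R`, and the number of steps are illustrative assumptions.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    """Scale a nonzero 3-vector to unit length."""
    m = math.sqrt(sum(c * c for c in a))
    return tuple(c / m for c in a)

W = (1.0, 2.0, 2.0)  # assumed rotation vector; the curl of w x r is 2w

def F(r):
    """Hypothetical test field: rigid rotation F(r) = w x r."""
    return cross(W, r)

def circulation_over_area(n, R=1e-2, steps=720):
    """Integrate F . dr around the circle of radius R about the origin with
    unit normal n (counterclockwise as seen from the tip of n), then divide
    by the disk area pi R^2."""
    n = unit(n)
    # Any vector not parallel to n works as a helper to build the frame.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(n, helper))   # unit vector perpendicular to n
    v = cross(n, u)              # completes the right-handed frame (u, v, n)
    dt = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt       # midpoint rule in the angle t
        c, s = math.cos(t), math.sin(t)
        r = tuple(R * (c * ui + s * vi) for ui, vi in zip(u, v))
        dr = tuple(R * (-s * ui + c * vi) * dt for ui, vi in zip(u, v))
        total += sum(Fi * di for Fi, di in zip(F(r), dr))
    return total / (math.pi * R * R)

for n in [(1, 0, 0), (0, 0, 1), W]:
    print(n, circulation_over_area(n))
```

Under these assumptions each ratio should come out close to ${\rm curl}\; {\bf F} \cdot {\bf n} = 2\,{\bf w}\cdot{\bf n}$, and the largest value occurs when ${\bf n}$ is parallel to ${\bf w}$.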
From circulation/area to the formula for curl
One can compute the components of the curl by letting ${\bf n}$ be ${\bf i}, {\bf j}, {\bf k}$. We will show the calculation for ${\bf n} = {\bf k}$. Let $S$ be a small square parallel to the $xy$-plane with opposite corners $(x,y,z)$ and $(x+dx,y+dy,z)$. Along a side of $S$ the integral of ${\bf F} \cdot dr$ is $\pm F_1\,dx$ or $\pm F_2\,dy$, evaluated at some point on the side, with the sign determined by the orientation of the side. Opposite sides have opposite orientations. Combining all four terms, we can approximate the circulation by $$\begin{align} (F_2(x+dx,y^*,z)&-F_2(x,y^*,z))\;dy - (F_1(x^*,y+dy,z)-F_1(x^*,y,z))\;dx \\ &= {\partial \over \partial x} {dy \over dt} \;dx\;dy - {\partial \over \partial y} {dx \over dt} \;dx\;dy \end{align}$$ Thus since $A(S) = dx\;dy$, $${\rm curl}\; {\bf F} \cdot {\bf k} = { {\partial \over \partial x} {dy \over dt} \;dx\;dy - {\partial \over \partial y} {dx \over dt} \;dx\;dy \over dx\;dy} = {\partial \over \partial x} {dy \over dt} - {\partial \over \partial y} {dx \over dt}\,.$$ The equations become exact in the limit as $dx \rightarrow 0$ and $dy \rightarrow 0$, and a rigorous proof can be constructed with the aid of the mean value theorem. The other components may be derived similarly.
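The four-sided calculation above can be tried numerically. The following is a minimal sketch, assuming a hypothetical test field $F = (-xy,\; x^2 + \sin y,\; z)$ and a square of side `h` (so $dx = dy = h$). The field, the function names, and the midpoint-rule integration are all illustrative assumptions.

```python
import math

def F(x, y, z):
    """Hypothetical smooth test field, chosen only for illustration."""
    return (-x * y, x * x + math.sin(y), z)

def circulation_over_area(x, y, z, h, n=200):
    """Counterclockwise line integral of F around the square with opposite
    corners (x, y, z) and (x + h, y + h, z), divided by the area h * h.
    Each side is integrated with the midpoint rule using n sample points."""
    total = 0.0
    ds = h / n
    for i in range(n):
        s = (i + 0.5) * ds
        total += F(x + s, y, z)[0] * ds       # bottom side, +x direction: +F_1 dx
        total += F(x + h, y + s, z)[1] * ds   # right side,  +y direction: +F_2 dy
        total -= F(x + s, y + h, z)[0] * ds   # top side,    -x direction: -F_1 dx
        total -= F(x, y + s, z)[1] * ds       # left side,   -y direction: -F_2 dy
    return total / (h * h)

def curl_k_exact(x, y, z):
    """dF_2/dx - dF_1/dy for the field above: 2x - (-x) = 3x."""
    return 3.0 * x

for h in (1e-1, 1e-2, 1e-3):
    print(h, circulation_over_area(1.0, 2.0, 0.0, h), curl_k_exact(1.0, 2.0, 0.0))
```

As `h` shrinks, the circulation/area should approach the value of $\partial F_2/\partial x - \partial F_1/\partial y$ at the corner $(x,y,z)$, which is the limit statement made above.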
I suppose one normally does the reverse, that is, derive Stokes' theorem from the curl rather than the formula for curl from Stokes', but if you accept that Stokes' has been proved, then it shouldn't matter that we've used it to better understand the formula. Yet there is still more to say about how the formula is related to rotation.
The flow of the vector field and local instantaneous rotation
Consider again our square $S$ with normal ${\bf k}$, and the relative motion of two points at $(x,y,z)$ and at $(x+dx,y,z)$. We wish to understand the rotation of the second about an axis parallel to ${\bf k}$ through $(x,y,z)$. Clearly the $z$ component is irrelevant. So is the $x$ component, because if $dx/dt=F_1(x,y,z)$ differs from $dx/dt=F_1(x+dx,y,z)$, the $x$ coordinates of the points are moving nearer to or farther from each other, and such relative motion does not contribute to rotation. What is left is the $y$ component. The relative change in the $y$ coordinates of the two particles in a small time $dt$ is approximately $(F_2(x+dx,y,z)-F_2(x,y,z))\;dt$. Since the points are $dx$ apart, the angle turned is $$d\theta_1 \approx \tan d\theta_1 = {(F_2(x+dx,y,z)-F_2(x,y,z))\;dt \over dx} = {\partial \over \partial x} F_2 \; dt = {\partial \over \partial x} {dy \over dt} \; dt\,.$$ Similarly comparing the points $(x,y,z)$ and $(x,y+dy,z)$, we can show the angle turned is approximately $$d\theta_2 \approx - {\partial \over \partial y} {dx \over dt} \; dt\,.$$
One might wonder if the analysis above is a little too loose. One can see what's going on in this figure:
The points $(x,y,z)$ and $(x+dx,y,z)$ on the left flow in a given time $\Delta t$ to two points $(x',y',z')$ and $(x'+dx+\Delta x,y'+\Delta y,z'+\Delta z)$ on the right. The relative change in position is given by $$(\Delta x,\Delta y,\Delta z) \approx \left( {\partial F_1 \over \partial x}\;dx,\, {\partial F_2 \over \partial x}\;dx,\, {\partial F_3 \over \partial x}\;dx \right)\; \Delta t = \left( {\partial \over \partial x} {dx \over dt},\, {\partial \over \partial x} {dy \over dt},\, {\partial \over \partial x} {dz \over dt} \right) \;dx \;\Delta t$$ The amount of rotation $\Delta\theta$ in the time $\Delta t$ about ${\bf k}$ is given by $$\tan \Delta\theta = {\Delta y \over dx + \Delta x}$$ As $\Delta t \rightarrow 0$, $dx$ is fixed; but $\Delta x \rightarrow 0$, $\Delta y \rightarrow 0$ and $\tan \Delta\theta \sim \Delta\theta \rightarrow 0$. Thus $${\Delta\theta \over \Delta t} \sim {\tan \Delta\theta \over \Delta t} \sim {\Delta y/\Delta t \over dx} \rightarrow {\partial \over \partial x} {dy \over dt}\,.$$ The partial derivative is evaluated at some point $(x^*,y,z)$ between $(x,y,z)$ and $(x+dx,y,z)$. As $dx \rightarrow 0$, $(x^*,y,z) \rightarrow (x,y,z)$, and the angular rate becomes the term in the formula for the curl.
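The two angular rates can also be measured directly. The sketch below moves the endpoints of a short segment for one small time step (a single Euler step, which is an assumption made for simplicity) and reports how fast the segment turns about ${\bf k}$. The test field, the point $(1,2,0)$, and the names `F` and `turn_rate` are hypothetical choices for illustration.

```python
import math

def F(x, y, z):
    """Hypothetical planar test field (an assumption for illustration).
    Here dF_2/dx = 1.5 and dF_1/dy = -2 everywhere."""
    return (0.3 * x - 2.0 * y, 1.5 * x + 0.1 * y * y, 0.0)

def turn_rate(p, q, dt):
    """Angular rate about k of the segment from p to q, estimated by moving
    both endpoints with one Euler step of size dt along the field."""
    def step(r):
        v = F(*r)
        return tuple(ri + vi * dt for ri, vi in zip(r, v))
    p2, q2 = step(p), step(q)
    before = math.atan2(q[1] - p[1], q[0] - p[0])
    after = math.atan2(q2[1] - p2[1], q2[0] - p2[0])
    return (after - before) / dt

d, dt = 1e-4, 1e-4
p = (1.0, 2.0, 0.0)
rate_x = turn_rate(p, (1.0 + d, 2.0, 0.0), dt)   # segment along x: about  dF_2/dx
rate_y = turn_rate(p, (1.0, 2.0 + d, 0.0), dt)   # segment along y: about -dF_1/dy
print(rate_x, rate_y, rate_x + rate_y)
```

For this field the two rates should be close to $1.5$ and $2$, and their sum close to $\partial F_2/\partial x - \partial F_1/\partial y = 3.5$.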
Curl and the rate of rotation
We found the ${\bf k}$ component of the curl to be the sum of two rates of rotation $${\rm curl}\; {\bf F} \cdot {\bf k} ={d\theta_1 \over dt} + {d\theta_2 \over dt} = { \left({\partial \over \partial x} {dy \over dt} \right) + \left( - {\partial \over \partial y} {dx \over dt} \right)}\,.$$ The other components may be derived similarly. Cartesian coordinate axes may be chosen arbitrarily. At a point on a given surface with unit normal ${\bf n}$, we may choose the coordinate axes so that ${\bf k} = {\bf n}$ (with ${\bf i}, {\bf j}$ being whatever mutually orthogonal unit vectors we please). Thus the component of the curl ${\rm curl}\; {\bf F} \cdot {\bf n}$ is the sum of two rates of rotation independent of the choice of ${\bf i}, {\bf j}$.
One way to think of it is that the curl is twice the average of two rates of rotation. Why twice? Why the average? Plausible arguments: (1) It turns out that way. :) (2) The bisector of the angle formed by $(x,y+dy,z)$--$(x,y,z)$--$(x+dx,y,z)$ rotates at the average rate. (3) The circulation/area has a factor of 2 in it when $S$ is a disk or square, when viewed in a certain way: Let $S$ be a disk or square centered at $(x,y,z)$ whose perimeter is a radius/distance $R$ from the center. If we write the circulation/area in the form $${\rm curl}\; {\bf F} \cdot {\bf n} = {\int_{\partial S} {\bf F} \cdot dr \over \hbox{area}}= {(\bar{{F}})(\hbox{perimeter}) \over \hbox{area}} = {(\bar {F})(2\pi R) \over \pi R^2} \,\buildrel {\rm or} \over = \, {(\bar {F})(8R) \over 4R^2}= {2\,\bar {F} \over R}\,,$$ with $\bar {F}$ the mean tangential component of ${\bf F}$ along the perimeter, then the normal component of the curl is twice $\bar {F}$ over the radius of $S$. So for instance if a disk turns at an angular rate $\omega$, the velocity at the perimeter is a constant $\omega R$, which also equals $\bar {F}$. The circulation is $(\omega R)\,(2\pi R)$, and therefore the circulation/area is $2 \omega$, that is, twice the average rate of rotation.
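The turning-disk example can be checked with a few lines of Python. This is a minimal sketch assuming the planar rigid-rotation velocity field $(-\omega y, \omega x)$ with a hypothetical rate $\omega = 0.7$; the function names and the midpoint-rule integration are illustrative choices.

```python
import math

OMEGA = 0.7  # hypothetical angular rate of the turning disk

def rigid_rotation(x, y):
    """Velocity of a point on a disk turning at angular rate OMEGA about the origin."""
    return (-OMEGA * y, OMEGA * x)

def circulation_over_area_disk(F, R, n=1000):
    """Line integral of F around the circle of radius R about the origin
    (counterclockwise, in the xy-plane), divided by the disk area pi R^2."""
    total = 0.0
    dtheta = 2.0 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dtheta
        x, y = R * math.cos(t), R * math.sin(t)
        Fx, Fy = F(x, y)
        # Along the circle, dr = (-R sin t, R cos t) dtheta.
        total += (Fx * (-R * math.sin(t)) + Fy * (R * math.cos(t))) * dtheta
    return total / (math.pi * R * R)

for R in (0.5, 1.0, 2.0):
    print(R, circulation_over_area_disk(rigid_rotation, R), 2 * OMEGA)
```

For every radius tried, the ratio should agree with $2\omega$ up to rounding, which is the factor of 2 discussed above.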
Curl and the carpool lane
Suppose we model the flow of traffic on a highway by a $C^2$ autonomous vector field, where each car represents a discrete particle of the flow. If (as a passenger) you keep your gaze fixed toward a car near you, then the absolute rate at which your head (or rather eye) turns will be proportional to the curl. By absolute rate, I mean that the angle is to be measured with respect to a fixed direction, such as due north.
Taken from my MathOverflow answer here.
Let $F = (F_1, F_2, F_3)$ denote a vector field in $\mathbb{R}^3$, and write $\text{curl}\ F = (G_1, G_2, G_3)$. We would like a situation where $G_1$ describes the "instantaneous" rotation of $F$ about the $x$-axis, $G_2$ the rotation about the $y$-axis, and $G_3$ the rotation about the $z$-axis.
So let's think of vector fields which do just that. Three simple (linear!) ones which come to mind are $$H_1(x,y,z) = (0, -z, y)$$ $$H_2(x,y,z) = (z, 0, -x)$$ $$H_3(x,y,z) = (-y, x, 0)$$ So in order to measure how much $F$ rotates about, say, the $z$-axis, it makes sense to look at something that compares how similar $F$ is to $H_3$. The dot product $F(x,y,z) \cdot H_3(x,y,z)$ seems reasonable, which is precisely $-yF_1(x,y,z) + xF_2(x,y,z).$
This suggests that defining $$G_1(x,y,z) \approx -zF_2(x,y,z) + yF_3(x,y,z)$$ $$G_2(x,y,z) \approx zF_1(x,y,z) - xF_3(x,y,z)$$ $$G_3(x,y,z) \approx -yF_1(x,y,z) + xF_2(x,y,z)$$ might give something close to what we want. But this is a very crude way to measure "instantaneous" rotation -- in fact, one might say it's a sort of linear approximation. Thus, we are led to replacing the linear terms with their corresponding derivations: $$G_1(x,y,z) = -\frac{\partial}{\partial z}F_2 + \frac{\partial}{\partial y}F_3$$ $$G_2(x,y,z) = \frac{\partial}{\partial z}F_1 - \frac{\partial}{\partial x}F_3$$ $$G_3(x,y,z) = -\frac{\partial}{\partial y}F_1 + \frac{\partial}{\partial x}F_2,$$ which is precisely the curl.
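A quick numerical sanity check of the final formula: the sketch below approximates the partial derivatives by central differences and applies the result to the three model fields $H_1, H_2, H_3$. The helper name `curl`, the step size `h`, and the sample point are hypothetical choices made for illustration.

```python
def curl(F, p, h=1e-5):
    """Approximate (G_1, G_2, G_3) at the point p using central differences."""
    def d(i, j):
        # Partial derivative of component F_i with respect to coordinate j.
        a, b = list(p), list(p)
        a[j] += h
        b[j] -= h
        return (F(*a)[i] - F(*b)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2),   # dF_3/dy - dF_2/dz
            d(0, 2) - d(2, 0),   # dF_1/dz - dF_3/dx
            d(1, 0) - d(0, 1))   # dF_2/dx - dF_1/dy

def H1(x, y, z):
    return (0.0, -z, y)   # rotation about the x-axis

def H2(x, y, z):
    return (z, 0.0, -x)   # rotation about the y-axis

def H3(x, y, z):
    return (-y, x, 0.0)   # rotation about the z-axis

point = (0.3, -1.2, 2.0)
for H in (H1, H2, H3):
    print(H.__name__, curl(H, point))
```

Each model field should give a vector of length 2 along its own axis and (up to rounding) zero in the other two components, so $G_i$ does single out the rotation about the $i$-th axis.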
Let $\mathbf{F}=(L(x,y),M(x,y))$.
From Green's theorem,
$\oint_{C} \mathbf{F} \cdot d\mathbf{r} = \iint_{D} \left(\frac{\partial M}{\partial x} - \frac{\partial L}{\partial y}\right)\, dx\, dy $
Now define $\operatorname{Curl}(F) = \frac{\partial M}{\partial x} - \frac{\partial L}{\partial y}$.
Now suppose you want to find the curl formula in space.
Since the circulation of a gradient field around every closed curve vanishes, Green's theorem tells us that the curl of a gradient field must be zero. We will use that to get the formula for the curl in space.
So let $f=f(x,y,z)$ and
$$\nabla f=\mathbf{F}=(\frac { \partial f }{ \partial x },\frac { \partial f }{ \partial y },\frac { \partial f }{ \partial z })$$
$\frac { \partial f }{ \partial x }=P$ , $\frac { \partial f }{ \partial y }=Q$ , $\frac { \partial f }{ \partial z }=R$
Since we can interchange the order of taking partial derivatives (assuming $f$ has continuous second partial derivatives),
$\frac { \partial^2 f }{ \partial x \partial y}=\frac { \partial^2 f }{ \partial y \partial x}$ $\Rightarrow$ $\frac { \partial P }{ \partial y }=\frac { \partial Q }{ \partial x }$
$\frac { \partial^2 f }{ \partial x \partial z}=\frac { \partial^2 f }{ \partial z \partial x}$ $\Rightarrow$ $\frac { \partial P }{ \partial z }=\frac { \partial R }{ \partial x }$
$\frac { \partial^2 f }{ \partial y \partial z}=\frac { \partial^2 f }{ \partial z \partial y}$ $\Rightarrow$ $\frac { \partial Q }{ \partial z }=\frac { \partial R }{ \partial y }$
So for any vector field $\mathbf{F}=(P(x,y,z),Q(x,y,z),R(x,y,z))$ to be a gradient field, it must satisfy these conditions:
$\frac { \partial Q }{ \partial z }=\frac { \partial R }{ \partial y }$
$\frac { \partial P }{ \partial z }=\frac { \partial R }{ \partial x }$
$\frac { \partial P }{ \partial y }=\frac { \partial Q }{ \partial x }$
We combine them in one formula so that when the vector field is a gradient field, and thus satisfies the above conditions, the curl is zero (each bracket is a separate quantity that must vanish):
$\operatorname{Curl}(F)$=($\frac { \partial R }{ \partial y }-\frac { \partial Q }{ \partial z }$)+($\frac { \partial P }{ \partial z }-\frac { \partial R }{ \partial x }$)+($\frac { \partial Q }{ \partial x }-\frac { \partial P }{ \partial y }$)
Then I'll rewrite it slightly to make it easier to remember (in determinant form):
$\operatorname{Curl}(F)$=($\frac { \partial R }{ \partial y }-\frac { \partial Q }{ \partial z }$)-($\frac { \partial R }{ \partial x }-\frac { \partial P }{ \partial z }$)+($\frac { \partial Q }{ \partial x }-\frac { \partial P }{ \partial y }$)
Having reached this step, we notice that this has almost exactly the same form as the expansion of the cross product, with each bracket playing the role of one component:
$$\mathbf{u}\times\mathbf{v} =\begin{vmatrix} \mathbf{i}&\mathbf{j}&\mathbf{k}\\ u_1&u_2&u_3\\ v_1&v_2&v_3 \end{vmatrix} =(u_2v_3-u_3v_2)\mathbf{i}-(u_1v_3-u_3v_1)\mathbf{j}+(u_1v_2-u_2v_1)\mathbf{k}$$
let $\nabla=\mathbf{u}$ , $\mathbf{F}=\mathbf{v}$
$$\nabla\times\mathbf{F} =\begin{vmatrix} \mathbf{i}&\mathbf{j}&\mathbf{k}\\ {\frac{\partial}{\partial x}}&{\frac{\partial}{\partial y}}&{\frac{\partial}{\partial z}}\\ P&Q&R \end{vmatrix} =\left(\frac { \partial R }{ \partial y }-\frac { \partial Q }{ \partial z }\right)\mathbf{i}-\left(\frac { \partial R }{ \partial x }-\frac { \partial P }{ \partial z }\right)\mathbf{j}+\left(\frac { \partial Q }{ \partial x }-\frac { \partial P }{ \partial y }\right)\mathbf{k}$$
which is precisely the curl.
$\operatorname{Curl}(F)=\nabla\times\mathbf{F}\ $
As Hayden mentioned, that doesn't mean we are literally taking the cross product of anything; it's just an abuse of notation that happens to be convenient.
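The zero-curl property that motivated this construction can be checked numerically. The sketch below assumes a hypothetical potential $f = x^2 y + \sin(yz)$ with its hand-computed gradient $(P, Q, R)$, and approximates the three brackets of $\nabla\times\mathbf{F}$ with central differences. The names `grad_f`, `rotation`, and `curl_components`, and the step size `h`, are illustrative assumptions.

```python
import math

def grad_f(x, y, z):
    """Gradient (P, Q, R) of the hypothetical potential f = x^2 y + sin(y z)."""
    return (2 * x * y,
            x * x + z * math.cos(y * z),
            y * math.cos(y * z))

def rotation(x, y, z):
    """A field that is not a gradient field, for contrast."""
    return (-y, x, 0.0)

def curl_components(F, p, h=1e-5):
    """Central-difference estimate of
    (dR/dy - dQ/dz, dP/dz - dR/dx, dQ/dx - dP/dy) at the point p."""
    def d(i, j):
        # Partial derivative of component i with respect to coordinate j.
        a, b = list(p), list(p)
        a[j] += h
        b[j] -= h
        return (F(*a)[i] - F(*b)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

p = (0.4, 1.1, -0.7)
print(curl_components(grad_f, p))     # all three entries should be near zero
print(curl_components(rotation, p))   # third entry should be near 2
```

For the gradient field every bracket should vanish up to discretization error, while the rotation field fails the third condition, which is what the formula is designed to detect.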
$\nabla\times \mathbf{F}$, where $\mathbf{F}=\langle F_1,F_2,F_3\rangle$, is a device to represent the curl formula derived in a geometric proof/explanation such as the one you linked:
$$ \text{curl}\,\mathbf{F} =\left\langle {\partial F_3\over \partial y}-{\partial F_2\over \partial z}, {\partial F_1\over \partial z}-{\partial F_3\over \partial x}, {\partial F_2\over \partial x}-{\partial F_1\over \partial y}\right\rangle. $$
We'd like a concise way to talk about the quantity on the right-hand side, so we notice its formula follows \begin{align} \left\langle {\partial F_3\over \partial y}-{\partial F_2\over \partial z}, {\partial F_1\over \partial z}-{\partial F_3\over \partial x}, {\partial F_2\over \partial x}-{\partial F_1\over \partial y}\right\rangle &=\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k}\\ {\partial \over \partial x} & {\partial \over \partial y} & {\partial \over \partial z}\\ F_1 & F_2 & F_3 \end{vmatrix}\\ &=\left\langle {\partial \over \partial x} , {\partial \over \partial y} , {\partial \over \partial z}\right\rangle\times\langle F_1,F_2,F_3\rangle\\ &=\nabla\times\,\mathbf{F}, \end{align} in the sense that the quantity we have on the left-hand side equals the "determinant" in the middle, and this middle determinant looks like a cross product of partial derivatives with the scalar component functions, so we package it as such and call it $\text{curl}\,\mathbf{F}$.
Voila! Good meaningful mathematical notation (rooted in concept) is born.