Why vector calculus seems inconsistent and vague

In what follows I'll admit that you feel alright with $\mathbb{R}^n$, the space of $n$ dimensions and euclidean. You have many questions that really arises naturally when studying multivariable calculus on some books. Since your question is a little general I'll try to just talk a little about some of those things and then I'll recommend books.

Well, the fact that the gradient gives the direction of greater rate of change is easily proved by noting that if $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth function and $v \in \mathbb{R}^n$ is a unity vector, then we have

$$\nabla f (p) \cdot v = \left \|\nabla f(p) \right \| \cos(\theta)$$

Where $\theta$ is the angle between $\nabla f(p)$ and $v$. Since $|\cos(\theta)|\leq 1$ the maximum of the quantity above is obtained setting $\theta = 0$, in other words, the maximum of the directional derivative is in the direction of the gradient.

But this is just one of the points that confuses you. Looking at your question, I believe that you also ask yourself: why this important vector called gradient is composed by the partials of the function? And this, is a simple result of the real definition of differentiability. A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a point if it has linear approximation at the point. In other words, $f$ is differentiable at the point if there's a linear function $Df(p) : \mathbb{R}^n \to \mathbb{R}^m$ such that the amount $f$ increases when you walk along a vector $v$ can be approximated by the value of $Df(p)$ on $v$ with small error. With this definition you can prove that the gradient must be what it is.

Divergence and Curl interpretations can be understood by application of some theorems from integral calculus of several variables. However, to really get the most of this it's probably better to deal with objects called differential forms. Some might say: "my god, this is too sofisticated to talk about in this questions", however, it's the real good way to understand the intricacies of those things. It's not just a coincidence that the fundamental theorems (Gradient' theorem, Stokes' theorem and Gauss' theorem) look so closely related.

Probably the best for you is to get started by Apostol's Calculus Volume 2. This book will show why the definition of derivative as a linear transformation is a good definition, it will show you why the gradient is how it is, it will prove all the important theorems and it'll develop curl and divergence in a systematic way. Once you feel confortable with this kind of approach, you can attack Spivak's Calculus on Manifolds. Very rigorous indeed, however, it's approach reveals the intricacies that exists. Also, there's another book called "Vector Calculus" by "Jerrold E. Marsden", this one mix intuition and rigor, in other words he try to make things really make sense to the reader and also try to get things proved.

The main point is: vector calculus is not incosistent. Using books that tend to forget about rigor can be "good" in the point of being easier, however, they'll hide the true nature of things. There are reasons for all of those things you ask and they are shown in books like Apostol's and Spivak's.

Of course I spoke in general about your question. If you want something more detailed about one point, and if it fits in this question, tell me in comment and I'll add an edit. If not, ask again more specifically here in Math.SE.

I hope this helps you somehow. Good luck in your studies!

EDIT: When I told about getting an interpretation of curl and div using theorems from integral calculus I meant using both the corresponding fundamental theorem and the mean value theorem for integrals. I'll talk about it a little loosely, just to explain what my point is. We'll tackle divergence in $\mathbb{R}^3$. The point is, take a vector field $F : \mathbb{R}^3 \to \mathbb{R}^3$ and take some $3$-dimensional region $\mathcal{B}$. Gauss' theorem tells you that:

$$\iiint_{\mathcal{B}}{\operatorname{div}(F)dV} = \iint_{\partial \mathcal{B}}{F\cdot n dA}$$

Where $\partial \mathcal{B}$ is the boundary of $\mathcal{B}$ and $n$ is the unit normal to $\partial \mathcal{B}$. The point is that the mean value theorem states that there's a point $(x_0, y_0, z_0) \in \mathcal{B}$ such that the LHS can be written as:

$$\iiint_{\mathcal{B}}{\operatorname{div}(F)dV} = \operatorname{div}(F)(x_0, y_0, z_0) V(\mathcal{B})$$

The point is that given then some arbitrary point $(x_0, y_0, z_0)$ we can get a collection of regions $\mathcal{B}$ that encloses the point and that get smaller and smaller, so that in the limit when the volume of those regions goes to zero we get the above equallity. So in the end we can see that:

$$\operatorname{div}(F)(x_0, y_0, z_0) = \lim_{V(\mathcal{B}) \to 0} \frac{1}{V(\mathcal{B})}\iint_{\partial \mathcal{B}} F \cdot n dA$$

This equation is what you can find on wikipedia's page about divergence. Look that this is saying that the divergence is the flux of the field across the boundary of a region when this region shrinks to the point $(x_0, y_0, z_0)$, so that the divergence can be thought of as a measure of "local" flux.


Div, Grad, Curl, and All That is fantastic for developing your intuition about vector calculus without getting too caught up in mechanics of difficult calculations.


Question 1: Why in 2D curl subtracting the partial derivatives from each other give amount of rotation? $\newcommand{\v}[1]{\boldsymbol{#1}}$

Answer(Short): 2D curl is just 3D curl embedded in 2D.

Answer(Long): The reason why 3D curl measures the rotation is because of its definition. If we want to measure how "rotational" or "solenoidal" a vector field is, with respect to an infinitesimal surface, we could simply compute the line integral around the boundary of the surface. Like the picture in the Wikipedia entry of curl. And it happens that by the Stokes theorem, we can define a curl operator $\nabla \times \cdot$ on this surface (I use integral instead of double and triple integral sign for all integrals): $$ \int_{\Sigma} \nabla \times \v{u}\cdot d\v{S} = \int_{\Sigma} \nabla \times \v{u}\cdot \v{\nu}\,dS = \oint_{\partial \Sigma} \v{u}\cdot d\v{r} = \oint_{\partial \Sigma} \v{u}\cdot \v{\tau}\,dr $$ where $\v{\nu}$ is the unit normal vector to the surface, and $\v{\tau}$ is the tangential vector along the boundary of $\Sigma$. Then by integration by parts formula shows actually we could define curl operator for smooth vector fields within a smooth domain $\Omega$ in 3D: $$ \int_{\Omega} \nabla \times \v{u} \cdot \v{v} = \int_{\Omega} \nabla \times \v{v} \cdot \v{u} - \int_{\partial \Omega} \v{u}\times \v{\nu}\cdot \v{v} \,dS $$ The formula of curl is obtain by its definition along with fundamental theorem of calculus in arbitrary dimension domain with a Cartesian coordinate system: $$ \int_{\Omega} \partial_{x_i} u\, dx = \int_{\partial \Omega} u \,\nu_i dS \,, $$ Then we found that the curl formula in 3D for $\v{v} = \langle v_1,v_2,v_3\rangle$ happens to be: $$ \nabla \times \v{u} = \begin{vmatrix} \v{i} & \v{j} & \v{k} \\ \\ {\frac{\partial}{\partial x}} & {\frac{\partial}{\partial y}} & {\frac{\partial}{\partial z}} \\ \\ v_1 & v_2 & v_3\end{vmatrix} \tag{1}$$

Now back to the question, 2D curl for a vector field in 2D can be viewed as "a 3D curl for a vector field in 3D but without a third component". You could just think the vector field lives on the $xy$-plane in 3D, and being embeded into $\mathbb{R}^3$: let $\v{v} = \langle v_1(x,y),v_2(x,y) \rangle$, embed it into 3D by letting $\widehat{\v{v}} = \langle v_1,v_2,0\rangle$, then apply the cross product formula (1), what we get would be

$$\nabla \times \widehat{\v{v}} = \left\langle 0,0,\left|\begin{matrix}\frac{\partial}{\partial x}&\frac{\partial}{\partial y}\newline v_1&v_2 \end{matrix}\right|\right\rangle $$ and this vector dot product with the unit normal vector to the $xy$-plane $\langle 0,0,1\rangle$, what you get would be the curl for a 2D vector field: $$\nabla \times \v{v} = \nabla \times \widehat{\v{v}} \cdot \v{\nu}_{xy-\text{plane}} = \frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y}$$

There is another kind of curl in 2D, namely vector-$\mathbf{curl}$, or rotation operator $\mathbf{rot}$. Here we embed a scalar function into the third dimension of $\mathbb{R}^3$: $u = \langle0,0,u\rangle$, and what we get is the rotation of the $\nabla$-operator sometimes denoted by $\nabla^{\perp}$: $$ \mathbf{rot} u = \nabla^{\perp} u = \left\langle\frac{\partial u}{\partial y},-\frac{\partial u}{\partial x},0 \right\rangle $$ This operator gives rise of a rotational vector field by rotating the gradient field of a scalar function counterclockwisely by $\pi/2$.

Hopefully this clears up the confusion for Question 3 also.


Question 2: Why does adding up the partial derivatives in $i , j , k$ directions in gradient gives the fastest increase of a function and from where did this came?

Answer: I do not know what are you talking about. "adding up the partial derivatives in $i , j , k$ directions" is the divergence operator $\nabla \cdot$. This operator could have several interpretation, the first one is that:

Divergence measures the outgoing flux of a vector field across a domain's boundary.

This view basically takes advantage of the divergence theorem: $$ \int_{\Omega} \nabla \cdot \v{u} = \int_{\partial \Omega} \v{u}\cdot d\v{S} = \int_{\partial \Omega} \v{u}\cdot \v{\nu}\,dS $$ On the boundary, a vector field $\v{v}$ dot product with the outward unit normal vector $\v{\nu}$ on the boundary measures how much this vector field flows out across the across the boundary, I just draw following two pictures using MATLAB: gradient field

curl field

First one is the gradient field of $u = e^{x^2+y^2}$, and it is parallel to the outward unit normal of the unit circle at every point, it is "flowing out". Second picture is the first vector field's rotation, $\mathbf{rot} u$, and its divergence is zero. Why? On the boundary of the circle, nothing is "flowing out" because this vector field is perpendicular to the outward unit normal at every point on the boundary. So this interpretation basically associates $\nabla \cdot \v{v}$ with $\v{v}\cdot \v{\nu}$ on the boundary.


Talking about the "fastest increase of a function", you must mean gradient operator itself. The idea of directional derivative proves to be very handy in explaining this:

The derivational derivative of a function $\nabla u\cdot \v{t}|_{P}$ measures the change rate of this function along $\v{t}$'s direction at point $P$.

Then it becomes obvious that among the change rates of this function along every direction, the change rate along the direction that is parallel to $\nabla u$ is the maximum change rate. Therefore, $\nabla u$'s direction is the "fastest changing" direction for $u$.


Last but not least, vector calculus itself is not inconsistent and vague at all, maybe it is the presentation of the textbook you are using. For undergrad vector calculus, the formulae are given without showing you a broader perspective of the vector calculus. If you happens to pursue further graduate study in mathematics, and learn differential geometry and de Rham cohomology, you will find that everything you learn in vector calculus is so consistent and natural.


EDIT:

About divergence, curl, gradient in 2D: in my humble opinion, the best way to understand vector calculus is from differential geometry and exterior calculus. So here I try my best to briefly answer why there is two "curl"s in 2D which seems inconsistent, borrowing some technicalities from de Rham. In any dimension, these differential operator can all be viewed as exterior differential on differential forms.

In 3D, the following chain complex is exact: $$ \Lambda^0 \xrightarrow{\nabla} \Lambda^1 \xrightarrow{\nabla\times} \Lambda^2 \xrightarrow{\nabla\cdot} \Lambda^3 $$ $\Lambda^0$ contains 0-form, scalar-valued functions, defined pointwise (electric potential). $\Lambda^1$ contains 1-form, vector-valued functions, defined using line integral. (electric field) $\Lambda^2$ contains 2-form, vector-valued functions, defined using surface integral. (electric flux) $\Lambda^3$ contains 3-form, scalar-valued functions, defined using volume integral. (charge density)

Everything is good in 3D because of the magic identity $3-1 = 1+1$, the adjoint of curl is curl itself (except formally you have to add a Hodge star to make it the volume form, thus to make an inner product). In 3D, every $k$-form can be associated with a geometric object: point->line->surface->volume.

However, in 2D, the magic identity is no longer valid, we have too few geometry to play with: point->line->area. Point and area are associated with scalar functions. There leaves only one slot for the vector field. There are two chain complexes in 2D: $$ \Lambda^0 \xrightarrow{\nabla} \Lambda^1 \xrightarrow{\nabla\times} \Lambda^2 $$ $$ \Lambda^0 \xrightarrow{\nabla^{\perp}} \Lambda^1 \xrightarrow{\nabla\cdot} \Lambda^2 $$ The second one is just the first one's rotation. $\nabla\times$ is $\nabla \cdot$'s rotation as $\nabla^{\perp}$ is $\nabla$'s rotation.

About fast changing direction of a function: $\nabla \phi$ being the fastest changing direction can be interpreted as, if we put a charged particle in this vector field $\nabla \phi$, which is the electric field of a static charge. Then the particle will move along field lines of $\nabla \phi$, rather than across the field lines.

As of $\mathbf{rot}=\nabla^{\perp}$: The introduction of this operator mostly comes from the study of the flow of incompressible fluids. It is not associated with the fastest changing direction at all. If we have to put an analogy with $\nabla$, then it is if we put a particle in the water flow, whose flow field is $\nabla^{\perp}\phi$ (water is incompressible, the divergence of the flow field has to be zero). The particle will flow along the flow lines of $\nabla^{\perp}\phi$.


The curl and rot are special to dimension $3$. It is easier to learn calculus and linear algebra in $n$ variables and later add the special 3-d.

In higher dimension it takes $\frac{n(n-1)}{2}$ parameters to name a rotation around a point. The rotation can be described using an $n$ dimensional vector only in case $n= \frac{n(n-1)}{2}$, which is $n=3$.

Part of the feeling of inconsistency you feel may be that there are hidden topological ideas in the background that are not made clear in the books that focus on calculation.