How does contour integration work?
It might be a vague question, but I can't help asking: what is so powerful about contour integration that makes it possible to compute certain improper real integrals which seem very difficult to compute by real-variable calculus methods?
The power of contour integration in the complex plane lies in the concept of an analytic function - it turns out that the integral over a closed contour in the complex plane of a function that is analytic on and inside that contour is zero. I just recited one interpretation of Cauchy's theorem, which is the fundamental theorem of complex integration if there is one.
Cauchy's theorem lies at the heart of so much of complex analysis. But let's restrict our discussion to evaluating definite integrals of real-valued functions of real variables. Why is Cauchy's theorem useful? Because the theorem says very little about the exact shape of the contour - just that the function has no singularities (poles, branch points) on the contour. This allows us an extraordinary amount of freedom to express a real-valued integral as one piece of a closed contour.
Further, if the function we are integrating is not analytic inside the contour at discrete points - again, due to a pole or branch point - then we can deform the contour to exclude those points. In the case of a pole, the exclusions take on a magic geometry of their own. The result is the residue theorem, in which we can pretend that it is OK for a function to have isolated poles inside the contour, and that the contour integral is related to the sum of residues of poles inside the contour. In the case of branch points, things are not as simple, but usually we end up using Cauchy's theorem to express our nasty real-valued integral in terms of a different but simpler real-valued integral about the branch point.
The easiest examples given in standard courses in complex analysis involve semicircles whose diameter lies on the real axis. There is typically a pole inside the semicircle, and the integral over the circular arc typically tends to zero as the radius grows. Thus, the integral over the real line equals $2\pi i$ times the residue at the pole, which is very easy to compute.
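For concreteness, here is a minimal numerical sanity check of that recipe (a sketch in Python; the tooling and the textbook integrand $1/(1+x^2)$ are my own choices, not something from the question): the real-line integral should match $2\pi i$ times the residue of $1/(1+z^2)$ at its only pole in the upper half-plane, $z = i$.

```python
# Minimal numerical check of the semicircle recipe for 1/(1 + x^2):
# the only pole of 1/(1 + z^2) in the upper half-plane is z = i, with
# residue 1/(2i), so the real-line integral should be 2*pi*i * 1/(2i) = pi.
import numpy as np
from scipy.integrate import quad

value, _ = quad(lambda x: 1.0 / (1.0 + x**2), -np.inf, np.inf)

residue_at_i = 1.0 / 2.0j                     # residue of 1/(1+z^2) at z = i
predicted = (2.0j * np.pi * residue_at_i).real

print(value, predicted)                       # both ~ 3.141592653589793
```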
However, we are not limited to such easy examples. As I said, mastery of this technique leads to a dizzying array of contours and even analytic functions that return the value of the desired integral with very little additional work. Please have a look throughout this site, especially in the contour-integration tag, to see the possibilities of what Cauchy's theorem is capable of.
$\DeclareMathOperator{\Re}{Re}\DeclareMathOperator{\Res}{Res}$In the prototypical situation, suppose $f$ is a meromorphic function on the complex plane, i.e., a function with isolated singularities that is holomorphic on the complement $U$ of its set of singular points.
Thanks to Cauchy's theorem (a.k.a. Green's theorem applied to the real and imaginary parts), the meromorphic $1$-form $f(z)\, dz$ satisfies a remarkable invariance property: If $\gamma_{1}$ and $\gamma_{2}$ are piecewise-smooth closed curves in $U$ (or more generally, $1$-chains, i.e., formal sums of such curves with integer coefficients) that have the same winding number around every singularity of $f$, then $$ \int_{\gamma_{1}} f(z)\, dz = \int_{\gamma_{2}} f(z)\, dz. $$ In particular, if $\gamma$ encloses some finite set of singularities of $f(z)\, dz$, the integral of $f(z)\, dz$ around $\gamma$ is $2\pi i$ times the sum of the corresponding residues of $f$, each weighted by the winding number of $\gamma$ about that singularity. In symbols, if $\gamma$ encloses singularities $(z_{j})_{j=1}^{k}$ with winding numbers $(n_{j})_{j=1}^{k}$, then $$ \int_{\gamma} f(z)\, dz = 2\pi i \sum_{j=1}^{k} n_{j} \Res(f, z_{j}). \tag{1} $$
The residue of a meromorphic function at an isolated singularity $z_{0}$ is the coefficient of $(z - z_{0})^{-1}$ in the Laurent expansion of $f$ about $z_{0}$. (This is the unique integer power of $(z - z_{0})$ whose integral over a small loop centered at $z_{0}$ is non-zero, precisely because this power admits no meromorphic primitive in a punctured neighborhood of $z_{0}$.)
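If it helps to see $(1)$ in action, here is a small brute-force check (a sketch; Python/numpy and the example function are my own choices): integrate $f(z)\, dz$ once around a circle enclosing both poles of $f(z) = e^{z}/\bigl(z(z-2)\bigr)$ and compare with $2\pi i$ times the sum of the residues.

```python
# Sketch: check formula (1) by brute force.  f has simple poles at 0 and 2;
# a circle of radius 3 about the origin winds once around both.
import numpy as np

def f(z):
    return np.exp(z) / (z * (z - 2))

t = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
dt = 2.0 * np.pi / t.size
gamma = 3.0 * np.exp(1j * t)                 # the circle, traversed once
dgamma = 3.0j * np.exp(1j * t)               # gamma'(t)

contour_integral = np.sum(f(gamma) * dgamma) * dt

# Residues: at z = 0, e^0/(0 - 2) = -1/2;  at z = 2, e^2/2.
predicted = 2.0j * np.pi * (-0.5 + np.exp(2.0) / 2.0)

print(contour_integral, predicted)           # agree to many digits
```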
With all this understood, contour integration works when some real integral can be converted to an integral over a piecewise-smooth closed path $\gamma$ in the complex plane (or on the Riemann sphere, or on some other Riemann surface, as when working with elliptic functions or functions having branch cuts) in such a way that
The total residue of $f(z)\, dz$ about $\gamma$ can be evaluated;
The original integral is a tractable algebraic expression in the total residue.
In summary, complex integration allows you to "localize" a (global) contour integral to certain power series coefficients at the singularities of $f(z)\, dz$.
To illustrate, consider the integral $$ \int_{0}^{\infty} \frac{\sin x}{x}\, dx = \frac{1}{2} \int_{-\infty}^{\infty} \frac{\sin x}{x}\, dx. $$ One approach is to write $\cos x + i\sin x = \exp(ix)$, which leads one to consider the meromorphic function $f(z) = \exp(iz)/z$. This function:
Has a pole at $0$, with $\Res(f, 0) = 1$, and an essential singularity at $\infty$;
Decays exponentially rapidly as $\Im z \to + \infty$.
One is led to consider, for $0 < \delta < R$ arbitrary, the "half-gasket" contours consisting of the segment from $\delta$ to $R$; the counterclockwise semi-circle from $R$ to $-R$ in the upper half-plane; the segment from $-R$ to $-\delta$, and the clockwise semi-circle from $-\delta$ to $\delta$ in the upper half-plane. (Exercise: Draw this contour on the Riemann sphere to see how it "skirts" the singularities of $f$.)
By the residue theorem (1), the integral over each such contour vanishes (the contour winds zero times around the only singularity, $z = 0$). Take limits as $\delta \to 0^{+}$ and $R \to +\infty$. The integrals over the two segments combine to give the principal value of $$ \int_{-\infty}^{\infty} \frac{e^{ix}}{x}\, dx; $$ the integral over the large semi-circle converges to $0$ because the integrand decays exponentially as $\Im z \to +\infty$ (Jordan's lemma makes this precise); the integral over the small semi-circle converges to $-\pi i \Res(f, 0) = -\pi i$, i.e., minus half of a full $2\pi i$ residue contribution, the sign coming from the clockwise orientation. Putting the pieces together, $$ \pi i = \int_{-\infty}^{\infty} \frac{e^{ix}}{x}\, dx = \int_{-\infty}^{\infty} \frac{\cos x + i\sin x}{x}\, dx. $$ (Technically the improper integral of $\cos x/x$ diverges; the preceding equation really involves the principal value.) Equating imaginary parts gives the expected evaluation $$ \pi = \int_{-\infty}^{\infty} \frac{\sin x}{x}\, dx, \qquad\text{that is,}\qquad \int_{0}^{\infty} \frac{\sin x}{x}\, dx = \frac{\pi}{2}. $$
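If you want to double-check the bookkeeping, the two key numbers above can be confirmed symbolically (a sketch; sympy is my own choice of tool and is not part of the contour argument):

```python
# Cross-check: the residue of e^{iz}/z at 0, and the sinc integral itself.
import sympy as sp

z = sp.symbols('z')
x = sp.symbols('x', real=True)

print(sp.residue(sp.exp(sp.I * z) / z, z, 0))        # 1
print(sp.integrate(sp.sin(x) / x, (x, 0, sp.oo)))    # pi/2
```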
As mentioned in comments, the reason we can change/deform contours is a generalized version of Stokes' theorem, which has an extremely intuitive explanation on Wikipedia under "underlying principles". The hand-wavy reason it works is that, by the Cauchy-Riemann equations, the vector fields built from the real and imaginary parts of an analytic function are curl-free.
Circles are really easy to parametrize. That's why we (generally) integrate around circles rather than, say, triangles. The parametrization comes from Euler's formula, $e^{i\theta} = \cos\theta + i\sin\theta$.
In addition, we're used to writing down power series in non-negative powers. However, as you'll see, we sometimes need, and often want, negative powers as well. This motivates the discussion of Laurent series.
So when we integrate about a curve, $C(\theta)$, we multiply by a differential $dC(\theta)$ such that
$$(1) \quad dC(\theta)=C(\theta+d\theta)-C(\theta)$$
Multiplying and dividing the right hand side of $(1)$ by $d\theta$ yields,
$$(2) \quad dC(\theta)= \cfrac{dC}{d\theta} \cdot d\theta$$
So,
$$(3) \quad \int_C f(C(\theta)) \ dC(\theta)=\int f(C(\theta)) \cdot \cfrac{dC}{d\theta} \ d\theta$$
The meaning of $(3)$ is as follows. The line integral of a function $f$ along a curve $C(\theta)$ gives the average value of $f \cdot \cfrac{dC}{|dC|}$ on that curve, multiplied by the length of the curve.
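Here is a quick numerical illustration of $(3)$ itself (a sketch; Python/numpy and the particular test functions are my own choices): parametrize the unit circle and integrate $f(C(\theta)) \cdot \cfrac{dC}{d\theta}$ directly.

```python
# Formula (3) in code: line integrals around the unit circle.
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
dtheta = 2.0 * np.pi / theta.size
C = np.exp(1j * theta)          # C(theta): the unit circle
dC = 1j * np.exp(1j * theta)    # dC/dtheta

def line_integral(f):
    return np.sum(f(C) * dC) * dtheta

print(line_integral(lambda z: z**2))     # ~ 0        (no negative powers)
print(line_integral(lambda z: 1.0 / z))  # ~ 2*pi*i   (the power -1 survives)
```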
If you accept this claim, we can simply substitute the functions in question. However, that isn't really the problem, is it?
First, let's address a major issue: why does $(3)$ average over $f \cdot \cfrac{dC}{|dC|}$?
It'll take some imagination but a reasonable explanation can be given. Imagine $f(t)$ gives the speed of a walking person at a particular time. If it's positive, (s)he's walking to the right. If it's negative, (s)he's walking to the left. Now imagine that we were viewing this on a TV rather than in real life. If we sped time up, fast-forwarded, the person would appear to be walking faster. If we rewound, the person would appear to switch direction. However, if we have a real time $p$ then we can know the time on the TV by making $t$ a function of the real time $p$. Now, if $dt$ is negative, we know that the time on the TV is going backward. However, if the person's speed is still positive, we know that the true speed, relative to the observer at time $p$, is actually negative. So if we want the average speed of the person, it's not enough to watch the TV; you also need to know how time is progressing. That's why $(3)$ can be negative. However, if time progresses normally relative to the observer, the ratio becomes unity.
So, in short, $(3)$ averages, but does so with respect to direction. Sadly, explaining imaginary time would take a lot of real time, so we'll have to settle for the above sentence rather than the time analogy.
However, recall that multiplication by a complex number $v$ rotates the multiplicand $w$ by $\arg(v)$ (degrees or radians, as you prefer) and scales it by $|v|$ in the complex plane. For a more in-depth explanation and review, see here.
So, $\cfrac{dC(\theta)}{|dC(\theta)|}$ gives the normed differential, the direction of the differential, at an angle $\theta$.
Imagine a line running from the origin to a point on $C$. Then imagine another line starting at that point and pointing tangent to the curve in the counter-clockwise direction. Move this new line so that its tail sits at the origin. You should now see one line representing $C(\theta)$ and another representing the direction of $dC$. Notice that for the complex circle, $90^\circ$ separates these lines. Since multiplication can rotate, the direction of $dC$ is $C$ rotated by $90^\circ$. Better yet,
$(4) \quad dC=i \cdot C \, d\theta$
$\Rightarrow \arg(dC)=\arg(C)+\pi/2$
Where we've switched to radians. It's important to realize that it's the difference between the directions, or angles, that's key. In symbols,
$(5) \quad \Delta \theta=\arg(dC)-\arg(C)=\pi/2$
Which is constant. We know that,
$(6) \quad \arg(v \cdot w)=\arg(w)+\arg(v)$
By extension,
$(7) \quad \arg \left(\cfrac{w}{v} \right)=\arg(w)-\arg(v)$
Applying $(7)$ with $w = dC$ and $v = C$ yields,
$(8) \quad \arg \left(\cfrac{dC}{C} \right)=\arg(dC)-\arg(C)$
Thus, if we want the integrand of $(3)$ to point in a constant direction, in other words if we want the angle difference to be constant, we should have $f=\cfrac{1}{C(\theta)}$.
What about $f=\cfrac{1}{C(\theta)^2}$? It rotates faster than $dC$, so the difference between angles won't be constant.
What about $f=\cfrac{1}{C(\theta)^0}$? This time it rotates too slowly for the difference to be constant.
Finally, putting everything together,
$$(9) \quad \cfrac{1}{2 \cdot \pi} \cdot \int_{C_r} \cfrac{dC}{C-z_0}=i$$
This says that the average angle between $dC$ and $C-z_0$ over the circle is $\pi/2$ radians, i.e. $90^\circ$, which in the complex plane is best represented by $i$.
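Here is a quick numerical check of $(9)$ (a sketch; the point $z_0$ and the circles are arbitrary choices of mine): for a circle centered at $z_0$ the ratio $\cfrac{dC/d\theta}{C-z_0}$ is constantly $i$, and for an off-center circle that still encloses $z_0$ the average still comes out to $i$, which is exactly the deformation invariance described in the earlier answers.

```python
# Check (9) numerically for two different circles enclosing z0.
import numpy as np

z0 = 0.3 + 0.1j
theta = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
dtheta = 2.0 * np.pi / theta.size

def average(C, dC):
    return np.sum(dC / (C - z0)) * dtheta / (2.0 * np.pi)

# Circle of radius 1 centered at z0: the integrand is constantly i.
C1, dC1 = z0 + np.exp(1j * theta), 1j * np.exp(1j * theta)
# Radius-2 circle about 1: not centered at z0, but it still encloses z0.
C2, dC2 = 1.0 + 2.0 * np.exp(1j * theta), 2.0j * np.exp(1j * theta)

print(average(C1, dC1))   # ~ 1j
print(average(C2, dC2))   # ~ 1j
```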
It is because analytic functions have very nice properties, and analyticity of a complex-valued function is much more restrictive than differentiability of a real-valued function. So we are using a more powerful tool.
The Cauchy-Riemann equations limit the possibilities of how the function $u(x,y)+iv(x,y)$ can behave:
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$
There is not nearly as much freedom left for the function when these are required to hold as when only (real) differentiability is required.
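To see how restrictive these equations are in practice, here is a small symbolic check (a sketch; sympy and the sample function $f(z) = e^{z}$ are my own choices): split $f$ into $u + iv$ with $z = x + iy$ and verify that both Cauchy-Riemann equations hold identically.

```python
# Verify the Cauchy-Riemann equations for f(z) = e^z, z = x + i*y.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = sp.exp(x + sp.I * y)
u, v = sp.re(f), sp.im(f)       # u = e^x cos y,  v = e^x sin y

print(sp.simplify(sp.diff(u, x) - sp.diff(v, y)))   # 0
print(sp.simplify(sp.diff(u, y) + sp.diff(v, x)))   # 0
```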