Covering spaces naturally occur in the study of analytic continuation (in fact I believe this is where they first appeared). For example, the square root function $\sqrt{z}$ cannot be extended to a holomorphic function on all of $\mathbb{C}$, or even on $\mathbb{C} - \{ 0 \}$. It can be defined locally on various open sets in $\mathbb{C} - \{ 0 \}$, and by analytic continuation it can be extended, for example, starting from a neighborhood of $z = 1$ (with the branch chosen so that $\sqrt{1} = 1$, say) and going counterclockwise around the origin. However, this process is inconsistent: when you get back to $z = 1$ you'll find that $\sqrt{z} = -1$.
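The sign flip can be seen numerically. The following is only a rough sketch, not a rigorous continuation: it walks once around the unit circle in small steps and, at each step, keeps whichever of the two square roots is closer to the previous value. The function name `continue_sqrt` and the step count are my own choices for illustration.

```python
import cmath
import math

def continue_sqrt(steps=1000):
    """Follow a continuous choice of sqrt(z) once counterclockwise around |z| = 1."""
    w = 1 + 0j  # the chosen value of sqrt(z) at the starting point z = 1
    for k in range(1, steps + 1):
        z = cmath.exp(2j * math.pi * k / steps)
        r = cmath.sqrt(z)  # principal square root; the other root is -r
        # Continuity: keep the root closer to the value from the previous step.
        w = r if abs(r - w) <= abs(-r - w) else -r
    return w

end_value = continue_sqrt()
print(end_value)  # close to -1, even though z has returned to 1
```

Going around a second time (starting from the value $-1$) would bring the value back to $1$, which is the two-sheeted behaviour described next.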

The solution is to define $\sqrt{z}$ on a double cover of $\mathbb{C} - \{ 0 \}$; the cover has two sheets, one for each of the two possible values of the square root. Similarly, $\sqrt[n]{z}$ is defined on an $n$-sheeted cover of $\mathbb{C} - \{ 0 \}$, and $\log z$ is defined on a cover of $\mathbb{C} - \{ 0 \}$ with infinitely many sheets.
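To make "$n$-sheeted" concrete, here is a small sketch that lists the fiber of the map $z \mapsto z^n$ over a point $w \neq 0$, i.e. the $n$ possible values of $\sqrt[n]{w}$. The helper name `fiber` is hypothetical, chosen just for this example.

```python
import cmath
import math

def fiber(w, n):
    """Return the n points z with z**n == w, for a nonzero complex number w."""
    r, theta = cmath.polar(w)
    root_r = r ** (1.0 / n)
    # The n preimages differ by rotation through multiples of 2*pi/n.
    return [cmath.rect(root_r, (theta + 2 * math.pi * k) / n) for k in range(n)]

points = fiber(-8, 3)
for z in points:
    print(z, z ** 3)
```

Each point of $\mathbb{C} - \{ 0 \}$ has exactly $n$ preimages, one on each sheet.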

There is a nice analogy with the theory of field extensions and Galois theory; the double cover mentioned above corresponds in some sense to the field extension $\mathbb{C}(z, \sqrt{z})$ of the field $\mathbb{C}(z)$, and it has Galois group $\mathbb{Z}/2\mathbb{Z}$. In this analogy the fundamental group of a space is analogous to the absolute Galois group of a field, and this has been a very fruitful analogy in mathematics, leading to the theory of the étale fundamental group. These ideas are thoroughly explored in Szamuely's Galois Groups and Fundamental Groups.

In quantum mechanics, covering spaces of topological groups naturally occur for the following reason. Because a wave function $\psi$ and any multiple $e^{i \theta} \psi$ of it represent the same physical state, to say that a group $G$ acts as a group of symmetries of a quantum system whose states lie in a Hilbert space $H$ is not to say that there is a representation $G \to \text{U}(H)$ (the unitary group of $H$) but rather a projective representation $G \to \text{PU}(H)$. For Lie groups $G$ such representations can be analyzed using representations of a covering group $\tilde{G}$ of $G$. This is why quantum systems with $\text{SO}(3)$-symmetry, e.g. an electron orbiting a proton, are naturally analyzed using the representation theory of $\text{SU}(2)$.


The definition probably only seems fiddly if you haven't seen it (or related definitions) before. What it says is the following: a map $p: Y \to X$ is a covering map if $p$ locally looks like the projection from $$X \times \text{ a discrete space} \to X.$$

A little more precisely: each point $x \in X$ has a neighbourhood $U$ such that the map $$p^{-1}(U) \to U$$ is isomorphic to a projection $$U \times \text{ a discrete space} \to U.$$

This kind of property of a map --- that locally on the target it looks like the projection from a certain kind of product --- is very common in topology and geometry, and underlies the fundamental notion of a fibre bundle. Covering spaces are perhaps the simplest example, since they are fibre bundles with discrete fibres. Fibre bundles of all kinds appear everywhere, and so it is not so much a question of asking what they are useful for, but rather, of identifying a ubiquitous property and giving it a name.

Another, more global, way to describe covering spaces is as follows: if a discrete group $\Gamma$ acts on a space $Y$ in such a way that each point has a neighbourhood $U$ such that the translates $\gamma U$ are pairwise disjoint for distinct elements $\gamma \in \Gamma$, then the quotient map $Y \to Y/\Gamma$ is a covering map (i.e. $Y$ is a covering space of $Y/\Gamma$).

Since group actions on spaces are pretty ubiquitous, this gives some indication of why covering maps might be commonly encountered in topology. (The basic example is $\Gamma = \mathbb Z$ acting by translation on $Y = \mathbb R$, with the quotient being $\mathbb R/\mathbb Z$, which is a circle.)

Finally, if you begin with a space $X$, then in order to construct covers of $X$ you have to "unwind" certain directions in $X$. Thus investigating covering spaces of $X$ is the same as investigating the extent to which the various directions in $X$ are "wound up".

E.g. in the circle there is just one direction, and unwinding it, you get the covering space $\mathbb R$. In $SO(3)$ there is one direction which is wound up, and unwinding it gives $SU(2)$. Often this "unwinding" can be thought of in a physical way: e.g. imagine that you are walking around a stadium, and measuring the distance you have travelled as you walk. When you get all the way around, you are back where you started (the stadium is a circle), but your distance travelled isn't at zero (it's at 400 metres, say). The numerical distance travelled "unwinds" the circle of the stadium into the line $\mathbb R$.
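The stadium picture can be sketched in a few lines of code. This is only an illustration under assumptions of my own: positions on the track are recorded as angles in $[0, 2\pi)$, consecutive samples are less than half a lap apart, and the track is 400 metres long. The function name `unwind` is hypothetical.

```python
import math

def unwind(angles):
    """Lift a sampled path on the circle (angles mod 2*pi) to a path in R."""
    lifted = [angles[0]]
    for a in angles[1:]:
        # Signed change in [-pi, pi) that agrees with (a - previous) mod 2*pi.
        step = (a - lifted[-1] + math.pi) % (2 * math.pi) - math.pi
        lifted.append(lifted[-1] + step)
    return lifted

# Two full laps, sampled at 100 points per lap; positions are angles on the track.
angles = [(2 * math.pi * k / 100) % (2 * math.pi) for k in range(201)]
track_length = 400.0  # metres (assumed)
distance = [t * track_length / (2 * math.pi) for t in unwind(angles)]
print(distance[-1])  # total distance walked after two laps
```

The position on the track comes back to the start after each lap, while the lifted value in $\mathbb R$ keeps growing.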

E.g. in $SO(3)$, the "belt trick" mentioned by Georges allows you to "unwind" a rotation to get an element of $SU(2)$. (And when you do it twice, you get back where you started, unlike in the case of the stadium, where your distance travelled never resets to zero; so here you see the difference between a double cover, like $SU(2)$ over $SO(3)$, and an infinite cover, like $\mathbb R$ over the circle $\mathbb R/\mathbb Z$.)


First of all, your technical questions, such as why the number of sheets (the cardinality of the fiber) is constant (for a connected covering space, of course), are answered in any algebraic topology text. I'll point you to Hatcher's text for two reasons: one, it is freely available online, and two, it includes some less formal discussion that you might find illuminating. (EDIT: Having looked at the relevant section in Hatcher, he only asserts that the number of sheets is locally constant. This follows from the requirement that every point has an evenly covered neighborhood: the same neighborhood works for all nearby points, so the number of sheets is locally constant, and hence globally constant if the base space is connected.)

Your question is a big one, and the answers already posted give different insights into why covering spaces are as important a notion as they are. To expand on Lopsy's answer a bit, I think it is worth stressing that covering spaces and fundamental groups are intimately linked; this is one basic reason for their ubiquity. An essential fact about covering spaces is that for nice enough spaces $X$, the various connected covering spaces $\tilde{X}$ of $X$ are (up to isomorphism) in one-to-one correspondence with conjugacy classes of subgroups of the fundamental group $\pi_1(X)$. That is, for each subgroup $H \le \pi_1(X)$, there is a covering space $\tilde{X}$ with $\pi_1(\tilde{X}) \cong H$. Algebraic concepts like the index of a subgroup, or whether or not $H$ is normal, have elegant interpretations in the covering space setting: the index equals the number of sheets, and if the subgroup is normal, then the covering space is "homogeneous", loosely speaking (see Hatcher for a precise statement and compelling pictures). So one answer to your question of why they are more useful than more general covers is that they provide exactly the right setting for realizing spaces whose fundamental groups are subgroups of the fundamental group of your original space.

So why should we care about this? Again, this is a huge question. One answer is that it provides a wealth of group actions, which allow us to study the structure of groups by understanding properties of their actions on spaces. If we are interested in a group $G$ and we happen to have a space $X$ with fundamental group $G$, then $G$ acts on the universal cover $\tilde{X}$ of $X$ (which is just the simply-connected covering space of $X$) by deck transformations. A deck transformation is a homeomorphism $\tilde{X} \rightarrow \tilde{X}$ that preserves the fibers, i.e. commutes with the covering map. An example would be as follows: as Lopsy said, the universal cover of $S^1$ is $\mathbb{R}$, and $\pi_1(S^1) = \mathbb{Z}$. The action of $\mathbb{Z}$ on $\mathbb{R}$ is just translation; i.e. $n \in \mathbb{Z}$ acts by sending $x$ to $x+n$. Since the fiber over the point of $S^1$ represented by $x$ is the set $\{x + n, n \in \mathbb{Z}\}$, we can see that only translation by an integer will preserve this set. Using this setting, you can prove, for instance, that any subgroup of a free group is free: thus you can derive purely algebraic facts by appealing to covering space theory.
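The circle example can be checked numerically. In this sketch I assume the covering map $\mathbb{R} \to S^1$ is written as $p(x) = e^{2\pi i x}$ (a standard choice, but not one fixed in the text above); the names `p` and `translate` are mine. A translation is a deck transformation exactly when $p(x + t) = p(x)$ for every $x$.

```python
import cmath
import math

def p(x):
    """Covering map R -> S^1 (assumed form): x maps to exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)

def translate(t):
    """Translation of R by t."""
    return lambda x: x + t

x = 0.37
# Translation by an integer stays in the same fiber: p is unchanged.
integer_gap = abs(p(translate(3)(x)) - p(x))
# Translation by a non-integer moves to a different fiber: p changes.
half_gap = abs(p(translate(0.5)(x)) - p(x))
print(integer_gap, half_gap)
```

Here translation by $3$ leaves $p$ unchanged up to rounding, while translation by $1/2$ sends each point to the antipodal point of the circle.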