An Introduction to Tensors
At least to me, it is helpful to think in terms of bases. (I'll only be talking about tensor products of finite-dimensional vector spaces here.) This makes the universal mapping property that Zach Conn talks about a bit less abstract (in fact, almost trivial).
First recall that if $L: V \to U$ is a linear map, then $L$ is completely determined by what it does to a basis $\{ e_i \}$ for $V$: $$L(x)=L\left( \sum_i x_i e_i \right) = \sum_i x_i L(e_i).$$ (The coefficients of $L(e_i)$ in a basis for $U$ give the $i$th column in the matrix for $L$ with respect to the given bases.)
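To make this concrete, here is a small numerical sketch (using NumPy, with made-up numbers): the images $L(e_i)$ pin down the whole map, and stacking them as columns gives the matrix of $L$.

```python
import numpy as np

# Made-up images of the standard basis vectors under a linear map L: R^3 -> R^2.
L_e = [np.array([1.0, 0.0]),    # L(e_1)
       np.array([2.0, 1.0]),    # L(e_2)
       np.array([0.0, -1.0])]   # L(e_3)

# The matrix of L has L(e_i) as its i-th column.
L = np.column_stack(L_e)

x = np.array([3.0, -1.0, 2.0])                        # x = 3 e_1 - e_2 + 2 e_3
via_matrix = L @ x                                    # L(x) computed from the matrix
via_basis = sum(xi * Lei for xi, Lei in zip(x, L_e))  # sum_i x_i L(e_i)
assert np.allclose(via_matrix, via_basis)
```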
Tensors come into the picture when one studies multilinear maps. If $B: V \times W \to U$ is a bilinear map, then $B$ is completely determined by the values $B(e_i,f_j)$ where $\{ e_i \}$ is a basis for $V$ and $\{ f_j \}$ is a basis for $W$: $$B(x,y) = B\left( \sum_i x_i e_i,\sum_j y_j f_j \right) = \sum_i \sum_j x_i y_j B(e_i,f_j).$$ For simplicity, consider the particular case when $U=\mathbf{R}$; then the values $B(e_i,f_j)$ make up a set of $N=mn$ real numbers (where $m$ and $n$ are the dimensions of $V$ and $W$), and these numbers are all that we need to keep track of in order to know everything about the bilinear map $B:V \times W \to \mathbf{R}$.
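A quick numerical illustration of this (NumPy, with the values $B(e_i,f_j)$ invented for the example): once the $mn$ numbers are stored in an array, evaluating $B$ is just the double sum above.

```python
import numpy as np

# Invented values B(e_i, f_j) for a bilinear map B: R^2 x R^3 -> R,
# stored so that B_mat[i, j] = B(e_i, f_j).  Here m = 2, n = 3, N = 6.
B_mat = np.array([[1.0, 2.0, 0.0],
                  [0.0, -1.0, 3.0]])

def B(x, y):
    # B(x, y) = sum_i sum_j x_i y_j B(e_i, f_j)
    return x @ B_mat @ y

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0, 4.0])
print(B(x, y))   # determined entirely by the six numbers in B_mat
```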
Notice that in order to compute $B(x,y)$ we don't really need to know the individual vectors $x$ and $y$, but rather the $N=mn$ numbers $\{ x_i y_j \}$. Another pair of vectors $v$ and $w$ with $v_i w_j = x_i y_j$ for all $i$ and $j$ will satisfy $B(v,w)=B(x,y)$.
This leads to the idea of splitting the computation of $B(x,y)$ into two stages. Take an $N$-dimensional vector space $T$ (they're all isomorphic so it doesn't matter which one we take) with a basis $(g_1,\dots,g_N)$. Given $x=\sum x_i e_i$ and $y=\sum y_j f_j$, first form the vector in $T$ whose coordinates with respect to the basis $\{ g_k \}$ are given by the column vector $$(x_1 y_1,\dots,x_1 y_n,x_2 y_1,\dots,x_2 y_n,\dots,x_m y_1,\dots,x_m y_n)^T.$$ Then run this vector through the linear map $\tilde{B}:T\to\mathbf{R}$ whose matrix is the row vector $$(B_{11},\dots,B_{1n},B_{21},\dots,B_{2n},\dots,B_{m1},\dots,B_{mn}),$$ where $B_{ij}=B(e_i,f_j)$. This gives, by construction, $\sum_i\sum_j B_{ij} x_i y_j=B(x,y)$.
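The two stages can be carried out literally in coordinates; here is a sketch (NumPy, same made-up $B$ as above) where stage one is an outer product flattened into a length-$N$ column and stage two is a dot product with the row of values $B_{ij}$.

```python
import numpy as np

# B_mat[i, j] = B(e_i, f_j), invented values; m = 2, n = 3, N = mn = 6.
B_mat = np.array([[1.0, 2.0, 0.0],
                  [0.0, -1.0, 3.0]])

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0, 4.0])

# Stage 1: the coordinates of x ⊗ y, i.e. the N products x_i y_j in the order
# (x_1 y_1, ..., x_1 y_n, x_2 y_1, ..., x_m y_n).
x_tensor_y = np.outer(x, y).ravel()

# Stage 2: the linear map B~ : T -> R, whose matrix is the row (B_11, ..., B_mn).
B_tilde = B_mat.ravel()

assert np.isclose(B_tilde @ x_tensor_y, x @ B_mat @ y)   # B~(x ⊗ y) = B(x, y)
```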
We'll call the space $T$ the tensor product of the vector spaces $V$ and $W$ and denote it by $T=V \otimes W$; it is “uniquely defined up to isomorphism”, and its elements are called tensors. The vector in $T$ that we formed from $x\in V$ and $y\in W$ in the first stage above will be denoted $x \otimes y$; it's a “bilinear mixture” of $x$ and $y$ which doesn't allow us to reconstruct $x$ and $y$ individually, but still contains exactly all the information needed in order to compute $B(x,y)$ for any bilinear map $B$; we have $B(x,y)=\tilde{B}(x \otimes y)$. This is the “universal property”; any bilinear map $B:V \times W \to \mathbf{R}$ can be computed by taking a “detour” through $T$, and this detour is unique, since the map $\tilde{B}$ is constructed uniquely from the values $B(e_i,f_j)$.
To tidy this up, one would like to make sure that the definition is basis-independent. One way is to check that everything transforms properly under changes of bases. Another way is to do the construction by forming a much bigger space and taking a quotient with respect to suitable relations (without ever mentioning bases). Then, by untangling definitions, one can for example show that a bilinear map $B:V \times W \to \mathbf{R}$ can be canonically identified with an element of the space $V^* \otimes W^*$, and dually an element of $V \otimes W$ can be identified with a bilinear map $V^* \times W^* \to \mathbf{R}$. Yet other authors find this a convenient starting point, so that they instead define $V \otimes W$ to be the space of bilinear maps $V^* \times W^* \to \mathbf{R}$. So it's no wonder that one can become a little confused when trying to compare different definitions...
In mathematics, tensors are one of the first objects encountered which cannot be fully understood without their accompanying universal mapping property.
Before talking about tensors, one needs to talk about the tensor product of vector spaces. You are probably already familiar with the direct sum of vector spaces. This is an addition operation on spaces. The tensor product provides a multiplication operation on vector spaces. (At the level of dimensions: the direct sum adds dimensions, while the tensor product multiplies them.)
The key feature of the tensor product is that it replaces bilinear maps on a cartesian product of vector spaces with linear maps on the tensor product of the two spaces. In essence, if $V,W$ are vector spaces, there is a bijective correspondence between the set of bilinear maps on $V\times W$ (to any target space) and the set of linear maps on $V\otimes W$ (the tensor product of $V$ and $W$).
This can be phrased in terms of a universal mapping property. Given vector spaces $V,W$, a tensor product $V\otimes W$ of $V$ and $W$ is a space together with a map $\otimes : V\times W \rightarrow V\otimes W$ such that for any vector space $X$ and any bilinear map $f : V\times W \rightarrow X$ there exists a unique linear map $\tilde{f} : V\otimes W \rightarrow X$ such that $f = \tilde{f}\circ \otimes$. In other words, every bilinear map on the cartesian product factors uniquely through the tensor product.
It can be shown using a basic argument that the tensor product is unique up to isomorphism, so you can speak of "the" tensor product of two spaces rather than "a" tensor product, as I did in the previous paragraph.
A tensor is just an element of a tensor product.
One must show that such a tensor product exists. The standard construction is to take the free vector space over $V\times W$ and introduce various bilinearity relations. See my link at the bottom for an article that does this explicitly. In my experience, however, the key is to be able to use the above mapping property; the particular construction doesn't matter much in the long run. The map $\otimes : V\times W \rightarrow V\otimes W$ sends the pair $(v,w) \in V\times W$ to $v\otimes w \in V\otimes W$. The image of $\otimes$ is the set of so-called elementary tensors (it is not a subspace), but a general element of $V\otimes W$ is not an elementary tensor but rather a linear combination of elementary tensors. (In fact, due to bilinearity, it is enough to say that a general tensor is a sum of elementary tensors with the coefficients all being 1, since any scalar can be absorbed into one of the factors.)
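In coordinates (identifying $\mathbf{R}^2\otimes\mathbf{R}^2$ with $2\times 2$ arrays, as in the coordinate description earlier), the elementary tensors are exactly the rank-one outer products, which makes the distinction easy to see; a small NumPy sketch:

```python
import numpy as np

# Identify R^2 ⊗ R^2 with 2x2 arrays of coordinates; then an elementary tensor
# v ⊗ w corresponds to the outer product of v and w, which has rank 1.
v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])
print(np.linalg.matrix_rank(np.outer(v, w)))        # 1: elementary

# e_1 ⊗ f_1 + e_2 ⊗ f_2 has rank 2, so it is a sum of elementary tensors
# but is not itself of the form v ⊗ w.
t = np.outer([1.0, 0.0], [1.0, 0.0]) + np.outer([0.0, 1.0], [0.0, 1.0])
print(np.linalg.matrix_rank(t))                     # 2: not elementary
```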
The most generic reason why tensors are useful is that the tensor product is a machine for replacing bilinear maps with linear ones. In much of mathematics and physics, one seeks to find linear approximations to things; tensors can be seen as one tool for this, although exactly how they accomplish it is less direct than with many other tools in the same vein. Here are some more specific reasons why they are useful.
For finite-dimensional spaces $V,W$, the tensor product $V^*\otimes W$ is isomorphic to the space of homomorphisms $\text{Hom}(V,W)$. So in other words every linear map $V \rightarrow W$ has a tensor expansion, i.e., a representation as a tensor in $V^* \otimes W$. For instance, if $\{v_i\}$ is a basis of $V$ and $\{x_i\}$ is the dual basis of $V^*$, then $\sum x_i \otimes v_i \in V^* \otimes V$ is a tensor representation of the identity map on $V$.
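Here is a small NumPy check of that last claim (the basis below is made up, and $V^*\otimes V$ is identified with $3\times 3$ matrices by letting $x\otimes v$ act as $u\mapsto x(u)\,v$, i.e. the matrix $v\,x^T$):

```python
import numpy as np

# A made-up basis {v_i} of R^3: the columns of P.  The dual basis {x_i}
# consists of the rows of P^{-1}, since then x_i(v_j) = delta_ij.
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])
P_inv = np.linalg.inv(P)

# Under the identification above, x_i ⊗ v_i becomes the matrix v_i x_i^T,
# so sum_i x_i ⊗ v_i becomes P @ P^{-1} = the identity matrix.
identity_tensor = sum(np.outer(P[:, i], P_inv[i, :]) for i in range(3))
print(np.round(identity_tensor, 12))   # 3x3 identity
```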
Tensor products tend to appear in a lot of unexpected places. For instance, in analyzing the linear representations of a finite group, once the irreducible representations are known it can be of benefit to construct also a "tensor product table" which decomposes the tensor products of all pairs of irreducible representations as direct sums of irreducible representations.
In physics, one often talks about a rank $n$ tensor being an assembly of numbers which transform in a certain way under change of coordinates. What one is really describing here is all the different coordinate representations of an abstract tensor in a tensor power $V^{\otimes n}$.
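As a sketch of what that transformation law looks like in the case $n=2$ (conventions vary; here the new basis vectors are $e'_j=\sum_i A_{ij}e_i$ for an invertible, made-up matrix $A$, and the components of a tensor in $V\otimes V$ then change by the inverse matrix on each index):

```python
import numpy as np

A = np.array([[2.0, 1.0],      # change of basis: e'_j = sum_i A[i, j] e_i (made up)
              [1.0, 1.0]])
A_inv = np.linalg.inv(A)

T = np.array([[1.0, 3.0],      # components of a tensor in V ⊗ V, old basis (made up)
              [0.0, 2.0]])

# Contravariant rank-2 transformation law: each index picks up a factor of A^{-1}.
T_new = A_inv @ T @ A_inv.T

# Transforming back with A recovers the original components.
assert np.allclose(A @ T_new @ A.T, T)
```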
If one takes the direct sum of all tensor powers of a vector space $V$, one obtains the tensor algebra over $V$. In other words, the tensor algebra is the construction $k\oplus V\oplus (V\otimes V) \oplus (V\otimes V\otimes V) \oplus \dots$, where $k$ is the base field. The tensor algebra is naturally graded, and it admits several extremely useful quotient algebras, including the well-known exterior algebra of $V$. The exterior algebra provides the natural machinery for differential forms in differential geometry.
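As a tiny coordinate sketch of the degree-two piece of the exterior algebra (modelling $v\wedge w$ inside $V\otimes V$ as the antisymmetrized tensor $v\otimes w-w\otimes v$, one common convention):

```python
import numpy as np

def wedge(v, w):
    # Model v ∧ w inside V ⊗ V as the antisymmetric tensor v ⊗ w - w ⊗ v.
    return np.outer(v, w) - np.outer(w, v)

v = np.array([1.0, 2.0])
w = np.array([3.0, 5.0])

print(wedge(v, v))                    # zero matrix: v ∧ v = 0
print(wedge(v, w) + wedge(w, v))      # zero matrix: w ∧ v = -(v ∧ w)

# For dim V = 2 the whole wedge is carried by a single number, reflecting
# the fact (used in the Lie algebra example below) that dim ∧²V = 1.
print(wedge(v, w)[0, 1])              # v_1 w_2 - v_2 w_1
```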
Here's an example of the exterior algebra in practice. Suppose one wishes to classify all nonabelian two-dimensional Lie algebras $\mathfrak{g}$. The Lie bracket $[\cdot,\cdot]$ is antisymmetric and bilinear, so the machinery of tensor products turns it into a linear map $\bigwedge^2 V \rightarrow V$, where $V$ is the underlying vector space of the algebra. Now $\bigwedge^2 V$ is one-dimensional, and since the algebra is nonabelian the Lie bracket is not identically zero; hence as a linear map the Lie bracket has a one-dimensional image. Pick a nonzero $X$ spanning that image and extend it to a basis $\{X,Z\}$ of $V$; then $[X,Z]=cX$ for some scalar $c\neq 0$ (if $c$ were zero the bracket would vanish identically), and setting $Y=Z/c$ gives $[X,Y]=X$. We conclude that there is essentially only one nonabelian Lie algebra structure on a two-dimensional vector space.
A fantastic reference on tensor products of modules was written by Keith Conrad: http://www.math.uconn.edu/~kconrad/blurbs/linmultialg/tensorprod.pdf