*understanding* covariance vs. contravariance & raising / lowering

There are lots of articles all over the place about the distinction between covariant and contravariant vectors. After struggling through many of them, I think I'm starting to get the idea. I'm wondering if someone can help me understand the meaning and significance of it.

My background is physics, with no training in 'abstract algebra' per se. The language I use below is basically as technical as I understand (sorry...).


My understanding:
Contravariant vectors (like ordinary displacement or velocity vectors, written with upper indices: $\vec{v} = v^i \, \mathbf{e}_i$, on a basis $\{\mathbf{e}_i\}$) transform like $$v^{\prime j} = \frac{\partial x^{\prime j}}{\partial x^i} v^i.$$

Covariant vectors (like gradient vectors, written with lower indices: $\vec{w} = w_i \, \mathbf{e}^i$, on a basis $\{\mathbf{e}^i\}$) transform like $$w^{\prime}_{j} = \frac{\partial x^i}{\partial x^{\prime j}} w_i.$$

We also have a metric $g_{ij}$ (with inverse $g^{ij}$), which can convert a contravariant vector into a covariant vector and vice versa.


Questions:
If these vectors exist in the same 'space' (vector space?), and we want to make them interact, i.e. take the dot product between them (which requires one to be upper-index and the other lower-index), then why are they expressed in different bases? Doesn't that mean they're in different reference frames?

What is the meaning behind changing a vector from covariant to contravariant? The components change in some way, but the 'meaning' of the vector is supposed to stay the same, right?

Does something being a contravariant vector simply mean it is being defined with respect to a basis of tangent vectors; while a covariant vector is one in reference to a basis of normal vectors?
[this is my interpretation of the first figure of http://en.wikipedia.org/wiki/Covariance_and_contravariance_of_vectors ]


Solution 1:

What makes the typical physics explanation of differential geometry so confusing is that it tends to be so coordinate-based that it's hard to grasp that most of the objects do not depend on a coordinate system. From mathematics, I'm more used to referring to the vectors not as contra- and covariant vectors, but as tangent vectors and differentials (or cotangent vectors).

Let's imagine our manifold is the surface of the Earth. We have a nice map of it in an atlas with longitudes along the x-axis and latitudes along the y-axis. Let's use the coordinates $x^{\mu}$ where $\mu\in\{\text{long},\text{lat}\}$, but remember these only serve to tell us where on the map a certain position on Earth is.

Now, let's say we go for a walk. We time our walk (in seconds) and at any time we are at some point $x(t)$ on Earth. We can describe these in coordinates as $x^\mu(t)$, and at any time we may give our speed as $\dot x^\mu=dx^\mu/dt$; just remember that the point on Earth $x(t)$ exists independent of which coordinates $x^\mu$ we use.

If we use degrees as the unit for latitudes and longitudes, the speed will have units $\text{deg}/\text{sec}$. The vector $\dot x$ is a tangent vector indicating our speed and direction independent of which coordinates we are using, and the natural way to draw such a tangent vector is as an arrow.

If I walk along the equator, the tangent vector is likely to be a rather short vector on the map. However, if I walk in the east-west direction close to one of the poles, the map is stretched out there (relative to actual distances on the Earth), so the same walk might produce a rather long vector. In other words, if we stretch the map, the arrow representing the tangent vector gets stretched along with it.

Now, let's say we have a function $F$ that takes a value anywhere on Earth. It could be the altitude at the surface, the temperature, etc.; let's say it measures the temperature in Kelvin. At any point, the function has a gradient. If we wish to illustrate $F$ on the map, one way is to colour the map according to the values of $F$, or to draw the contours of $F$ on the map, i.e. the curves along which $F$ is constant. If we stretch or deform the map, these contours will still be correct as they deform with the map, so they do not depend on the coordinate system. The differential $dF$ tells us how fast $F$ changes at any point and in any direction, and has units $\text{K}/\text{deg}$. If we stretch the map, the contour lines get further apart, making the gradient appear less steep on the map. In coordinates, we write this as $dF=(\partial F/\partial x^\mu)\,dx^\mu$, where $dx^\mu$ is just the differential of the coordinate function $x^\mu$. The point, again, is that $dF$ is actually independent of the coordinate system.

If we combine our walk with the function $F$, we get $F(x(t))$ as the value along our path. The rate of change in time becomes $(d/dt)F(x(t))$, which we can write out as $$\frac{d}{dt}F(x(t))=\frac{\partial F}{\partial x^\mu}\dot x^\mu=dF\cdot\dot x\tag{1}$$ and which is again independent of the coordinate system. The $dF$ and $\dot x$ are the differential of $F$ and the tangent vector of $x$, both of which are independent of the coordinates we choose to use. The units are also informative: $\dot x$ has units $\text{deg}/\text{sec}$, while $dF$ has units $\text{K}/\text{deg}$, so the degrees cancel in the product and the result has units $\text{K}/\text{sec}$.
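To make (1) concrete, here is a minimal numerical sketch. The Jacobian and the component values are made-up numbers, and the helper names (`mat_vec`, `vec_mat`, `inverse_2x2`, `contract`) are just illustrative. It redraws the map with an assumed linear distortion at one point, transforms the tangent components with the Jacobian and the differential components with its inverse, and compares the contraction before and after.

```python
# Hypothetical numbers throughout; only the invariance of the contraction matters.

def mat_vec(m, v):
    """Apply a 2x2 matrix to a column of components."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def vec_mat(w, m):
    """Multiply a row of components by a 2x2 matrix from the left."""
    return [w[0] * m[0][0] + w[1] * m[1][0],
            w[0] * m[0][1] + w[1] * m[1][1]]

def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def contract(w, v):
    """Pair a differential (lower index) with a tangent vector (upper index)."""
    return w[0] * v[0] + w[1] * v[1]

# Assumed Jacobian dx'^mu / dx^nu of a redrawn map at one point.
jacobian = [[2.0, 0.5],
            [0.0, 0.25]]

x_dot = [0.001, 0.002]  # tangent vector components, deg/sec
dF = [0.3, -0.1]        # differential components, K/deg

# Tangent components pick up the Jacobian, differential components its inverse.
x_dot_new = mat_vec(jacobian, x_dot)
dF_new = vec_mat(dF, inverse_2x2(jacobian))

rate_old = contract(dF, x_dot)          # K/sec in the old coordinates
rate_new = contract(dF_new, x_dot_new)  # K/sec in the new coordinates
print(rate_old, rate_new)
```

The two printed rates agree up to rounding, even though the individual components of both objects have changed. No metric appears anywhere in the computation.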

From (1), we see that there is a natural way to take the product of a tangent vector with a differential. Indeed, the differentials (at any point) form the dual vector space of the vector space of tangent vectors, which is why they are also called cotangent vectors.

All of this is done entirely without the need for a metric.

The metric only comes into play when you e.g. want to convert tangent vectors into a measure of actual physical distances. If you want to compute the length of a path, you need a metric. Similarly, it's needed when computing speeds in absolute terms, as in the kinetic energy $E_{\text{kin}}=\frac{m}{2}g_{\mu\nu}\dot x^\mu\dot x^\nu$. Yet another place is in field or wave equations, where e.g. $g^{\mu\nu}(\partial\phi/\partial x^\mu)(\partial\phi/\partial x^\nu)$ may enter.

Connections, which are mathematical objects that tell you how to parallel transport vectors along a path from one point to another, can be defined without a metric. However, if there is a metric, there is a particular connection, the Levi-Civita connection, that naturally corresponds to it (which makes sense, since you need the metric to specify what is meant by a shortest path). When specifying the Levi-Civita connection (which in a coordinate system is done with the Christoffel symbols), you will encounter raising and lowering of indices.

While the metric does induce a natural way to identify the tangent and cotangent vector spaces, which is the identification that is applied when raising or lowering indices, this identification is metric dependent and should therefore only be required when you are computing something that depends on the metric.
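As a small illustration of this metric-dependent identification, the sketch below lowers an index in plane polar coordinates. The choice of the flat plane with metric $\text{diag}(1, r^2)$ and all numerical values are my own assumptions for the example. It computes $v_i = g_{ij}v^j$, forms $v_iv^i$, and checks the result against the squared speed computed in Cartesian coordinates.

```python
import math

# Assumed example point and velocity in polar coordinates (r, theta).
r, theta = 2.0, math.pi / 6
v_up = [0.5, 0.3]  # contravariant components (r_dot, theta_dot)

# Metric of the flat plane in polar coordinates: ds^2 = dr^2 + r^2 dtheta^2.
g = [[1.0, 0.0],
     [0.0, r * r]]

# Lowering the index: v_i = g_ij v^j.
v_down = [sum(g[i][j] * v_up[j] for j in range(2)) for i in range(2)]

# Squared speed as the contraction v_i v^i.
speed_sq_polar = sum(v_down[i] * v_up[i] for i in range(2))

# The same velocity expressed in Cartesian components via x = r cos(theta), y = r sin(theta).
x_dot = v_up[0] * math.cos(theta) - r * math.sin(theta) * v_up[1]
y_dot = v_up[0] * math.sin(theta) + r * math.cos(theta) * v_up[1]
speed_sq_cartesian = x_dot ** 2 + y_dot ** 2

print(v_down, speed_sq_polar, speed_sq_cartesian)
```

The lowered components differ from the raised ones by a factor of $r^2$ in the angular slot, which is exactly the information the metric carries. The squared speed comes out the same either way.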

My recommendation would be not to try to attribute meaning, at least not too much, to this identification of the tangent and cotangent vector spaces. Instead, you could think of why these enter the picture in physics at all and understand those cases.

Solution 2:

The short version is that indices tell you how things behave under an arbitrary change of coordinates. (The long version genuinely requires some level of comfort with abstract algebra to appreciate.) When you use the metric to transform contravariant things to covariant things, the operation you're performing does not commute with arbitrary changes of coordinates; it only commutes with those changes of coordinates which also preserve the metric. So as long as you only work with maps that preserve the metric, there's no harm in doing this, but the moment you allow maps that don't preserve the metric, you have to be careful what kind of identifications you're making.
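The commuting claim can be checked numerically. The sketch below is only an illustration under my own assumptions: the Euclidean plane, where lowering an index in Cartesian coordinates leaves the components unchanged, and two hypothetical linear coordinate changes. It compares "lower, then transform covariantly" against "transform contravariantly, then lower with the same rule".

```python
import math

def mat_vec(m, v):
    """Apply a 2x2 matrix to a column of components."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def transpose(m):
    return [[m[0][0], m[1][0]],
            [m[0][1], m[1][1]]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def lower_with_delta(v):
    """Lower an index using g_ij = delta_ij, so components are unchanged."""
    return list(v)

def lower_then_transform(jac, v):
    """Lower first, then apply the covariant rule (inverse transpose Jacobian)."""
    return mat_vec(transpose(inverse_2x2(jac)), lower_with_delta(v))

def transform_then_lower(jac, v):
    """Apply the contravariant rule first, then lower with the same delta."""
    return lower_with_delta(mat_vec(jac, v))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(a, b))

angle = 0.4
rotation = [[math.cos(angle), -math.sin(angle)],
            [math.sin(angle), math.cos(angle)]]  # preserves delta_ij
stretch = [[2.0, 0.0],
           [0.0, 1.0]]                           # does not preserve delta_ij

v = [1.0, 3.0]
rotation_commutes = close(lower_then_transform(rotation, v),
                          transform_then_lower(rotation, v))
stretch_commutes = close(lower_then_transform(stretch, v),
                         transform_then_lower(stretch, v))
print(rotation_commutes, stretch_commutes)
```

For the rotation the two routes agree; for the stretch they do not, because in the stretched coordinates the metric components are no longer $\delta_{ij}$ and the "same rule" is the wrong one.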

Covariant and contravariant tensors live in different spaces, but things don't have to live in the same space to interact. We have the freedom to talk about operations $f : X \times Y \to Z$ that take inputs of two different types and return an output of potentially yet another type, and tensor contraction is an operation of this form.

I'm not sure what you mean by the "meaning" behind changing a vector from covariant to contravariant.

Solution 3:

Different physical quantities have different transformation rules. Just to give an example:

  • position $q^i$ transforms contravariantly
  • momentum $p_i$ transforms covariantly

Why? In classical mechanics the Lagrangian is defined as a function of $q^i$ and its derivatives. On the other hand, the generalized momentum is given as:

$$ p_i=\frac{\partial L}{\partial \dot{q}^i} $$

If we change from $q$ to $\bar{q}$, the transformation law will be inverse for coordinates versus momentum due to the chain rule. This is just an example. What we choose to frame physics in terms of is in some sense a choice. Because we can convert covariant to contravariant objects with the metric, there are many ways to frame a given set of physical laws.
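Here is a minimal one-dimensional sketch of that chain-rule statement. The free-particle Lagrangian, the rescaling $\bar q = 3q$, and the finite-difference helper `derivative` are all assumptions made for the example. It computes the momentum numerically in both coordinates and compares how velocity and momentum scale.

```python
MASS = 2.0
SCALE = 3.0  # assumed coordinate change: q_bar = SCALE * q

def lagrangian(q_dot):
    """Free particle in the original coordinate: L = m/2 * q_dot^2."""
    return 0.5 * MASS * q_dot ** 2

def lagrangian_bar(q_bar_dot):
    """The same Lagrangian written in terms of the new velocity q_bar_dot."""
    return lagrangian(q_bar_dot / SCALE)

def derivative(f, x, h=1e-6):
    """Central finite difference, standing in for the partial derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

q_dot = 1.5
q_bar_dot = SCALE * q_dot                      # velocity picks up the factor SCALE

p = derivative(lagrangian, q_dot)              # p = dL/d(q_dot)
p_bar = derivative(lagrangian_bar, q_bar_dot)  # p_bar = dL/d(q_bar_dot)

print(q_bar_dot / q_dot, p_bar / p)            # approximately SCALE and 1/SCALE
print(p * q_dot, p_bar * q_bar_dot)            # the pairing p * q_dot is unchanged
```

The velocity scales by the factor and the momentum by its inverse, so the product $p_i\dot q^i$ does not care which coordinate was used.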

Turning to your question about changing frames of reference changing covariant to contravariant, this is not the case. The metric transformation is not a coordinate change, it is something quite different. It's a way of changing notation, or more mathematically speaking, it is the implementation of an isomorphism.

More important than the choice of notation (writing tensors contravariantly or covariantly) is the construction of the action or Lagrangian. It must satisfy certain symmetries depending on what kind of physics you consider:

$$ L = \frac{m}{2} \vec{v} \cdot \vec{v} = \frac{m}{2} v_iv^i $$

The dot product is invariant under rotations, so this Lagrangian is invariant under rotations, as it ought to be since it models a free particle in Euclidean space.

$$ L = kF_{\mu \nu}F^{\mu \nu} $$

where $F_{\mu \nu}$ is the Faraday tensor which transforms covariantly whereas $F^{\mu \nu}$ is the contravariant version. Together they form a scalar with respect to Lorentz transformations (I'm avoiding the full discussion about Poincare transformations here).
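The scalar property can be checked numerically. The following is a rough sketch under stated assumptions: the metric signature $(-,+,+,+)$, units with $c=1$, a boost along $x$ with $\beta = 0.6$, and an arbitrary antisymmetric array of made-up numbers standing in for $F_{\mu\nu}$ (no particular assignment of electric and magnetic components is implied). It transforms the covariant components, raises both indices with $\eta$, and compares the full contraction before and after the boost.

```python
import math

# Minkowski metric, assumed signature (-, +, +, +); it is its own inverse.
ETA = [[-1.0 if i == j == 0 else (1.0 if i == j else 0.0) for j in range(4)]
       for i in range(4)]

def boost_x(beta):
    """Lorentz boost along x with velocity beta (c = 1), as a 4x4 matrix."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    m = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    m[0][0] = m[1][1] = gamma
    m[0][1] = m[1][0] = -gamma * beta
    return m

def raise_both(f_lower):
    """F^{mu nu} = eta^{mu a} eta^{nu b} F_{ab}; eta is diagonal here."""
    return [[ETA[m][m] * ETA[n][n] * f_lower[m][n] for n in range(4)]
            for m in range(4)]

def transform_lower(f_lower, lam_inverse):
    """F'_{mu nu} = (Lambda^-1)^a_mu (Lambda^-1)^b_nu F_{ab}."""
    return [[sum(lam_inverse[a][m] * lam_inverse[b][n] * f_lower[a][b]
                 for a in range(4) for b in range(4))
             for n in range(4)]
            for m in range(4)]

def full_contraction(f_lower):
    """The scalar F_{mu nu} F^{mu nu}."""
    f_upper = raise_both(f_lower)
    return sum(f_lower[m][n] * f_upper[m][n]
               for m in range(4) for n in range(4))

# Arbitrary antisymmetric components, purely illustrative.
F = [[0.0, 1.0, 2.0, 3.0],
     [-1.0, 0.0, 4.0, 5.0],
     [-2.0, -4.0, 0.0, 6.0],
     [-3.0, -5.0, -6.0, 0.0]]

beta = 0.6
lam_inverse = boost_x(-beta)  # the inverse of a boost is the opposite boost
F_boosted = transform_lower(F, lam_inverse)

scalar_original = full_contraction(F)
scalar_boosted = full_contraction(F_boosted)
print(scalar_original, scalar_boosted)
```

The individual components of `F_boosted` differ from those of `F`, but the contraction agrees up to rounding. This works because the boost preserves $\eta$, which is why the same `raise_both` can be used in both frames.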

The question of physics is partly this: how can you construct scalars given the symmetry of your theory? Ultimately this leads to the study of representation theory, spinors etc... it's not a short story and the question you are asking is certainly worth asking.

In the coordinate-free language the covariance of the components is balanced by the contravariance of the basis, or vice versa. Note as an example: $$ \bar{A}_{\mu'} = \Lambda^{\nu}_{\mu'}A_{\nu} \qquad \text{whereas} \qquad d\bar{x}^{\mu'} = \frac{\partial \bar{x}^{\mu'}}{\partial x^{\nu}}dx^{\nu} $$ where $\bar{x}^{\nu'} = \Lambda^{\nu'}_{\mu} x^{\mu}$. Differentiate to see $\Lambda^{\nu'}_{\mu} = \frac{\partial \bar{x}^{\nu'}}{\partial x^{\mu}}$, and likewise $\Lambda^{\mu}_{\nu'} = \frac{\partial x^{\mu}}{\partial \bar{x}^{\nu'}}$ for the inverse transformation, for Minkowski space where I'm considering a coordinate change that is constant over all points in spacetime: a Lorentz transformation. Put it together: $\frac{\partial \bar{x}^{\nu'}}{\partial x^{\mu}}$ is inverse to $\frac{\partial x^{\mu}}{\partial \bar{x}^{\nu'}}$ by the chain rule, which means that

$$ \frac{\partial \bar{x}^{\nu'}}{\partial x^{\mu}}\frac{\partial x^{\mu}}{\partial \bar{x}^{\alpha'}} = \delta_{\alpha'}^{\nu'} $$

which we could write in the $\Lambda$ notation as $\Lambda^{\nu'}_{\mu}\Lambda^{\mu}_{\alpha'}=\delta_{\alpha'}^{\nu'} $. The form $A$ can either be written in the barred or unbarred coordinates.

$$ A = \bar{A}_{\mu'}d\bar{x}^{\mu'} = A_{\nu}dx^{\nu} $$

The claim that these are in fact equal is supported by the transformation laws I gave above for the covariant components of $A$ and the contravariant transformation of the basis forms $dx^{\mu}$.

The mathematics I'm outlining here is mostly linear algebra and the concept of a basis. The transformation law for the basis is inverse to that of the components. The fundamental object considered, be it a vector, form, tensor, etc., is invariant under the coordinate change. It is our picture of it that changes. That is how I think about it.

Solution 4:

I would like to add a further view on this question (complementary to the already nice answers).

Both co-variant and contra-variant vectors are just vectors (or more generally first-order tensors).

Furthermore, they are vectors which relate to the same underlying space (for example Euclidean space or, generally, a manifold).

They relate to that same space in different but dual ways (as such they have different, but dual, transformation laws, as already stated).

Contra-variant vectors are part of what is called the tangent space, which for a Euclidean space coincides with, or is isomorphic to, the space itself. Co-variant vectors are part of the dual of the tangent space, called the co-tangent space (as already noted in other answers), which for a Euclidean space also coincides with, or is isomorphic to, the space itself.

These spaces (and their vectors) are dual in the algebraic sense. Given a norm (inner product) on the space, they are also isomorphic to each other, regardless of whether they are isomorphic to the underlying manifold itself; applying this isomorphism is what is called raising and lowering indices.

A question is what these (associated) spaces represent, how they are related, and what the intuition behind their use is.

Historically, tensors and tensor analysis were initiated as a by-product of the theory of invariants. A way was needed to express quantities that are invariant under a change of representation (or change of underlying basis). Thus tensors were used. Tensors represent quantities which transform under a change of representation in such ways as to make various quantities expressed in terms of them invariant.

Note that the terminology associated with co-variant/contra-variant indices is largely a convention; any consistent convention will do.

This also gives the (intuitive) relation between co-variant tensors (vectors) and contra-variant tensors (vectors). When the components of a co-variant vector transform in one way, for example by a scaling factor $s$, the components of the (associated) contra-variant vector have to transform by the inverse scaling factor $1/s$ in order for invariant quantities (for example an inner product $a^ib_i$) to remain invariant.