Covectors and Vectors

I have a general question about vector/covectors:

Background. A vector (for our purposes) is a physical object that, in each basis of $\mathbb{R}^3$, is represented by three numbers, and these numbers obey certain transformation rules when we change the basis. Let $\textbf{x}$ be an arbitrary vector and $\textbf{e}_1, \textbf{e}_2, \textbf{e}_3$ and $\tilde{\textbf{e}}_1, \tilde{\textbf{e}}_2, \tilde{\textbf{e}}_3$ be two bases. The transformation/inverse transformation rules are the following:

$$\tilde{x}^{j} = \sum_{i=1}^{3} T_{i}^{j} x^i$$ and $$x^{j} = \sum_{i=1}^{3} S_{i}^{j} \tilde{x}^i$$
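As a quick numerical sanity check of these two rules (a minimal sketch; the matrix $T$ below is an arbitrary invertible example, and $S = T^{-1}$):

```python
import numpy as np

# An arbitrary invertible change-of-basis matrix T, with S = T^{-1}.
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
S = np.linalg.inv(T)

x = np.array([1.0, 2.0, 3.0])   # components x^i of a vector in the old basis

x_tilde = T @ x                 # tilde{x}^j = sum_i T^j_i x^i
x_back  = S @ x_tilde           # x^j = sum_i S^j_i tilde{x}^i

assert np.allclose(x_back, x)   # the two rules are mutually inverse
```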

Question. Vectors satisfy the above properties. Now if I imagine some other set of objects that satisfy the above properties why do we call them covectors? What are covectors and how are they different from vectors if they satisfy the same properties?


Given a vector space $V$, there is a "dual" space $V^*$ which consists of linear functions $V\to \mathbb{F}$ (where $\mathbb{F}$ is the underlying field). Given $v\in V, \phi \in V^*$, we can plug in to get a number $\phi(v)$.

Because of linearity, $V^*$ is actually a vector space. If $V$ is finite dimensional, then $V^*$ has the same dimension. One way to see this: if we fix a basis $e_1, \ldots, e_n \in V$, we get a basis $\phi_1, \ldots, \phi_n \in V^*$ defined by $\phi_i(e_j)=\delta_{ij}$, which is $1$ when $i=j$ and $0$ otherwise.
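In coordinates this dual basis is easy to compute: if the $e_i$ are the columns of a matrix $E$, the $\phi_i$ are the rows of $E^{-1}$, since $E^{-1}E = I$ says exactly that $\phi_i(e_j)=\delta_{ij}$. A sketch with an arbitrary example basis:

```python
import numpy as np

# Columns of E are a (non-standard) basis e_1, e_2, e_3 of R^3.
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

Phi = np.linalg.inv(E)          # row i of Phi represents the functional phi_i

# phi_i(e_j) = delta_ij: each row of Phi applied to each column of E
assert np.allclose(Phi @ E, np.eye(3))
```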

Of course, this isomorphism requires choosing a basis, and in general, there is no "natural" choice of isomorphism. Additionally, when $V$ is infinite dimensional, $V$ and $V^*$ will not be isomorphic.

So if we are just doing basic linear algebra, there is no real difference between vectors and covectors. There are some constructions that might seem to require a choice of basis if you don't use covectors (like taking the transpose of a matrix), but they are not fundamentally different kinds of objects. However, if we want to work geometrically, we can see a difference.

Given a manifold $M$ and a point $p\in M$, we have a vector space $T_pM$ of tangent vectors to $M$ at $p$. For example, if you take the hollow sphere sitting inside $\mathbb{R}^3$, you can look at the plane that sits tangent to a point and turn it into a vector space. These tangent vectors act on functions by taking the directional derivative of a function at a point. A tangent covector no longer acts on functions; it acts on vectors. Geometrically speaking, it is a fundamentally different kind of object. Taking a tangent vector at every point gives you something called a vector field, while taking a covector at every point gives you something called a differential form. They are both useful notions, but they are used in fundamentally different ways.
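A rough numerical illustration of this pairing (treating $\mathbb R^3$ itself as the manifold, with an arbitrary function $f$ and a finite-difference approximation): the covector $df_p$ eats a tangent vector $v$ and returns the directional derivative of $f$ at $p$ along $v$.

```python
import numpy as np

def f(p):
    """An arbitrary scalar function on R^3 (our stand-in manifold)."""
    x, y, z = p
    return x**2 + y*z

def df(p, v, h=1e-6):
    """The covector df at p acting on a tangent vector v: the
    directional derivative of f at p along v (finite-difference sketch)."""
    return (f(p + h*v) - f(p)) / h

p = np.array([1.0, 2.0, 3.0])   # a point
v = np.array([0.0, 1.0, 0.0])   # a tangent vector at p

print(df(p, v))                 # ~ df/dy at p, which is z = 3
```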

Of course, once you get the general notion of a vector bundle (essentially, a way of smoothly putting a vector space at every point of a manifold), you can see that tangent vectors and tangent covectors are just dual vector bundles, and in the absence of certain geometric constructions can be treated very similarly.


I was once told that the reason some of the usual terminology in the representation theory of Lie algebras is vaguely painful to use is because originally it was developed by physicists who couldn't tell the difference between a vector space and its dual.

I admit to being wholly ignorant of how physicists think about this stuff (vectors and co-vectors; not Lie algebras) in practice, but my limited (and perhaps mathematically biased) experience suggests that there are very good reasons why the difference between vectors and co-vectors is subtle and not easy to see in concrete physical contexts.

First, the physicist's definition of a vector as a sequence of numbers satisfying certain transformation rules seems to me to say the following: a physical object is represented by a vector if, in each basis (measurement devices of some sort, related via certain matrices), the sequences of numbers that come from measuring the object using those devices are related by matrix multiplication/linear equations (this is how I interpret Damien's definition).

This perspective is natural for physicists, since they study empirical phenomena and have to decide how to model their data, which depends on what relationships they observe in the data. It is worth noting, however, that a physicist never sees an abstract vector; they see only the numerical representation of that vector relative to a specific basis (measuring device). This means that the vector spaces physicists deal with in this context come with an a priori chosen basis (without a basis it is senseless to talk about $\mathbb R^n$ and vectors corresponding to $n$-tuples of numbers).

In detail, what's happening is this. Suppose we have a measuring device that takes in vectors and spits out three numbers, so a measuring device consists of three functions $\mathbf e_1^*, \mathbf e_2^*, \mathbf e_3^*$ that take vectors $v$ in a linear way to numbers in the underlying field $\mathbb R$ (the field doesn't have to be the real numbers; it's just more geometrically intuitive this way). In other words, the functions $\mathbf e_1^*,\mathbf e_2^*,\mathbf e_3^*$ are linear functionals $V\to\mathbb R$ from our vector space to the real numbers, and we can put their results on a vector $v$ together into a triple of numbers $(x,y,z)\in\mathbb R^3$ where $\mathbf e_1^*(v)=x$, $\mathbf e_2^*(v)=y$, $\mathbf e_3^*(v)=z$.

Then, I claim, these three linear functions determine a unique basis for the vector space $V$. This basis, which we'll call $\mathbf e_1,\mathbf e_2,\mathbf e_3$, consists of the unique vectors such that $\mathbf e_i^*(\mathbf e_j)=\begin{cases}1&i=j\\0&i\neq j\end{cases}$, i.e. the basis vectors $\mathbf e_1,\mathbf e_2,\mathbf e_3$ are the unique three vectors that correspond to the triples $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$ respectively; they are precisely the objects that when measured give exactly those values.
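A small sketch of that claim in coordinates (the matrix below is an arbitrary invertible example): stack the three measuring functionals as the rows of a matrix $A$; the basis they determine is then given by the columns of $A^{-1}$.

```python
import numpy as np

# Rows of A represent the measuring functionals e_1^*, e_2^*, e_3^*
# in some fixed coordinates (an arbitrary invertible example).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [1.0, 1.0, 0.0]])

E = np.linalg.inv(A)            # column j of E is the basis vector e_j

# e_i^*(e_j) = delta_ij: the unique basis determined by the device
assert np.allclose(A @ E, np.eye(3))
```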

Now, if vectors are the objects whose coordinates are measured by measuring devices, then co-vectors are precisely the functions that make up the measuring devices. In other words, the three functions $\mathbf e_1^*,\mathbf e_2^*,\mathbf e_3^*$ are examples of co-vectors, and they live in the dual space $V^*$ of linear functions $V\to\mathbb R$.

Of course, just as we thought of vectors as triples of numbers that satisfy certain properties relative to bases, so can we think of co-vectors as triples of numbers that satisfy certain properties relative to bases. However, it is a bit counter-intuitive to ask what co-vectors are measured by (they are measured by triples of objects), so instead let us think about what they actually look like in coordinates relative to some basis.

We know that an $n\times k$ matrix with real entries represents a linear transformation $\mathbb R^k\to\mathbb R^n$. Now, we can think of vectors as linear transformations $\mathbb R\to V$, since given such a transformation $v$, we have a canonical vector $v(1)$. This is exactly where the column matrix $\left[\begin{matrix}x\\y\\z\end{matrix}\right]$ comes from, and why it represents a vector.

So while vectors in $\mathbb R^3$ can be thought of as functions $v\colon \mathbb R\to\mathbb R^3$, co-vectors can be thought of as functions $f\colon\mathbb R^3\to\mathbb R$, and those are represented by row matrices $\left[\begin{matrix}a&b&c\end{matrix}\right]$.

Question: given our standard basis $\mathbf e_1,\mathbf e_2, \mathbf e_3$, what do the measuring functions $\mathbf e_1^*,\mathbf e_2^*,\mathbf e_3^*$ look like? Well, applying a function $f$ to a vector $v$ is the same as computing the composite linear map $\mathbb R\overset{v}\to\mathbb R^3\overset{f}\to\mathbb R$, and composing linear maps is done precisely via matrix multiplication.

Thus, if relative to the standard basis we have the co-vector $f=\left[\begin{matrix}a&b&c\end{matrix}\right]$ and the vector $v=\left[\begin{matrix}x\\y\\z\end{matrix}\right]$, then measuring $v$ with $f$ gives the number $f(v)=ax+by+cz$.
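A one-line check of that pairing as matrix multiplication (a sketch with arbitrary numbers):

```python
import numpy as np

f = np.array([[1.0, 2.0, 3.0]])      # a co-vector: a 1x3 row matrix [a b c]
v = np.array([[4.0], [5.0], [6.0]])  # a vector: a 3x1 column matrix

print(f @ v)                         # [[32.]] = ax + by + cz = 4 + 10 + 18
```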

So in particular, we have that $f(\mathbf e_1)=a$, $f(\mathbf e_2)=b$, and $f(\mathbf e_3)=c$, where $\mathbf e_1,\mathbf e_2,\mathbf e_3$ is the standard basis given by $\mathbf e_1=\left[\begin{matrix}1\\0\\0\end{matrix}\right]$, $\mathbf e_2=\left[\begin{matrix}0\\1\\0\end{matrix}\right]$, and $\mathbf e_3=\left[\begin{matrix}0\\0\\1\end{matrix}\right]$. This implies that relative to that same basis we have $\mathbf e_1^*=\left[\begin{matrix}1&0&0\end{matrix}\right]$, $\mathbf e_2^*=\left[\begin{matrix}0&1&0\end{matrix}\right]$, and $\mathbf e_3^*=\left[\begin{matrix}0&0&1\end{matrix}\right]$.

Evidently, this shows that the measuring functions $\mathbf e_1^*$, $\mathbf e_2^*$, $\mathbf e_3^*$ form a basis for the dual space $V^*$. Remembering where $\mathbf e_1^*$, $\mathbf e_2^*$, and $\mathbf e_3^*$ came from, we see that choosing a measuring device is actually choosing a basis for the space of co-vectors. Furthermore, we see that choosing a basis of co-vectors also uniquely determines a dual basis of vectors, and hence an isomorphism between vectors and co-vectors.

This allows us to write down what property co-vectors satisfy. Evidently, they satisfy the same property as vectors if we switch between bases of co-vectors. The different property that they satisfy comes from choosing different bases of vectors. Specifically, let $f$ be a co-vector, let $\mathbf e_1,\mathbf e_2,\mathbf e_3$ and $\tilde{\mathbf e}_1$, $\tilde{\mathbf e}_2$, $\tilde{\mathbf e}_3$ be two bases, and write $f_i = f(\mathbf e_i)$ and $\tilde f_i = f(\tilde{\mathbf e}_i)$ for the components of $f$. Then:

$$\tilde{f}_{j} = \sum_{i=1}^{3} {S^t}_{i}^{j} f_i$$

and

$$f_{j} = \sum_{i=1}^{3} {T^t}_{i}^{j}\tilde{f}_i$$

where $M^t$ is the transpose of $M$, so in fact ${M^t}_i^j=M_j^i$.
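To see why it must be $S^t$ here (and not $T^t$): the number $f(v)=\sum_j f_j v^j$ cannot depend on the basis, and $S^t$ is exactly what cancels the $T$ with which the vector components transform. A numerical sketch, using an arbitrary invertible $T$:

```python
import numpy as np

T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])   # coordinate change for vector components
S = np.linalg.inv(T)

x = np.array([1.0, 2.0, 3.0])     # vector components in the old basis
f = np.array([4.0, 5.0, 6.0])     # co-vector components in the old basis

x_new = T @ x                     # vectors transform with T
f_new = S.T @ f                   # co-vectors transform with S^t

# The measurement f(x) is basis-independent:
assert np.isclose(f_new @ x_new, f @ x)
```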


The (physical) distinction between vectors and covectors is related to frame of reference changes.
Since you asked about $3$-vectors, I will refer just to spatial frame of reference changes, but a similar rule holds for 4-vectors and space-time frame of reference changes.

Let $O$, $O'$ be two spatial frames of reference, $(x_1, x_2, x_3)$ are the coordinates of an event in $O$, $(x'_1, x'_2, x'_3)$ the coordinates of the same event in $O'$; they are related by $$ x'_i = \sum_{j=1}^{3} T_{ij} x_j + k_i $$

Let $A$ be an observable completely defined by $3$ numbers, $(A_1, A_2, A_3)$ are the values of those numbers in $O$, $(A'_1, A'_2, A'_3)$ the values in $O'$.
If the following equality holds, $$ A'_i = \sum_{j=1}^{3} T_{ij} A_j $$ then $A$ is said to be a vector. If instead the following one holds, $$ A'_i = \sum_{j=1}^{3} (T^{-1})_{ji} A_j $$ then $A$ is said to be a covector.

In other words, vectors transform with the same matrix $T$ as space coordinates and covectors with the matrix $(T^{-1})^{tr}$.

Usually, one uses orthogonal coordinate systems with the same unit of length, so $T$ is just an $O(3)$ matrix. Under that condition, we have $$ T = (T^{-1})^{tr} $$ and covectors are indistinguishable from vectors. This fact explains why we rarely speak of covectors in classical mechanics.
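A quick check of this collapse with a concrete $O(3)$ matrix (a rotation about the $z$-axis, as an example):

```python
import numpy as np

th = 0.3  # an arbitrary rotation angle
T = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0,         0.0,        1.0]])  # an SO(3) matrix

# For orthogonal T, the covector rule (T^{-1})^tr collapses to T itself,
# so the two transformation laws coincide:
assert np.allclose(np.linalg.inv(T).T, T)
```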
The situation radically changes with space-time frames of reference in special relativity.
In that case, $T$ is an $O(1, 3)$ matrix, $T$ is (in general) different from $(T^{-1})^{tr}$, and vectors and covectors are genuinely distinct.

As a final remark, note that there are "vector-like" (classical) physical quantities that are neither vectors nor covectors.

The angular momentum of a point mass is defined by $$ \mathbf l = m \mathbf r\times \mathbf v $$ where $m$, $\mathbf r$ and $\mathbf v$ are, respectively, the mass, position and velocity of the body.
If $(l_1, l_2, l_3)$ are the components of $\mathbf l$ in $O$, $(l'_1, l'_2, l'_3)$ the components in $O'$, and $T$ is a rotation (that is, $T\in SO(3)$), then it is easy to verify that $$ l'_i = \sum_{j=1}^3 T_{ij} l_j $$ So $\mathbf l$ behaves like a vector with respect to rotations. But if the transformation from $O$ to $O'$ is $$ x'_i = -x_i $$ then we have $$ l'_i = l_i $$ Though $\mathbf l$ is often referred to as a vector, it is neither a vector nor a covector. It is more appropriate to call it a pseudovector.
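The parity case is easy to verify directly, since flipping both $\mathbf r$ and $\mathbf v$ leaves the cross product unchanged (a sketch with arbitrary numbers):

```python
import numpy as np

m = 2.0                          # mass
r = np.array([1.0, 2.0, 3.0])    # position
v = np.array([0.5, -1.0, 0.0])   # velocity

l = m * np.cross(r, v)           # angular momentum l = m r x v

# Parity x'_i = -x_i flips r and v, but the cross product of the
# flipped vectors equals the original, so l is unchanged:
l_parity = m * np.cross(-r, -v)
assert np.allclose(l_parity, l)
```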


This is my understanding: if $x$ is a column vector in $\mathbf R^n$, then its corresponding covector is nothing more than $x^T$, i.e. the transpose of $x$, which is a row vector. Now $x^T$ is such that for any column vector $y \in \mathbf R^n$, $x^Ty$ is a real number. Furthermore, we have that $x^T(\alpha y + z) = \alpha x^Ty + x^Tz$ for any scalar $\alpha$ and vectors $y,z$. Thus, $x^T$ is actually a linear functional. Conclusion: covectors are just linear functionals.
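A tiny sketch of that linearity check with arbitrary numbers:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

def x_T(y):
    """The covector x^T: the linear functional y -> x^T y."""
    return x @ y

y = np.array([4.0, 5.0, 6.0])
z = np.array([1.0, 0.0, -1.0])
alpha = 2.5

# Linearity: x^T(alpha*y + z) == alpha * x^T(y) + x^T(z)
assert np.isclose(x_T(alpha*y + z), alpha*x_T(y) + x_T(z))
```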