Why is the momentum a covector?
Solution 1:
Addendum: if you know Lagrangian mechanics, the generalised momentum is defined there to be $$\frac{\partial L(q,\dot{q}, t)} {\partial \dot{q}}$$ because this is the quantity that is conserved when the corresponding coordinate is cyclic. At each point it acts linearly on generalised velocities (it is the derivative of $L$ in the velocity directions), so you can identify it with a covector.
For a free particle $L = \frac{1}{2} m \dot{q}^2$, so that Lagrange's equation implies $$\frac{d}{dt}\frac{\partial L(q,\dot{q}, t)} {\partial \dot{q}} = 0 \implies m\dot{q} = \text{const}$$ Note that the conserved quantity here is $\partial L / \partial \dot{q}$, a linear function on velocities, rather than the vector $p=mv$ as such. Under other conditions there may be nothing interesting to say about $m\dot{q}$, but if the Lagrangian doesn't depend on $q$, $\partial L / \partial \dot{q}$ will still be conserved.
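As a worked illustration (the planar setup and the central potential $V(r)$ are assumptions chosen for this example, not part of the argument above), take a particle in the plane in polar coordinates: $$L = \frac{1}{2} m \left(\dot{r}^2 + r^2 \dot{\theta}^2\right) - V(r)$$ The coordinate $\theta$ is cyclic, so $$p_\theta = \frac{\partial L}{\partial \dot{\theta}} = m r^2 \dot{\theta} = \text{const}$$ This conserved quantity is the angular momentum, not "mass times velocity" $m\dot{\theta}$, and $m\dot{\theta}$ on its own is in general not conserved.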
Short answer: for a coordinate system $(q^1,\dots,q^n)$ on a manifold $M$ we let the generalised momenta $(p_1,\dots,p_n)$ be the corresponding coordinate functions on the cotangent spaces, which act on $\lambda \in \pi^{-1}(M) \subset T^*M$ by $p_i(\lambda) = \lambda(\frac{\partial}{\partial q^i})$, where $\pi: T^*M \rightarrow M$ is the projection map. This agrees with the usual results when $M$ is a plain vector space, even though here momentum is not quite $p=mv$.
The underlying reason for this is that in Hamiltonian mechanics, the physics actually happens in the cotangent bundle, the $2n$-dimensional manifold parametrized by $(q^1 \circ \pi, \dots , q^n \circ \pi, p_1, \dots, p_n)$.
This formalism is motivated by the nearly symmetric role played by the $q^i$ and the $p_i$ in Hamilton's equations, so that we eventually forget about the base manifold and consider arbitrary $2n$-dimensional manifolds equipped with a closed, antisymmetric, nondegenerate differential 2-form (the symplectic form), which distinguishes the positions from the momenta.
The topic is too big to explain in detail here, but looking at mechanics in this way gives you many deep results relatively easily. For instance, conservation of the symplectic form under the motion implies conservation of phase-space volume (Liouville's theorem). Also, due to the similarities between Hamilton's equations and the Cauchy-Riemann equations, methods from complex analysis can give some insight. This is the field of pseudoholomorphic curves.
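The volume statement can be checked numerically in the simplest case. The sketch below is only an illustration under stated assumptions: it uses a one-dimensional harmonic oscillator $H = p^2/(2m) + kq^2/2$ (my choice of system, with arbitrary values for `m`, `k` and `t`), for which the symplectic form is just the area form $dq \wedge dp$, and it compares the area of a square of initial conditions before and after the exact flow.

```python
import math

def flow(q0, p0, t, m=1.5, k=4.0):
    """Exact time-t flow of H = p^2/(2m) + k q^2/2 (assumed example system)."""
    w = math.sqrt(k / m)                      # angular frequency
    c, s = math.cos(w * t), math.sin(w * t)
    q = q0 * c + p0 * s / (m * w)
    p = -m * w * q0 * s + p0 * c
    return (q, p)

def polygon_area(points):
    """Signed area of a polygon in the (q, p) plane via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        q1, p1 = points[i]
        q2, p2 = points[(i + 1) % n]
        total += q1 * p2 - q2 * p1
    return 0.5 * total

# A unit square of initial conditions in phase space (counterclockwise).
square = [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)]
area_before = polygon_area(square)

# This flow is linear, so the square is mapped to a parallelogram
# whose corners are the images of the original corners.
moved = [flow(q, p, t=0.7) for (q, p) in square]
area_after = polygon_area(moved)

print(area_before, area_after)
```

The two areas agree up to rounding even though the square is sheared and rotated. For a nonlinear system the image of the square would no longer be a polygon, so this corner-based check would only be approximate.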
For an introduction see the last chapters of Spivak's Physics for Mathematicians.
Solution 2:
I think the point is that $p=mv$, being a very special case, gives a misleading idea of momentum. Even a system of point particles in $\mathbb{R}^3$ is misleading for similar reasons: space has a flat structure (unimportant for our discussion), so we add up the momenta of the particles as vectors and get a vector! In the general case (the real case!), on the configuration manifold at a specific point $(q,\dot q)$, momentum is not a vector parallel to $\dot q$ at all. Rather, it is a question: if I disturb the evolution of the system away from $\dot q$ in a specific direction, how much change in the Lagrangian will be seen? (In this sense the general momentum intuitively shares an inertia-like property with the special case $mv$.) So for every specific direction on the manifold (a vector) you get a real number, and momentum, being that question, is a 1-form.
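One way to write this "question" as a formula (a sketch in my own notation: $w$ is an arbitrary tangent vector at $q$ and $\varepsilon$ a small parameter, neither of which appears in the answer above) is $$p(w) = \frac{d}{d\varepsilon} L(q, \dot q + \varepsilon w)\Big|_{\varepsilon = 0} = \sum_i \frac{\partial L}{\partial \dot q^i}\, w^i$$ which is linear in $w$, as a 1-form should be. Assuming a kinetic Lagrangian $L = \frac{1}{2} m \sum_{i,j} g_{ij}(q)\, \dot q^i \dot q^j$ with a symmetric metric $g_{ij}$, this gives $$p(w) = m \sum_{i,j} g_{ij}(q)\, \dot q^i w^j$$ and in flat space with Cartesian coordinates ($g_{ij} = \delta_{ij}$) it reduces to $p(w) = m\, v \cdot w$, recovering $p = mv$ only after using the metric to identify vectors with covectors.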
Solution 3:
The Lagrangian $L(q, v)$ is a function on the tangent bundle $TM$. The Hamiltonian function is defined as the Legendre transform of the Lagrangian: \begin{align*} H(q, p) = \sum_i p_i v^i - L \end{align*} The Hamiltonian is a scalar, and so is $L$. Thus $\sum_i p_i v^i$ has to be a scalar. Now the $v^i$ transform like the components of a vector, and thus the $p_i$ have to transform like the components of a covector. Thus $p \in T^*_qM$, and $(q, p) \in T^*M$.
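The transformation argument can be sanity-checked numerically. The following is a minimal sketch under assumptions of my own: a free particle in the plane, a linear change of coordinates $q' = Aq$ with an arbitrarily chosen invertible matrix `A`, and momenta obtained from $p_i = \partial L / \partial v^i$ by central differences. All function and variable names are hypothetical.

```python
def lagrangian(v, m=2.0):
    """Free-particle Lagrangian L = m |v|^2 / 2 in the original coordinates."""
    return 0.5 * m * (v[0] ** 2 + v[1] ** 2)

def gradient(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def matvec(a, x):
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def inverse_2x2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

A = [[2.0, 1.0], [0.5, 3.0]]          # assumed coordinate change q' = A q
A_inv = inverse_2x2(A)

v = [1.0, -2.0]                        # velocity components in old coordinates
p = gradient(lagrangian, v)            # momentum components p_i = dL/dv^i

v_new = matvec(A, v)                   # velocity transforms with A (vector)

def lagrangian_new(w):
    """Same Lagrangian expressed through the new velocity components."""
    return lagrangian(matvec(A_inv, w))

p_new = gradient(lagrangian_new, v_new)        # momentum in new coordinates
p_as_covector = matvec(transpose(A_inv), p)    # covector rule: inverse transpose
p_as_vector = matvec(A, p)                     # what a vector rule would give

print(p_new, p_as_covector, p_as_vector)
print(dot(p, v), dot(p_new, v_new))            # the pairing is a scalar
```

In this run the momentum computed in the new coordinates matches the inverse-transpose rule rather than the vector rule, and the pairing $\sum_i p_i v^i$ comes out the same in both coordinate systems, which is what keeps $H$ a scalar.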