Why are quaternions used for rotations?
Solution 1:
Gimbal lock is one reason, although as you say it is only a problem with Euler angles and is easily solvable. Euler angles are still used when memory is a concern as you only need to store 3 numbers.
For quaternions versus a 3x3 rotation matrix, the quaternion has the advantage in size (4 scalars vs. 9) and speed (quaternion multiplication is much faster than 3x3 matrix multiplication).
Note that all of these representations of rotations are used in practice. Euler angles use the least memory; matrices use more memory but don't suffer from Gimbal lock and have nice analytical properties; and quaternions strike a nice balance of both, being lightweight, but free from Gimbal lock.
Solution 2:
In physics, there are very good reasons we don't use quaternions (despite the bizarre story that's occasionally told about Hamilton/Gibbs/etc). Physics requires that our descriptions have good analytic behavior (this has a precisely defined meaning, but in some rather technical ways that go far beyond what's taught in normal intro classes, so I won't go into any detail). It turns out that quaternions don't have this nice behavior, and so they aren't useful, and vectors/matrices do, so we use them.
Well, I am a physicist, too. And there are some situations where quaternions simply rock! Spherical Harmonics for example. You have two atoms scattering, exchanging an electron: what is the orbital spin transfer? With quaternions it is just multiplication i.e. summing up the exponents of the SH base functions expressed as quaternions. (Getting the Legendre Polynomials into quaternion notation is a bit tedious though).
But I agree, they are not a universal tool, and especially in rigid body mechanics they would be very cumbersome to use. Yet to cite Bertrand Russell answer in question of a student how much math a physicist needs to know: "As much as possible!"
Anyway: Why do we love quaternions in computer graphics? Because they have a number of appealing properties. First one can nicely interpolate them, which is important if one is animating rotating things, like the limbs around a joint. With a quaternion it is just scalar multiplication and normalization. Expressing this with a matrix requires evaluation of sin and cos, then building a rotation matrix. Then multiplying a vector with a quaternion is still cheaper as going through a full vector-matrix multiplication, it is also still cheaper if one adds a translation afterwards. If you consider a skeletal animation system for a human character, where one must evaluate a lot of translation/rotations for a large number of vertices, this has a huge impact.
Another nice side effect of using quaternions is, that any transformation inherently is orthonormal. With translation matrices one must re-orthonormalize every couple of animation steps, due to numerical round-off errors.
Solution 3:
The no gimbal lock argument seems odd, since this is only a problem of euler angles. It is also only a coordinate problem (just like the singularity at r=0 in polar coordinates (the Jacobian looses rank)), which means it is only a local problem, and can be resolved by switching coordinates, rotating out of the degeneracy, or using two overlapping coordinate systems.
Many 3D applications like using Euler angles for defining an object's orientation. For flight-sims in particular, they represent a theoretically useful way of storing the orientation in a way that is easily modifiable.
You should also be aware that things like "switching coordinates, rotating out of the degeneracy, or using two overlapping coordinate systems" all require effort. Effort means code. And code means performance. Losing performance when you don't have to is not a good thing for many 3D applications. After all, what is to be gained by all of these tricks, if just using quaternions would get you everything you needed.
I'm less sure about numerical issues, since I don't know in detail how both of these (and any alternatives) would be implemented. I've read that re-normalizing a quaternion is easier than doing that for a rotation matrix, but this is only true for a general matrix; a rotation has additional constraints that trivializes this (which are built into the definition of quaternions) (In fact, this has to be true since they have the same number of degrees of freedom).
The numerical issues come up when dealing with multiple consecutive rotations of an orientation. Imagine you have an object in space. And every timeslice, you apply a small change of yaw to it. After each change, you need to re-normalize the orientation; otherwise, precision problems will creep in and screw things up.
If you use matrices, each time you do matrix multiplication, you must re-orthonormalize the matrix. The matrix that you are orthonormalizing is not yet a rotation matrix, so I wouldn't be too sure about that easy orthonormalization. However, I can be sure about this:
It won't be as fast as a 4D vector normalization. That's what quaternions use to normalize after successive rotations.
Quaternion normalization is cheap. Even specialized rotation matrix normalization will not be as cheap. Again, performance matters.
There's also another issue that matrices don't do easily: interpolation between two different orientations.
When dealing with a 3D character, you often have a series of transformations defining the location of each bone in the character. This hierarchy of bones represents the character in a particular pose.
In most animation systems, to compute the pose for a character at a particular time, one interpolates between transformations. This requires interpolating the corresponding transformations.
Interpolating two matrices is... non-trivial. At least, it is if you want something that resembles a rotation matrix at the end. After all, the purpose of the interpolation is to produce something part-way between the two transformations.
For quaternions, all you need is a 4D lerp followed by a normalize. That's all: take two quaternions and linearly interpolate the components. Normalize the result.
If you want better quality interpolation (and sometimes you do), you can bring out the spherical lerp. This makes the interpolation behave better for more disparate orientations. This math is much more difficult and requires more operations for matrices than quaternions.
Solution 4:
Opinion: Quaternions are nice.
Rotation matrix: Minor disadvantage: Multiplication of matrices is ~2 times slower than quaternions. Minor Advantage: Matrix-vector multiplication is ~2 times faster, and large. Huge disadvantage: Normalization! Ghram-Shmit is asymmetrical, which does not give a higher order accurate answer when doing differential equations. More sophisticated methods are very complex and expensive.
Axis (angle = length of axis) Minor advantage: Small. Moderate disadvantage: Multiplication and applying to a vector is slow with trig. Moderate disadvantage: North-pole singularity at length = 2*pi, since all axis directions do nothing. More code (and debugging) to automatically rescale it when it gets near 2pi.