3D projection on a 2D plane (weak maths resources)

Solution 1:

3D projection is really very simple. The hard part is understanding how it is done; and that is what I shall try to explain here. It is all based on optics, and (linear) algebra.

Let's assume you stand in front of a window, looking out. If you stand in the center of the window, looking out through the center of the window, then we can treat the center of your eye (more precisely, the center of the lens in the pupil of your dominant eye) as the origin in 3D coordinates. Using OP's conventions, the $x$ axis increases up, the $y$ axis right, and the $z$ axis outward through the window. Thus, the center of the window is at $(0, 0, d)$, where $d$ is the distance from the eye to the window.

If we know the 3D coordinates in the above coordinate system of interesting details outside, 3D projection tells us their coordinates on the surface of the window. These coordinates are what OP needs to draw 3D pictures to a 2D surface.

Here is a rough diagram of the situation:

[Figure: perspective projection diagram]

The blue pane is the window, the eye is at the lower left corner, and we are interested in the projected coordinates (projected to the window, that is) of the four corners of some cube at some distance.

In a very real sense, those coordinates are obtained by linear interpolation: one end of the line segment is at the eye (which we already decided is the origin, so coordinates $(0, 0, 0)$), the other end is at the 3D coordinates of the detail we wish to project, and the interpolation point is where that sight line (usually called a "ray") intersects the view plane (the window, in our case).

Let's say the 3D coordinates of an interesting detail, say a corner of the greenish cube above, are $(x , y , z)$. The ray from the eye to that detail intersects the window at $$\begin{cases} x' = x \frac{d}{z} \\ y' = y \frac{d}{z} \\ z' = z \frac{d}{z} = d \end{cases}$$ Therefore, the 2D coordinates of that detail on the window are $$(x' , y') = \left( x \frac{d}{z} ,\, y \frac{d}{z} \right) = \frac{d}{z} ( x , y )$$ or, in other words, you simply multiply the $x$ and $y$ 3D coordinates by $d/z$.
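As a minimal sketch of the formula above, here it is in Python. The function name `project` and the choice to raise an error for $z \le 0$ are my own assumptions for illustration, not part of the question.

```python
def project(point, d):
    """Project a 3D point (x, y, z) onto the window plane z = d.

    The eye is assumed to sit at the origin, looking along +z.
    """
    x, y, z = point
    if z <= 0:
        # At or behind the eye the formula has no optical meaning.
        raise ValueError("point is at or behind the eye")
    s = d / z  # the common scale factor d/z
    return (x * s, y * s)


print(project((2.0, 1.0, 4.0), d=2.0))  # (1.0, 0.5)
```

A point twice as far away as the window ($z = 2d$) ends up with its $x$ and $y$ halved, which is exactly the "distant things look smaller" effect.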

If some detail has a $z$ coordinate smaller than $d$, it is basically between the window and the eye. These are problematic, because projecting them to the window no longer makes sense with respect to optics. In visualization software, such details are normally simply not drawn; then, you can think of the window as being the "camera surface" of some pyramid-shaped probe, the tip of the pyramid being the 3D coordinate system origin.
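The "simply not drawn" approach could look roughly like the following sketch, which drops every point in front of the window before projecting. The name `cull_near` is hypothetical. Real renderers usually clip lines and polygons against the plane rather than discarding whole points.

```python
def cull_near(points, d):
    """Keep only the points at or beyond the window plane (z >= d)."""
    return [p for p in points if p[2] >= d]


points = [(0.0, 0.0, 0.5), (1.0, 1.0, 3.0), (2.0, 0.0, -1.0)]
print(cull_near(points, d=1.0))  # [(1.0, 1.0, 3.0)]
```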

In games, details with a $z$ coordinate smaller than $d$ often produce visual glitches, like seeing through a wall, or similar. The real numerical problems occur near the origin, near the eye. If any graphical primitive, like a line, plane, or sphere, intersects with the eye, it means the eye is badly hurt in real life; with 3D projection, we get unrealistic results (like polygons getting twisted through the origin), because our model breaks down at the eye point.


I very warmly recommend you familiarize yourself with basic 2D and 3D vector algebra. Using vector addition, subtraction, scaling, dot product, and cross product, many of the operations you need to work with 3D worlds become much simpler.

Do not bother with Euler angles (or Tait-Bryan angles); they have limitations (especially gimbal lock) and ambiguities (the order of the rotations). Instead, learn about versors, or unit quaternions that represent rotations. They have four components, $$\mathbf{q} = ( w , i , j , k )$$ where $w^2 + i^2 + j^2 + k^2 = 1$. You can easily interpolate ("blend") between different versors, for example to simulate a camera panning and rotating from one orientation to the next.
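One simple way to blend two versors is normalized linear interpolation ("nlerp"): interpolate the four components linearly, then rescale to unit length. This is only a sketch of one possible method (spherical interpolation, "slerp", is another). The function name and the $(w, i, j, k)$ tuple layout are assumptions for illustration.

```python
import math


def nlerp(q0, q1, t):
    """Blend versors q0 and q1 (tuples (w, i, j, k)) for t in [0, 1]."""
    # q and -q describe the same rotation; flip q1 if needed so that
    # the blend takes the shorter path.
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0:
        q1 = tuple(-b for b in q1)
    q = tuple((1 - t) * a + t * b for a, b in zip(q0, q1))
    # Rescale back to unit length.
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
# Halfway between "no rotation" and a 90 degree turn: a 45 degree turn.
print(nlerp(identity, quarter_turn_z, 0.5))
```

Calling `nlerp` with `t` stepping from 0 to 1 gives the sequence of orientations for a camera turning from one orientation to the next.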

Although quaternions have a reputation (among programmers) of being hard to grok, their unit quaternion or versor subset is actually very programmer-friendly. They are numerically stable (you can always divide the components by $\sqrt{w^2 + i^2 + j^2 + k^2}$ to scale it back to unit length, and it won't bias the rotation in any specific way).

For computation, you expand (convert) the versor to a 3×3 rotation matrix. Unlike the matrices for Euler or Tait-Bryan angles, which depend on the chosen order of the rotations, each versor corresponds to exactly one rotation matrix. There are no "gotchas" or ambiguities.
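Here is a sketch of that expansion, using the standard unit-quaternion-to-matrix formula. It assumes the common convention where the matrix multiplies column vectors from the left; the function name is hypothetical, and the input is renormalized first as a precaution.

```python
import math


def versor_to_matrix(q):
    """Expand a versor (w, i, j, k) into a 3x3 rotation matrix (row lists)."""
    w, i, j, k = q
    # Rescale to unit length first; assumes q is not the zero quaternion.
    n = math.sqrt(w * w + i * i + j * j + k * k)
    w, i, j, k = w / n, i / n, j / n, k / n
    return [
        [1 - 2 * (j * j + k * k), 2 * (i * j - k * w), 2 * (i * k + j * w)],
        [2 * (i * j + k * w), 1 - 2 * (i * i + k * k), 2 * (j * k - i * w)],
        [2 * (i * k - j * w), 2 * (j * k + i * w), 1 - 2 * (i * i + j * j)],
    ]


# A 90 degree turn around the third axis: approximately
# [[0, -1, 0], [1, 0, 0], [0, 0, 1]], up to rounding.
for row in versor_to_matrix((math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))):
    print(row)
```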

You will also need to learn about matrix-matrix multiplication, and matrix-vector multiplication, so that you can efficiently apply the rotations to vectors.

Matrix-matrix multiplication is used to combine rotations or transformations described by matrices into a single such matrix. This means you only need to use one matrix to transform any vector, but that matrix can be the result of several different transformations itself. (For example, if you have a robot arm with ball joints, you can describe each joint using a versor, and a rotation matrix derived from that versor. When you start at the base, you simply multiply the current transformation matrix by the ball joint transformation matrix, to get the local coordinate system in the part that follows each ball joint.)
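To illustrate combining transformations, here is a small pure-Python sketch with 3×3 matrices stored as lists of rows. The helper names `mat_mul` and `mat_vec` are made up for this example, and it assumes column vectors, so in `mat_mul(a, b)` the transformation `b` is applied to the vector first.

```python
def mat_mul(a, b):
    """Return the 3x3 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def mat_vec(m, v):
    """Return the matrix-vector product m * v for a 3-component vector."""
    return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))


# Quarter turns around the third and the first axis, respectively.
turn_3 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
turn_1 = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]

combined = mat_mul(turn_1, turn_3)  # turn_3 first, then turn_1
print(mat_vec(combined, (1, 0, 0)))  # (0, 0, 1)
```

Once `combined` is computed, every vertex needs only one matrix-vector product, no matter how many rotations were chained.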

Versor-versor multiplication (Hamilton product) does the exact same for versors: multiplying $\mathbf{q}_1 \mathbf{q}_2$ yields a versor that represents a rotation by versor $\mathbf{q}_1$ followed by a rotation by versor $\mathbf{q}_2$ (which of the two is applied first depends on the convention you use to apply a versor to a vector, so check it once and stick to it). (Numerically, you'll want to divide each component of the result by $\sqrt{w^2 + i^2 + j^2 + k^2}$, to ensure it has unit length; as I wrote earlier, this imparts no bias to the result, and allows you to apply as many consecutive rotations as you want, without any distortion -- unlike for example for matrices, which would require orthonormalization after a dozen or so steps, even if using double-precision floating-point numbers.)
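A sketch of the Hamilton product with the renormalization step, using $(w, i, j, k)$ tuples. The function names are assumptions for illustration. The example multiplies two rotations around the same axis, so the result does not depend on the multiplication order.

```python
import math


def hamilton(a, b):
    """Hamilton product a * b of two quaternions given as (w, i, j, k)."""
    aw, ai, aj, ak = a
    bw, bi, bj, bk = b
    return (
        aw * bw - ai * bi - aj * bj - ak * bk,
        aw * bi + ai * bw + aj * bk - ak * bj,
        aw * bj - ai * bk + aj * bw + ak * bi,
        aw * bk + ai * bj - aj * bi + ak * bw,
    )


def normalize(q):
    """Scale a non-zero quaternion back to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


# Two 45 degree turns around the third axis give one 90 degree turn,
# i.e. approximately (cos 45deg, 0, 0, sin 45deg).
eighth = (math.cos(math.pi / 8), 0.0, 0.0, math.sin(math.pi / 8))
print(normalize(hamilton(eighth, eighth)))
```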


The reverse problem, trying to find the object and the point on an object, when you know the ray arriving at the eye, is called ray casting. If you then continue to trace the possible rays, to find out which ones might originate in light sources, you get to ray tracing.

If you transform the 3D coordinate system so that your eye (or camera) is always at the origin, the intersection tests become much easier. In particular, consider the sphere case: let $\vec{c}$ be the center of the sphere, $r$ its radius, and $\hat{n}$ the unit vector ($\lVert\hat{n}\rVert = 1$) pointing in the direction from which the ray arrived at the eye. Let $$D = r^2 + \left ( \hat{n} \cdot \vec{c} \right )^2 - \vec{c} \cdot \vec{c}$$ If $D \lt 0$, the ray did not intersect the sphere. If $D = 0$, the ray grazed the sphere, i.e. intersected it tangentially (at one point). If $D \gt 0$, the ray intersects the sphere at distance $R$ from origin: $$R = \hat{n} \cdot \vec{c} \pm \sqrt{ D }$$ at point $R \hat{n}$. If we are outside the sphere, use $-$ above; if we are inside the sphere, use $+$ above. In general, pick the sign that yields the smaller, but positive, $R$.
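The sphere test above translates almost directly into code. In this sketch, the function name and the choice to return `None` for "no hit" are my own assumptions. It tries the $-$ sign first and falls back to $+$, which implements "the smaller, but positive, $R$".

```python
import math


def ray_sphere(n, c, r):
    """Distance R from the eye (at the origin) to a sphere along unit vector n.

    c is the sphere center, r its radius. Returns None if the ray misses
    the sphere or the sphere lies entirely behind the eye.
    """
    nc = sum(a * b for a, b in zip(n, c))  # n . c
    cc = sum(a * a for a in c)             # c . c
    D = r * r + nc * nc - cc
    if D < 0:
        return None  # the ray does not touch the sphere
    root = math.sqrt(D)
    # Candidates in increasing order; take the first positive one.
    for R in (nc - root, nc + root):
        if R > 0:
            return R
    return None


print(ray_sphere((0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0))  # 4.0
print(ray_sphere((0.0, 0.0, 1.0), (0.0, 3.0, 5.0), 1.0))  # None
```

The intersection point itself is then `R` times `n`, component by component.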

Since ball-and-stick models are often a favourite starting point for physicists and chemists interested in 3D graphics, I've collected the formulas needed to do the above with cylinders (without end caps, with flat end caps, or with spherical end caps) on my Wikipedia user page.