Mathematically compute a simple graphics pipeline

I am trying to understand (and work through by hand) all the basic mathematical computations needed in the graphics pipeline to render a simple 2D image from a 3D scene description like VRML. Is there a good example of the steps needed, such as the model transformation (object coordinates to world coordinates), the view transformation (world coordinates to view coordinates), calculation of vertex normals for lighting, clipping, calculating the screen coordinates of objects inside the view frustum, and creating the 2D projection to calculate the individual pixels with colors?


I am used to OpenGL-style render math, so I will stick to it (all the renderers use almost the same math).

First, some terms to explain:

  1. Transform matrix

Represents a coordinate system in 3D space

    double m[16]; // 4x4 matrix stored as a 1-dimensional array for speed (column-major, as in OpenGL)
    m[0]=xx; m[4]=yx; m[ 8]=zx; m[12]=x0;
    m[1]=xy; m[5]=yy; m[ 9]=zy; m[13]=y0;
    m[2]=xz; m[6]=yz; m[10]=zz; m[14]=z0;
    m[3]= 0; m[7]= 0; m[11]= 0; m[15]= 1;

where:

  • X(xx,xy,xz) is the unit vector of the X axis in GCS (global coordinate system)
  • Y(yx,yy,yz) is the unit vector of the Y axis in GCS
  • Z(zx,zy,zz) is the unit vector of the Z axis in GCS
  • P(x0,y0,z0) is the origin of the represented coordinate system in GCS

The transformation matrix is used to transform coordinates between the GCS and the LCS (local coordinate system):

  • LCS -> GCS: Ag = m * Al;
  • GCS -> LCS: Al = (m^-1) * Ag;
  • Al (x,y,z,w=1) is a 3D point in LCS ... in homogeneous coordinates
  • Ag (x,y,z,w=1) is a 3D point in GCS ... in homogeneous coordinates

The homogeneous coordinate w=1 is added so we can multiply a 3D vector by a 4x4 matrix

  • m transformation matrix
  • m^-1 inverse transformation matrix

In most cases m is orthonormal, which means the X,Y,Z vectors are perpendicular to each other and have unit size. This can be used to restore the accuracy of the matrix after many rotations, translations, etc ...

For more info see Understanding 4x4 homogenous transform matrices
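To make the matrix layout concrete, here is a minimal sketch of transforming a point and of inverting an orthonormal matrix. It assumes column vectors multiplied on the right side of the matrix (the same convention as v=projection*view*model*v later on). The function names mat4_mul_vec4 and mat4_inverse_orthonormal are made up for this example, and the inverse is only valid for rotation + translation (no scale or skew).

```cpp
#include <cassert>

// Column-major 4x4 matrix, same layout as m[16] above:
// m[0..2] = X axis, m[4..6] = Y axis, m[8..10] = Z axis, m[12..14] = origin.
// Element (row,col) is stored at m[row + 4*col].

// out = m * in, where in/out are homogeneous column vectors (x,y,z,w).
void mat4_mul_vec4(const double m[16], const double in[4], double out[4])
{
    assert(in != out); // in-place use would overwrite inputs that are still needed
    for (int row = 0; row < 4; ++row)
        out[row] = m[row]      * in[0]
                 + m[row + 4]  * in[1]
                 + m[row + 8]  * in[2]
                 + m[row + 12] * in[3];
}

// Inverse of an ORTHONORMAL matrix only (assumption: rotation + translation).
// The 3x3 rotation part is transposed and the new origin is -(R^T * P).
void mat4_inverse_orthonormal(const double m[16], double inv[16])
{
    assert(m != inv);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            inv[i + 4 * j] = m[j + 4 * i];            // transpose the rotation part
    for (int i = 0; i < 3; ++i)                        // -(axis i dot origin)
        inv[12 + i] = -(m[4 * i] * m[12] + m[4 * i + 1] * m[13] + m[4 * i + 2] * m[14]);
    inv[3] = inv[7] = inv[11] = 0.0;
    inv[15] = 1.0;
}
```

With these, a local point is taken to global space with mat4_mul_vec4(m, al, ag), and multiplying the result by the inverse matrix gives the local point back.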

  2. Render matrices

These matrices are usually used:

  • model - represents the coordinate system of the object actually being rendered
  • view - represents the camera coordinate system (Z axis is the view direction)
  • modelview - model and view multiplied together
  • normal - the same as modelview but with x0,y0,z0 = 0, used for normal vector computations
  • texture - manipulates texture coordinates for easy texture animation and effects; usually a unit matrix
  • projection - represents the projection of the camera view (perspective, ortho, ...). It should not include any rotations or translations; it is more like a camera sensor calibration (otherwise fog and other effects will fail ...)
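As a rough sketch of how the modelview and normal matrices can be derived from the others: the helper names mat4_mul, make_modelview and make_normal_matrix are invented for this example, and view is assumed to already map world space to camera space. Zeroing the origin is enough for normals only while the matrix stays orthonormal (no non-uniform scale).

```cpp
#include <cassert>

// c = a * b for column-major 4x4 matrices (element (row,col) is at [row + 4*col]).
void mat4_mul(const double a[16], const double b[16], double c[16])
{
    assert(c != a && c != b); // the result must not alias the inputs
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k)
                sum += a[row + 4 * k] * b[k + 4 * col];
            c[row + 4 * col] = sum;
        }
}

// modelview = view * model, so that modelview * v == view * (model * v).
void make_modelview(const double view[16], const double model[16], double modelview[16])
{
    mat4_mul(view, model, modelview);
}

// normal matrix = copy of the source matrix with the origin (x0,y0,z0) zeroed,
// so direction vectors are rotated but not shifted.
void make_normal_matrix(const double src[16], double normal[16])
{
    for (int i = 0; i < 16; ++i) normal[i] = src[i];
    normal[12] = normal[13] = normal[14] = 0.0;
}
```

The order of the operands matters: the matrix closest to the vertex (model) is applied first.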
  3. The rendering math

To render a 3D scene you need 2D rendering routines, like drawing a 2D textured triangle ... The renderer converts the 3D scene data to 2D and renders it. There are more techniques out there, but the most usual one is a boundary model representation + boundary rendering (surface only). The 3D -> 2D conversion is done by projection (orthogonal or perspective) and Z-buffer or Z-sorting.

  • Z-buffer is easy and native to present-day gfx HW
  • Z-sorting is done by the CPU instead, so it is slower and needs additional memory, but it is necessary for correct rendering of transparent surfaces.

So the pipeline looks like this:

  1. obtain the actual rendered data from the model
  • Vertex v
  • Normal n
  • Texture coord t
  • Color, Fog coord, etc...
  2. convert it to the appropriate space
  • v=projection*view*model*v ... camera space + projection
  • n=normal*n ... global space
  • t=texture*t ... texture space
  3. clip data to screen

This step is not strictly necessary, but it avoids rendering off-screen stuff (for speed). Face culling is also usually done here: if the normal vector of the rendered 'triangle' is opposite to what the polygon winding rule that is set expects, the 'triangle' is ignored.
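One common way to do the culling test (a sketch, not the only option) is to look at the signed area of the already projected triangle, which is the same as the sign of the Z component of its normal. The names Vec2, signed_area2 and is_back_face are made up for this example, and the Y axis is assumed to point up.

```cpp
struct Vec2 { double x, y; };

// Twice the signed area of triangle (a,b,c) in 2D.
// Positive means counter-clockwise (CCW) winding when the Y axis points up.
double signed_area2(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Returns true if the triangle should be skipped.
// front_is_ccw selects the winding rule: true = CCW triangles face the camera.
// Degenerate (zero area) triangles are skipped for both rules.
bool is_back_face(Vec2 a, Vec2 b, Vec2 c, bool front_is_ccw = true)
{
    const double area2 = signed_area2(a, b, c);
    return front_is_ccw ? (area2 <= 0.0) : (area2 >= 0.0);
}
```

If the screen Y axis points down instead, the meaning of the sign flips, so the winding rule has to be flipped too.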

  4. render the 3D/2D data

Use only the v.x,v.y coordinates for screen rendering and v.z for the Z-buffer test/value. The perspective division for perspective projections also goes here (with an OpenGL-style 4x4 projection matrix the divisor ends up in v.w):

  • v.x/=v.z, v.y/=v.z
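Here is a small sketch of the perspective division followed by the mapping to pixel coordinates. It assumes the vertex was already multiplied by a 4x4 projection matrix, so the divisor sits in w (for a bare pinhole projection it would be the view-space z, as in the bullet above). It also assumes OpenGL-style normalized coordinates in <-1,+1> and a screen Y axis pointing down. ScreenVertex and to_screen are hypothetical names.

```cpp
#include <cassert>

struct ScreenVertex { double x, y, z; };

// clip[4] holds projection*view*model*v as homogeneous (x,y,z,w).
// xs, ys is the screen resolution in pixels.
ScreenVertex to_screen(const double clip[4], int xs, int ys)
{
    assert(clip[3] != 0.0);            // w == 0 cannot be projected; clip such vertices first
    const double inv_w = 1.0 / clip[3];
    const double nx = clip[0] * inv_w; // perspective division
    const double ny = clip[1] * inv_w;
    const double nz = clip[2] * inv_w;

    ScreenVertex s;
    s.x = (nx + 1.0) * 0.5 * xs;       // <-1,+1> -> <0,xs>
    s.y = (1.0 - ny) * 0.5 * ys;       // flip Y so +1 is the top of the screen
    s.z = nz;                          // kept for the Z-buffer test
    return s;
}
```

A vertex in the middle of the view, like (0,0,0,1), lands in the middle of the screen.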

The Z-buffer works like this: the Z-buffer (zed) is a 2D array with the same size (resolution) as the screen (scr). Any pixel scr[y][x] is rendered only if (zed[y][x]>=z); in that case scr[y][x]=color; zed[y][x]=z; The if condition can be different (it is changeable).

When triangles or higher primitives are used for rendering, the resulting 2D primitives are converted to pixels in a process called rasterization, for example like this:
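The Z-buffer test described above can be sketched like this. The FrameBuffer type and its plot method are invented for this example; scr and zed are stored as flat arrays instead of [y][x], and the depth is cleared to +infinity so the first write to a pixel always passes.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct FrameBuffer {
    int xs, ys;                      // resolution
    std::vector<std::uint32_t> scr;  // color per pixel
    std::vector<double> zed;         // depth per pixel

    FrameBuffer(int width, int height)
        : xs(width), ys(height),
          scr(static_cast<std::size_t>(width) * height, 0u),
          zed(static_cast<std::size_t>(width) * height,
              std::numeric_limits<double>::infinity()) {}

    // Writes the pixel only if it is not farther than what is already stored.
    // Returns true if the pixel was written.
    bool plot(int x, int y, double z, std::uint32_t color)
    {
        if (x < 0 || x >= xs || y < 0 || y >= ys) return false; // outside the screen
        const std::size_t i = static_cast<std::size_t>(y) * xs + x;
        if (zed[i] >= z) {           // the depth test from the text
            scr[i] = color;
            zed[i] = z;
            return true;
        }
        return false;
    }
};
```

Changing the comparison in plot (for example to > or <=) gives the other depth test variants.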

  • Algorithm to fill triangle
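For illustration, here is a different and very simple rasterization approach than the linked one: a bounding-box scan with edge functions, which tests every pixel center against the three triangle edges. It is only a sketch. P2, Pixel, edge and rasterize_triangle are made-up names, and there is no fill rule for shared edges, so pixels lying exactly on an edge shared by two triangles would be produced twice.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

struct P2 { double x, y; };
struct Pixel { int x, y; };

// Signed test: > 0 if p is on the left of the directed edge a->b, 0 if on it.
static double edge(P2 a, P2 b, P2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Returns all pixels of an xs*ys screen whose centers are inside triangle (a,b,c).
std::vector<Pixel> rasterize_triangle(P2 a, P2 b, P2 c, int xs, int ys)
{
    std::vector<Pixel> out;
    const double area = edge(a, b, c);
    if (area == 0.0) return out;        // degenerate triangle, nothing to draw
    if (area < 0.0) std::swap(b, c);    // make the winding consistent

    // bounding box of the triangle, clamped to the screen
    const int x0 = std::max(0,      static_cast<int>(std::floor(std::min({a.x, b.x, c.x}))));
    const int x1 = std::min(xs - 1, static_cast<int>(std::ceil (std::max({a.x, b.x, c.x}))));
    const int y0 = std::max(0,      static_cast<int>(std::floor(std::min({a.y, b.y, c.y}))));
    const int y1 = std::min(ys - 1, static_cast<int>(std::ceil (std::max({a.y, b.y, c.y}))));

    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            const P2 p = { x + 0.5, y + 0.5 };  // sample at the pixel center
            if (edge(a, b, p) >= 0.0 && edge(b, c, p) >= 0.0 && edge(c, a, p) >= 0.0)
                out.push_back({ x, y });
        }
    return out;
}
```

Each returned pixel would then go through the Z-buffer test with an interpolated z and color. A scanline fill does the same job with less wasted work on thin triangles.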

For more clarity, here is how it looks:

[image: 3D rendering]

[Notes]

Transformation matrices are multiplicative, so if you need to transform N points by M matrices you can create a single matrix = m1*m2*...mM and convert the N points by this resulting matrix only (for speed). Sometimes a 3x3 transform matrix + shift vector is used instead of a 4x4 matrix. It is faster in some cases, but you cannot multiply more transformations together so easily. For transformation matrix manipulation, look for basic operations like Rotate or Translate. There are also matrices for rotations inside the LCS, which are more suitable for human control input, but these are not native to renderers like OpenGL or DirectX (because they use the inverse matrix).
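To show what stacking costs with the 3x3 matrix + shift vector form, here is a small sketch: composing two such transforms needs one 3x3 product plus a separate update of the shift, instead of a single 4x4 product. Affine3, affine_apply and affine_compose are hypothetical names for this example.

```cpp
struct Affine3 {
    double r[9];  // 3x3 part, column-major: element (row,col) is at r[row + 3*col]
    double t[3];  // shift vector
};

// out = R*in + t (in and out must not be the same array)
void affine_apply(const Affine3& a, const double in[3], double out[3])
{
    for (int row = 0; row < 3; ++row)
        out[row] = a.r[row] * in[0] + a.r[row + 3] * in[1] + a.r[row + 6] * in[2] + a.t[row];
}

// Returns c such that c(p) == a(b(p)):  Rc = Ra*Rb,  tc = Ra*tb + ta
Affine3 affine_compose(const Affine3& a, const Affine3& b)
{
    Affine3 c;
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 3; ++row)
            c.r[row + 3 * col] = a.r[row]     * b.r[3 * col]
                               + a.r[row + 3] * b.r[3 * col + 1]
                               + a.r[row + 6] * b.r[3 * col + 2];
    for (int row = 0; row < 3; ++row)
        c.t[row] = a.r[row] * b.t[0] + a.r[row + 3] * b.t[1] + a.r[row + 6] * b.t[2] + a.t[row];
    return c;
}
```

Transforming a point by affine_compose(a, b) gives the same result as applying b and then a, so N points still need only one transform each.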

Now all the above stuff was for standard polygonal rendering (surface boundary representation of objects). There are also other renderers out there, like volumetric rendering or (back) ray-tracers and hybrid methods. Also, the scene can have any dimensionality, not just 3D. Here are some related QAs covering these topics:

  • GLSL 3D Volumetric back raytracer
  • GLSL 3D Mesh back raytracer
  • 2D Doom/Wolfenstein techniques
  • 4D Rendering techniques
  • Comanche Voxel space ray casting