What happens in each stage of the render pipeline in the Phong lighting model? [closed]

I asked this question to understand specular reflection. I read this website to understand Phong shading, but I didn't understand the "Problems with Phong shading" from that site, which are given below:

  1. Distances change under perspective transformation.

  2. Can't perform lighting calculation in projected space.

  3. Normals don't map through perspective transformation.

  4. Normals lost after projection.

  5. Have to perform lighting calculation in world space.

  6. Requires mapping position backward through perspective transformation.

Can anybody explain, stage by stage, how the 3D-to-2D projection happens with the normals, the vertex coloring of the polygon, and then all the intermediate pixel coloring in Phong shading?

I also read on that site that the pipeline is one way and lighting information is lost after projection, and that therefore Phong is not normally implemented in hardware. I didn't understand why Phong is normally implemented in software and not in hardware.


I don't think I can defend that text. It took me a while to write this answer because I focused on how each point is a problem… but I don't think they are problems to begin with.


"Problems" with Phong shading

I'll quickly go over the listed points.


Distances change under perspective transformation.

This is true. Not only would computing the distance between two points in the projection be done in one dimension less (2D instead of 3D), but perspective projection also has foreshortening.

But I do not see how this is a problem for Phong shading. In fact, the same text argues that cross ratios are preserved under perspective projection (source), and thus the distances from a point in 2D to the projections of the vertices (also in 2D) can be used as weights for linear interpolation.

I'll cover the interpolation below.


Can't perform lighting calculation in projected space.

We can't do Phong lighting computations in the projected space, because we need the normal, the position of the camera, and the position of the light source, all in 3D.

There is such a thing as 2D lighting - which is used in 2D games - but it is not Phong lighting.


Normals don't map through perspective transformation.

We can consider the perspective transformation a mapping. Not a useful one, but a mapping.

I believe the problem here is that in the classic pipeline, normals are discarded when the perspective transformation is done, which explains why they phrase the next point as they do.


Normals lost after projection.

So, if (in the classic pipeline) normals are discarded when the perspective projection is done, we just don't have them afterwards.

This is not a problem with the modern programmable pipeline.


Have to perform lighting calculation in world space.

We need the vectors in 3D (either in world space or in camera space) to do Phong light computations. Again, this is only a problem for the classic pipeline.


Requires mapping position backward through perspective transformation.

This is saying that you need to take something that was projected and figure out its position before the projection. We don't have to do that if we don't discard the 3D positions when we do the projection. Again, this is only a problem for the classic pipeline.


Interpolation

Let us say a projected triangle is given by the points $v_1 = (x_1, y_1)$, $v_2 = (x_2, y_2)$ and $v_3 = (x_3, y_3)$.

And we are currently working on pixel with position $p = (x, y)$ that is inside the projected triangle.

Consider the scanline to which the pixel belongs. That is, a horizontal line that crosses the point $p = (x, y)$. The scanline must cross two of the sides of the projected triangle. Without loss of generality, let us say the scanline crosses the sides $\overline{v_1v_2}$ and $\overline{v_1v_3}$.

Now, let us say we have some attribute associated with each vertex that we want to interpolate. For example, the attribute could be a normal vector $\vec{N}$ (the normal at $v_1$ is $\vec{N_1}$, the normal at $v_2$ is $\vec{N_2}$, and the normal at $v_3$ is $\vec{N_3}$).

Interpolation inside the triangle

Image taken from the paper Hardware Implementation of Phong Shading using Spherical Interpolation from the year 2000.

We are going to interpolate across the side $\overline{v_1v_2}$ to get $\vec{N_{start}}$ like this:

$\vec{N_{start}} = \vec{N_1} + (\vec{N_2} - \vec{N_1})\frac{y - y_1}{y_2 - y_1}$

Similarly we are going to interpolate across the side $\overline{v_1v_3}$ to get $\vec{N_{end}}$ like this:

$\vec{N_{end}} = \vec{N_1} + (\vec{N_3} - \vec{N_1})\frac{y - y_1}{y_3 - y_1}$

And finally we can interpolate those results along the scanline to get $\vec{N}$, where $x_{start}$ and $x_{end}$ are the $x$ coordinates at which the scanline crosses $\overline{v_1v_2}$ and $\overline{v_1v_3}$ respectively:

$\vec{N} = \vec{N_{start}} + (\vec{N_{end}} - \vec{N_{start}})\frac{x - x_{start}}{x_{end} - x_{start}}$

Note:

  • The normal computed by interpolation is not normalized.
  • This same approach can be used to interpolate other attributes, including the 3D positions corresponding to the pixel (see the sketch below).
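
As an illustration, here is the interpolation above written as a GLSL-style function. This is just the formulas in code form, not something that belongs to a particular pipeline stage, and all the names are mine:

```glsl
// A sketch of the scanline interpolation described above.
// N1, N2, N3 are the attributes (here, normals) at v1, v2, v3.
vec3 interpolateAttribute(vec3 N1, vec3 N2, vec3 N3,
                          float y1, float y2, float y3, // projected y of v1, v2, v3
                          float xStart, float xEnd,     // where the scanline crosses the two sides
                          float x, float y)             // current pixel position
{
    // Interpolate along the sides v1-v2 and v1-v3 (mix(a, b, t) = a + (b - a) * t).
    vec3 nStart = mix(N1, N2, (y - y1) / (y2 - y1));
    vec3 nEnd   = mix(N1, N3, (y - y1) / (y3 - y1));

    // Interpolate along the scanline.
    return mix(nStart, nEnd, (x - xStart) / (xEnd - xStart));
}
```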

Phong Shading and the pipeline

We are looking at a quite old text (it is dated "Fall 1997", as you can see here), and at the time there was no hardware support for Phong lighting. It was not part of the classic pipeline. Hence the text saying:

Phong not normally implemented in hardware

That is, at the time there was no way to get hardware acceleration for Phong lighting computations, and thus we would have had to implement it in software running on the CPU.

However, graphics cards with Phong lighting support began appearing on the market soon after. These graphics cards had support for a limited number of lights and would do the lighting computation in hardware.

However however, the modern graphics pipeline is programmable. So we are back to implementing the lighting computation in software. Except this time it is software that will run on the GPU.


Phong in hardware

So, there was a time when there were dedicated parts of the graphics card to do the lighting computation. Where in the pipeline, and how, did Phong shading happen on those old graphics cards? It was done as part of rasterization, with the caveat that the positions of the lights (which were limited in number) had to be given beforehand.

The hardware would interpolate (as described above) the 3D positions of the vertices to get the 3D position for the pixel, and it would also interpolate the 3D normals of the vertices to get the 3D normal for the pixel.

With the position (in 3D), the normal (in 3D), and the positions of the lights (in 3D), the Phong computation is possible.

See also Design Principles of Hardware-based Phong Shading and Bump Mapping from 1997.


Phong in shaders

In the modern programmable pipeline, the implementation spans three stages, two of which are programmable via shaders:

  • The vertex shader, which executes per vertex.
  • The fragment shader, which executes per pixel.

These shaders are software that we upload to the GPU, and the GPU executes them. There are other types of shaders, but we only need these two, and we could put whatever logic we want in them. I'm basing this part of the answer on a source reference I'll link at the end, which demonstrates Phong shading. But don't forget that shaders can be used very differently.

Before we get into what each shader does, I want to point out that there are some "uniform" variables. These uniforms do not change per vertex nor per pixel, and are available in all stages. I will not list them as inputs below.

The uniform variables that we will use are:

  • Transformation matrices (used in the vertex shader):
    • $M_{model}$: The matrix to transform from model space to world space.
    • $M_{view}$: The matrix to transform from world space to camera space.
    • $M_{projection}$: The matrix to transform from camera space to clip space.
  • Object color (used in the fragment shader):
    • $C_{object}$: the color of the object. Note: it is also possible to do vertex coloring, or use texture color... here we are using a single uniform color.
  • Camera information (used in the fragment shader):
    • $V$: the camera position (in 3D, world coordinates).
  • Light information (used in the fragment shader):
    • $L$: The light position (in 3D, world coordinates).
    • $C_l$: The color of the light.

The matrices are actually 4x4; to do the transformations we augment the 3D vectors with a fourth component $w$. That is, we are working with homogeneous coordinates. The computations below do not include that detail; I'll link the source reference I'm using at the end, and you can see more detail there.
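
As a rough sketch, these uniforms could be declared in GLSL like this (the names are my own and only loosely follow the source reference linked at the end):

```glsl
// Transformation matrices (used in the vertex shader)
uniform mat4 model;       // M_model: model space -> world space
uniform mat4 view;        // M_view: world space -> camera space
uniform mat4 projection;  // M_projection: camera space -> clip space

// Object, camera, and light information (used in the fragment shader)
uniform vec3 objectColor; // C_object
uniform vec3 viewPos;     // V: camera position, world coordinates
uniform vec3 lightPos;    // L: light position, world coordinates
uniform vec3 lightColor;  // C_l
```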

See also The view matrix finally explained.


Vertex shader

The vertex shader will run per vertex. It takes the following input:

  • $P_{vm}$: The position (in 3D, model space) of the vertex. This is a standard input.
  • $N_{vm}$: The normal (in 3D, model space) of the vertex. This is a custom input.

And it has the following outputs:

  • $P_{vw}$: The position (in 3D, world space) of the vertex. This is a custom output.
  • $N_{vw}$: The normal (in 3D, world space) of the vertex. This is a custom output.
  • $p_v$: The position (projected) of the vertex. This is a standard output (we will output homogeneous coordinates; the perspective divide happens in the next stage).

And it does these computations:

  • $P_{vw} = M_{model} * P_{vm}$
  • $N_{vw} = M_{model} * N_{vm}$
  • $p_v = M_{projection} * M_{view} * P_{vw}$
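
Put together, a minimal GLSL vertex shader along these lines could look like the following sketch (the names are my own, loosely following the source reference linked at the end):

```glsl
#version 330 core
layout (location = 0) in vec3 aPos;    // P_vm: vertex position, model space
layout (location = 1) in vec3 aNormal; // N_vm: vertex normal, model space

out vec3 FragPos; // P_vw: vertex position, world space
out vec3 Normal;  // N_vw: vertex normal, world space

uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

void main()
{
    // P_vw = M_model * P_vm (w = 1 because it is a position)
    FragPos = vec3(model * vec4(aPos, 1.0));

    // N_vw = M_model * N_vm (directions ignore translation, hence mat3).
    // Caveat: this is only correct if the model matrix has no non-uniform
    // scaling; otherwise the inverse transpose of the model matrix is needed.
    Normal = mat3(model) * aNormal;

    // p_v = M_projection * M_view * P_vw (homogeneous clip coordinates)
    gl_Position = projection * view * vec4(FragPos, 1.0);
}
```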

Intermediary stage

This stage is not programmable. The following things happen between vertex shader and fragment shader:

  • Face culling: If configured to do so, the GPU will remove triangles that are facing away from the camera (back-face culling) or towards the camera (front-face culling) or neither.
  • Clipping: The GPU will discard regions that fail the following checks:
    • $-w <= x <= w$: The point was too far to the right or to the left to be visible.
    • $-w <= y <= w$: The point was too far up or down to be visible.
    • $-w <= z <= w$: The point was too near or far from the camera to be visible. This also removes points behind the camera.
  • Perspective divide: The GPU divides the $x$, $y$, and $z$ coordinates by $w$ (see the formula after this list). This takes the point from clip space to normalized device space. The computed $z$ tells us the depth of the fragment, unless the fragment shader overwrites it (which we are not going to do).
  • Viewport transformation: The GPU will transform the coordinates from normalized device space to screen space. Now the GPU knows the coordinates in pixels.
  • Interpolation: All the outputs given by the vertex shader are interpolated (by the means described above). These will be inputs for the fragment shader.
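
To make the perspective divide concrete: if the vertex shader outputs the clip-space position $(x, y, z, w)$, the resulting normalized device coordinates are

$(x_{ndc}, y_{ndc}, z_{ndc}) = \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}\right)$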

Fragment shader

The fragment shader will run per pixel. Actually it can execute multiple times for each pixel in the output. This is because the pixel might be inside the projection of multiple triangles. Each instance of the pixel inside a projection of a triangle is a fragment.

It takes as inputs the results of the interpolation of the outputs of the vertex shader:

  • $P_{pw}$: The position (in 3D, world space) of the pixel (computed by interpolating $P_{vw}$). This is a custom input.
  • $N_{pw}$: The normal (in 3D, world space) of the pixel (computed by interpolating $N_{vw}$). This is a custom input.
  • $p_p$: The position of the pixel (in normalized device space). This is a standard input, but we are not going to use it.

And it has the following outputs:

  • $C$: The color for the pixel. This is a standard output.

And it does these computations:

  • Normalized normal:
    • $n = N_{pw}/|N_{pw}|$
  • Direction of the light:
    • $l = (L - P_{pw})/|L - P_{pw}|$
  • Direction of view:
    • $v = (V - P_{pw})/|V - P_{pw}|$
  • Direction of reflection:
    • $r = n * 2(l·n) - l$
  • Ambient component:
    • $C_{ambient} = S_{ambient} * C_l$
  • Diffuse component:
    • $C_{diffuse} = max(n·l, 0) * C_l$
  • Specular component:
    • $C_{specular} = S_{specular} * decay(v, r) * C_l$
  • Output:
    • $C = (C_{ambient} + C_{diffuse} + C_{specular}) * C_{object}$

Where $max$ is a function that returns the greater of its inputs:

$max(a,b) = \begin{cases} a, & \text{if $a >= b$} \\ b, & \text{if $b > a$} \end{cases}$

Also, my source reference uses a custom formula that I call here $decay$:

$decay(v, r) = max(v·r, 0)^{32}$

It also uses the values $S_{ambient} = 0.1$ and $S_{specular} = 0.5$, which control the blending of the light components.

The decay formula and these blending strengths capture the "material" of the object.
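
Put together, a minimal GLSL fragment shader along these lines could look like the following sketch (again, the names are my own, loosely following the source reference linked at the end):

```glsl
#version 330 core
in vec3 FragPos; // P_pw: interpolated world-space position
in vec3 Normal;  // N_pw: interpolated world-space normal

out vec4 FragColor; // C: the color for the fragment (with alpha = 1)

uniform vec3 objectColor; // C_object
uniform vec3 lightColor;  // C_l
uniform vec3 lightPos;    // L
uniform vec3 viewPos;     // V

void main()
{
    float ambientStrength  = 0.1; // S_ambient
    float specularStrength = 0.5; // S_specular

    // n, l, v, r as defined above
    vec3 n = normalize(Normal);
    vec3 l = normalize(lightPos - FragPos);
    vec3 v = normalize(viewPos - FragPos);
    vec3 r = reflect(-l, n); // r = 2(l·n)n - l

    // Light components
    vec3 ambient  = ambientStrength * lightColor;
    vec3 diffuse  = max(dot(n, l), 0.0) * lightColor;
    vec3 specular = specularStrength * pow(max(dot(v, r), 0.0), 32.0) * lightColor;

    // C = (C_ambient + C_diffuse + C_specular) * C_object
    FragColor = vec4((ambient + diffuse + specular) * objectColor, 1.0);
}
```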


What happens after?

There are a series of checks and blending operations related to depth and transparency that are performed across the fragments to decide the final color of the pixel. In our case, we are assuming opaque geometry, so the only relevant check is the depth test. Thus, for each pixel the GPU will pick the fragment that corresponds to the position closest to the camera and discard any other fragment. Finally, the decided pixel colors can be pushed to the render target (the screen, for example).


Source reference

The steps I described for the vertex and fragment shaders in this answer are based on this code:

  • vertex shader
  • fragment shader

Both are from the article Basic Lighting.