Once upon a time, when > was faster than < ... Wait, what?
I am reading an awesome OpenGL tutorial. It's really great, trust me. The topic I am currently on is the Z-buffer. Aside from explaining what it's all about, the author mentions that we can perform custom depth tests, such as GL_LESS, GL_ALWAYS, etc. He also explains that the actual meaning of depth values (which is on top and which isn't) can also be customized. I understand so far. And then the author says something unbelievable:
The range zNear can be greater than the range zFar; if it is, then the window-space values will be reversed, in terms of what constitutes closest or farthest from the viewer.
Earlier, it was said that the window-space Z value of 0 is closest and 1 is farthest. However, if our clip-space Z values were negated, the depth of 1 would be closest to the view and the depth of 0 would be farthest. Yet, if we flip the direction of the depth test (GL_LESS to GL_GREATER, etc), we get the exact same result. So it's really just a convention. Indeed, flipping the sign of Z and the depth test was once a vital performance optimization for many games.
If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.
Is the author making things up, am I misunderstanding something, or is it indeed the case that once < was slower (vitally, as the author says) than >?
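For concreteness, here is how I picture the two conventions in plain OpenGL calls. This is just my own minimal sketch, assuming the standard glDepthRange/glDepthFunc/glClearDepth API (the function names setup_conventional and setup_reversed are mine); as far as I understand the tutorial, both setups should draw the same image, only with the meaning of window-space depth flipped:

```c
#include <GL/gl.h>

/* Minimal sketch (mine, not from the tutorial): the same scene rendered
 * under the two depth conventions should look identical. */

void setup_conventional(void)
{
    glEnable(GL_DEPTH_TEST);
    glDepthRange(0.0, 1.0);   /* window-space: 0 = closest, 1 = farthest */
    glDepthFunc(GL_LESS);     /* smaller depth wins                      */
    glClearDepth(1.0);        /* clear to "farthest"                     */
    glClear(GL_DEPTH_BUFFER_BIT);
}

void setup_reversed(void)
{
    glEnable(GL_DEPTH_TEST);
    glDepthRange(1.0, 0.0);   /* window-space: 1 = closest, 0 = farthest */
    glDepthFunc(GL_GREATER);  /* larger depth wins                       */
    glClearDepth(0.0);        /* clear to "farthest", which is now 0     */
    glClear(GL_DEPTH_BUFFER_BIT);
}
```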
Thanks for clarifying this quite curious matter!
Disclaimer: I am fully aware that algorithmic complexity is the primary source of optimizations. Furthermore, I suspect that nowadays it wouldn't make any difference at all, and I am not asking this in order to optimize anything. I am just extremely, painfully, maybe prohibitively curious.
Solution 1:
If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.
I didn't explain that particularly well, because it wasn't important. I just felt it was an interesting bit of trivia to add. I didn't intend to go over the algorithm specifically.
However, context is key. I never said that a < comparison was faster than a > comparison. Remember: we're talking about graphics hardware depth tests, not your CPU. Not operator<.
What I was referring to was a specific old optimization where one frame you would use GL_LESS with a range of [0, 0.5]. The next frame, you render with GL_GREATER with a range of [1.0, 0.5]. You go back and forth, literally "flipping the sign of Z and the depth test" every frame.
This loses one bit of depth precision, but it meant you didn't have to clear the depth buffer, which once upon a time was a rather slow operation. Since depth clearing is not only free these days but actually faster than this technique, people don't do it anymore.
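If it helps, here is a rough sketch of what that ping-ponging looked like in GL terms. This is only my illustration of the scheme above (render_frame, frame_index and draw_scene are placeholder names for whatever your engine actually does); the key point is that the depth buffer is never cleared, because last frame's depth values always land on the losing side of the current frame's test:

```c
#include <GL/gl.h>

void draw_scene(void);            /* stand-in for your actual rendering */

static unsigned frame_index = 0;

/* One frame of the "flip the range and the test" trick: even frames use
 * the lower half of the depth range with GL_LESS, odd frames use the
 * upper half (reversed) with GL_GREATER. GL_DEPTH_BUFFER_BIT is never
 * cleared, which was the whole point back then. */
void render_frame(void)
{
    if ((frame_index & 1) == 0) {
        glDepthRange(0.0, 0.5);   /* near maps to 0.0, far to 0.5 */
        glDepthFunc(GL_LESS);
    } else {
        glDepthRange(1.0, 0.5);   /* near maps to 1.0, far to 0.5 */
        glDepthFunc(GL_GREATER);
    }

    glClear(GL_COLOR_BUFFER_BIT); /* color only; the depth clear is skipped */
    draw_scene();

    ++frame_index;
}
```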
Solution 2:
The answer is almost certainly that for whatever incarnation of chip+driver was used, the Hierarchical Z only worked in one direction - this was a fairly common issue back in the day. Low-level assembly/branching has nothing to do with it - Z-buffering is done in fixed-function hardware and is pipelined - there is no speculation and hence no branch prediction.