Is discard bad for program performance in OpenGL?

I was reading this article, and the author writes:

Here's how to write high-performance applications on every platform in two easy steps:
[...]
Follow best practices. In the case of Android and OpenGL, this includes things like "batch draw calls", "don't use discard in fragment shaders", and so on.

I had never heard before that discard could have a bad impact on performance, and I have been using it to avoid blending when detailed alpha hasn't been necessary.

Could someone please explain why and when using discard might be considered bad practice, and how discard + depth test compares with alpha + blend?

Edit: After having received an answer to this question, I did some testing by rendering a background gradient with a textured quad on top of it (the render state for each variant is sketched after the list).

  • Using GL_DEPTH_TEST and a fragment shader ending with the line "if( gl_FragColor.a < 0.5 ){ discard; }" gave about 32 fps.
  • Removing the if/discard statement from the fragment shader increased the rendering speed to about 44 fps.
  • Using GL_BLEND with the blend function (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) instead of GL_DEPTH_TEST also resulted in around 44 fps.
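
For reference, here is a rough C sketch of the state used for the two variants (OpenGL ES 2.0 API; the program handles are placeholders, not the exact test code):

    #include <GLES2/gl2.h>

    /* Placeholder shader program handles for the two variants. */
    extern GLuint discardProgram;  /* fragment shader ends with:
                                      if( gl_FragColor.a < 0.5 ){ discard; } */
    extern GLuint blendProgram;    /* same shader without the discard branch */

    void drawWithDiscard(void)     /* variant 1: ~32 fps */
    {
        glDisable(GL_BLEND);
        glEnable(GL_DEPTH_TEST);
        glUseProgram(discardProgram);
        /* ... draw the background gradient, then the textured quad ... */
    }

    void drawWithBlending(void)    /* variant 2: ~44 fps */
    {
        glDisable(GL_DEPTH_TEST);
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        glUseProgram(blendProgram);
        /* ... draw the background gradient, then the textured quad ... */
    }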

It's hardware-dependent. For PowerVR hardware, and other GPUs that use tile-based rendering, using discard means that the TBR can no longer assume that every fragment drawn will become a pixel. This assumption is important because it allows the TBR to evaluate all the depths first, then only evaluate the fragment shaders for the top-most fragments. A sort of deferred rendering approach, except in hardware.

Note that you would get the same issue from turning on alpha test.
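
For example, in fixed-function OpenGL (and OpenGL ES 1.x) the alpha test is enabled as sketched below; in ES 2.0 and later the fixed-function test no longer exists and a shader-side discard is the usual substitute, which is why both carry the same cost (the function wrapper is just a placeholder):

    #include <GLES/gl.h>   /* OpenGL ES 1.x header; desktop GL uses GL/gl.h */

    void enableAlphaTest(void)
    {
        /* Fragments whose alpha fails the test are rejected after shading,
           just like a shader-side discard. */
        glEnable(GL_ALPHA_TEST);
        glAlphaFunc(GL_GREATER, 0.5f);   /* keep fragments with alpha > 0.5 */
    }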


"discard" is bad for every mainstream graphics acceleration technique - IMR, TBR, TBDR. This is because visibility of a fragment(and hence depth) is only determinable after fragment processing and not during Early-Z or PowerVR's HSR (hidden surface removal) etc. The further down the graphics pipeline something gets before removal tends to indicate its effect on performance; in this case more processing of fragments + disruption of depth processing of other polygons = bad effect

If you must use discard, make sure that only the tris that need it are rendered with a shader containing it, and, to minimise its effect on overall rendering performance, render your objects in the order: opaque, then discard, then blended.
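
A rough sketch of that submission order in C against the GL ES 2.0 API (the helpers and program handles are hypothetical placeholders for your own scene code):

    #include <GLES2/gl2.h>

    /* Hypothetical helpers and program handles standing in for your scene. */
    extern GLuint opaqueProgram, discardProgram, blendProgram;
    void drawOpaqueObjects(void);
    void drawDiscardObjects(void);          /* only the tris that need discard */
    void drawBlendedObjectsBackToFront(void);

    void drawScene(void)
    {
        /* 1. Opaque: depth test and depth writes on, no blend, no discard. */
        glEnable(GL_DEPTH_TEST);
        glDepthMask(GL_TRUE);
        glDisable(GL_BLEND);
        glUseProgram(opaqueProgram);
        drawOpaqueObjects();

        /* 2. Discard: bind the discard-containing shader only for the
              geometry that actually needs it. */
        glUseProgram(discardProgram);
        drawDiscardObjects();

        /* 3. Blended: depth test still on, depth writes off. */
        glDepthMask(GL_FALSE);
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        glUseProgram(blendProgram);
        drawBlendedObjectsBackToFront();

        glDepthMask(GL_TRUE);               /* restore for the next frame */
    }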

Incidentally, only PowerVR hardware determines visibility in the deferred step (hence it is the only kind of GPU termed "TBDR"). Other solutions may be tile-based (TBR), but they still use Early-Z techniques that depend on submission order, just like an IMR. TBRs and TBDRs do blending on-chip (faster and less power-hungry than going out to main memory), so blending should be favoured for transparency. The usual procedure for rendering blended polygons correctly is to disable depth writes (but not depth tests) and render the tris in back-to-front depth order (unless the blend operation is order-independent); often an approximate sort is good enough. Geometry should be built so that large areas of completely transparent fragments are avoided. More than one fragment still gets processed per pixel this way, but the hardware's depth optimisation isn't interrupted the way it is by discarded fragments.
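
As a sketch of that blended pass (hypothetical object record and helpers; sorting by per-object view-space depth as the approximate back-to-front order):

    #include <stdlib.h>
    #include <GLES2/gl2.h>

    /* Hypothetical per-object record; viewDepth is the object's distance
       from the camera, updated by the application each frame. */
    typedef struct { GLuint vbo; float viewDepth; } BlendedObject;

    void drawObject(const BlendedObject *obj);   /* placeholder draw call */

    /* Sort far-to-near so blended objects are drawn back to front. */
    static int compareDepthDesc(const void *a, const void *b)
    {
        float da = ((const BlendedObject *)a)->viewDepth;
        float db = ((const BlendedObject *)b)->viewDepth;
        return (da < db) - (da > db);
    }

    void drawBlendedPass(BlendedObject *objects, size_t count)
    {
        /* Approximate per-object sort; often good enough. */
        qsort(objects, count, sizeof(BlendedObject), compareDepthDesc);

        glEnable(GL_DEPTH_TEST);   /* still test against opaque depth...    */
        glDepthMask(GL_FALSE);     /* ...but don't write depth when blended */
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

        for (size_t i = 0; i < count; ++i)
            drawObject(&objects[i]);

        glDepthMask(GL_TRUE);      /* restore depth writes */
    }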