How are 3D games so efficient?

Patience, technical skill and endurance.

The first point is that a DirectX demo is primarily a teaching aid, so it's written for clarity, not speed of execution.

It's a pretty big subject to condense, but games development is primarily about understanding your data and your execution paths to an almost pathological degree.

  1. Your code is designed around two things: your data and your target hardware.
  2. The fastest code is the code that never gets executed. Sort your data into batches and only do expensive operations on the data you need to.
  3. How you store your data is key. Aim for contiguous access; this allows you to batch-process at high speed.
  4. Parallelise everything you possibly can.
  5. Modern CPUs are fast; modern RAM is very slow by comparison. Cache misses are deadly.
  6. Push as much to the GPU as you can. It has fast local memory, so it can blaze through the data, but you need to help it out by organising your data correctly.
  7. Avoid lots of render-state switches (again, batch similar vertex data together), as they cause the GPU to stall.
  8. Swizzle your textures and ensure their dimensions are powers of two; this improves texture cache performance on the GPU.
  9. Use levels of detail as much as you can: low/medium/high versions of 3D models, switched based on distance from the camera/player. There's no point rendering a high-res version if it's only 5 pixels on screen.
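Items 2, 7 and 9 can be sketched in a few lines. The Python below is only an illustration under stated assumptions: `DrawCall`, `batch_by_state` and `select_lod` are hypothetical names, the render state is assumed to be just a (shader, texture) pair, and the LOD distance thresholds are made up. Real engines do this in C or C++ and usually pack the state into an integer sort key.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DrawCall:
    # Hypothetical draw-call record; real engines usually pack state into an integer sort key.
    shader: int
    texture: int
    mesh: str


def count_state_switches(calls: List[DrawCall]) -> int:
    """Count how many times the (shader, texture) state must be set, in submission order."""
    switches = 0
    current: Optional[Tuple[int, int]] = None
    for call in calls:
        state = (call.shader, call.texture)
        if state != current:
            switches += 1
            current = state
    return switches


def batch_by_state(calls: List[DrawCall]) -> List[DrawCall]:
    # Sorting by state groups draws that share a shader and texture, so each
    # state is set once per group instead of once per interleaved draw.
    return sorted(calls, key=lambda c: (c.shader, c.texture))


def select_lod(distance: float, thresholds: Tuple[float, float] = (20.0, 60.0)) -> str:
    # Hypothetical distance thresholds (world units) between high/medium/low models.
    if distance < thresholds[0]:
        return "high"
    if distance < thresholds[1]:
        return "medium"
    return "low"


calls = [
    DrawCall(1, 10, "tree"),
    DrawCall(2, 20, "rock"),
    DrawCall(1, 10, "bush"),
    DrawCall(2, 20, "wall"),
]
print(count_state_switches(calls))                  # 4
print(count_state_switches(batch_by_state(calls)))  # 2
print([select_lod(d) for d in (5.0, 30.0, 100.0)])  # ['high', 'medium', 'low']
```

Sorting is stable, so draws that share a state keep their relative order; the interleaved submission needs four state changes, the sorted one only two.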

In general, it's because:

  1. The games are being optimal about what they need to render, and
  2. They take special advantage of your hardware.

For instance, one easy optimization is to not even try to draw things that can't be seen. Consider a complex scene like a cityscape from Grand Theft Auto IV. The renderer isn't actually rendering all of the buildings and structures; it's rendering only what the camera can see. If you could fly around to the back of those same buildings, facing the original camera, you would see a half-built, hollowed-out shell structure. Anything the camera cannot see is not rendered: since you can't see it, there's no need to try to show it to you.
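Rendering only what the camera can see usually starts with view-frustum culling. Below is a minimal sketch, assuming each object has a bounding sphere and a simplified camera at the origin looking down -z with a 90-degree field of view; `make_frustum_90` and `sphere_visible` are hypothetical helpers. This only rejects objects outside the view volume; hiding the backs of buildings that are inside it additionally relies on back-face and occlusion culling, which the sketch does not cover.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, float]  # (inward-facing unit normal n, offset d): inside when dot(n, p) + d >= 0


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def make_frustum_90(near: float, far: float) -> List[Plane]:
    """Six planes for an assumed camera at the origin looking down -z,
    with a 90-degree field of view horizontally and vertically."""
    s = 1.0 / math.sqrt(2.0)
    return [
        ((-s, 0.0, -s), 0.0),       # right:  x <= -z
        ((s, 0.0, -s), 0.0),        # left:   x >= z
        ((0.0, -s, -s), 0.0),       # top:    y <= -z
        ((0.0, s, -s), 0.0),        # bottom: y >= z
        ((0.0, 0.0, -1.0), -near),  # near:   -z >= near
        ((0.0, 0.0, 1.0), far),     # far:    -z <= far
    ]


def sphere_visible(center: Vec3, radius: float, planes: List[Plane]) -> bool:
    # A bounding sphere is culled as soon as it lies entirely behind any one plane.
    for normal, d in planes:
        if dot(normal, center) + d < -radius:
            return False
    return True  # conservative: a sphere near a frustum corner may be kept even if off-screen


scene = {
    "ahead": ((0.0, 0.0, -10.0), 1.0),
    "behind": ((0.0, 0.0, 10.0), 1.0),
    "far_right": ((50.0, 0.0, -10.0), 1.0),
    "too_far": ((0.0, 0.0, -200.0), 1.0),
}
planes = make_frustum_90(near=0.1, far=100.0)
visible = [name for name, (c, r) in scene.items() if sphere_visible(c, r, planes)]
print(visible)  # ['ahead']
```

Only the objects that survive this test are submitted to the GPU, so the per-frame cost scales with what is on screen rather than with the whole city.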

Furthermore, when you're developing against a particular set of hardware, you can use optimized instructions (SIMD instructions, for example) and hardware-specific techniques to get even better speedups.

The other part of your question is why a demo uses so much CPU:

... while a DX demo of a rotating Teapot @ 60fps uses a whopping 30% ?

It's common for demos of graphics APIs (like dxdemo) to fall back to what's called a software renderer when your hardware doesn't support all of the features needed to show a pretty example. These features might include things like shadows, reflection, ray-tracing, physics, et cetera.

This emulates a completely full-featured hardware device, which is unlikely to exist, in order to show off all the features of the API. But since that hardware doesn't actually exist, the work runs on your CPU instead. That's much less efficient than delegating to a graphics card, hence your high CPU usage.


3D games are great at tricking your eyes. For example, there is a technique called screen space ambient occlusion (SSAO), which gives a more realistic feel by darkening the parts of a scene that are close to surface discontinuities. If you look at the corners of your walls, you will see that in most cases they appear slightly darker than the centers.
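The core idea of SSAO, estimating occlusion from nothing but the depth buffer, can be illustrated with a deliberately crude sketch: for each pixel, count how many nearby samples are closer to the camera than it is. This is not the real algorithm (production SSAO samples a hemisphere around each pixel's view-space position, adds range checks and blurs the result); `ssao_2d`, the square sampling kernel and the `bias` value are assumptions for illustration, and larger depth values are taken to mean farther from the camera.

```python
from typing import List


def ssao_2d(depth: List[List[float]], radius: int = 1, bias: float = 0.01) -> List[List[float]]:
    """Crude screen-space occlusion estimate: for each pixel, the fraction of
    neighbouring depth samples that lie in front of it by more than `bias`.
    0.0 means unoccluded, 1.0 means fully occluded."""
    h, w = len(depth), len(depth[0])
    ao = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            occluders = samples = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dx == 0 and dy == 0:
                        continue  # skip the pixel itself
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        samples += 1
                        if depth[ny][nx] < depth[y][x] - bias:
                            occluders += 1
            ao[y][x] = occluders / samples if samples else 0.0
    return ao


flat = [[1.0] * 3 for _ in range(3)]
pit = [[1.0, 1.0, 1.0],
       [1.0, 2.0, 1.0],
       [1.0, 1.0, 1.0]]
print(ssao_2d(flat)[1][1])  # 0.0
print(ssao_2d(pit)[1][1])   # 1.0
```

A flat surface gets no darkening, while a pixel sitting in a crease (surrounded by nearer geometry) is fully darkened, which is the "darker corners" look the answer describes, obtained without tracing a single light ray.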

The same effect can be achieved using radiosity, which is based on a fairly accurate simulation of light transport. Radiosity also takes into account more effects of light bouncing between surfaces, but it is computationally expensive: it's a global illumination technique that has to work out how much light every surface patch exchanges with every other patch.
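The cost of radiosity shows up even in a toy solver. Below is a minimal sketch, assuming the scene is already split into patches with known emission E_i, reflectance rho_i and form factors F_ij, that solves the classic radiosity equation B_i = E_i + rho_i * sum_j F_ij * B_j by Jacobi iteration; `solve_radiosity` and the two-patch scene are hypothetical.

```python
from typing import List


def solve_radiosity(emission: List[float], reflectance: List[float],
                    form_factors: List[List[float]], iterations: int = 100) -> List[float]:
    """Jacobi iteration for B_i = E_i + rho_i * sum_j F_ij * B_j,
    where F_ij is the form factor between patch i and patch j."""
    n = len(emission)
    b = list(emission)  # start from emitted light only
    for _ in range(iterations):
        # Each sweep gathers light from every other patch: O(n^2) work per iteration.
        b = [emission[i] + reflectance[i] * sum(form_factors[i][j] * b[j] for j in range(n))
             for i in range(n)]
    return b


# Two hypothetical patches facing each other: a light source (0) and a grey wall (1).
b = solve_radiosity([1.0, 0.0], [0.5, 0.5], [[0.0, 0.2], [0.2, 0.0]])
print([round(v, 4) for v in b])  # [1.0101, 0.101]
```

Every sweep touches all N^2 patch pairs, and computing the form factors themselves requires visibility tests between patches, which is where the expense comes from once a scene has thousands of patches.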

This is just one example. There are hundreds of algorithms for real-time computer graphics, and they are essentially based on good approximations and typically make a lot of assumptions. For example, the spatial sorting scheme must be chosen very carefully depending on the speed and typical position of the camera, as well as on how much the scene geometry changes.
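Spatial sorting is a good example of choosing a smarter algorithm rather than a faster implementation. Below is a minimal sketch of one simple option, a uniform grid, assuming 2D points and a fixed cell size (`UniformGrid` and `brute_force` are hypothetical names): a radius query only visits the cells overlapping the query, instead of scanning every object in the scene.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Point = Tuple[float, float]


class UniformGrid:
    """Buckets 2D points into square cells so a radius query only visits nearby cells."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: DefaultDict[Tuple[int, int], List[Point]] = defaultdict(list)

    def _cell(self, p: Point) -> Tuple[int, int]:
        return (math.floor(p[0] / self.cell_size), math.floor(p[1] / self.cell_size))

    def insert(self, p: Point) -> None:
        self.cells[self._cell(p)].append(p)

    def query(self, center: Point, radius: float) -> List[Point]:
        # Only the cells overlapping the query's bounding square are inspected.
        x0, y0 = self._cell((center[0] - radius, center[1] - radius))
        x1, y1 = self._cell((center[0] + radius, center[1] + radius))
        r2 = radius * radius
        found = []
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                for p in self.cells.get((cx, cy), []):  # .get avoids creating empty cells
                    if (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 <= r2:
                        found.append(p)
        return found


def brute_force(points: List[Point], center: Point, radius: float) -> List[Point]:
    # Reference scan over every point, for comparison.
    r2 = radius * radius
    return [p for p in points if (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 <= r2]


points = [(1.0, 1.0), (5.0, 5.0), (50.0, 50.0), (12.0, 3.0)]
grid = UniformGrid(cell_size=10.0)
for p in points:
    grid.insert(p)
print(sorted(grid.query((0.0, 0.0), 15.0)))  # [(1.0, 1.0), (5.0, 5.0), (12.0, 3.0)]
```

A grid like this is cheap to rebuild every frame when many objects move, whereas hierarchical structures such as octrees or bounding volume hierarchies typically cost more to update but cope better with unevenly distributed geometry; that trade-off is why the choice depends on how the camera and scene behave.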

These 'optimizations' are huge. You can implement an algorithm efficiently and make it run 10 times faster, but choosing a smarter algorithm that produces a similar result ("cheating") can take you from O(N^4) to O(log(N)).

Optimizing the actual implementation makes games even more efficient, but that is only a linear (constant-factor) optimization.