Why do we use CPUs for ray tracing instead of GPUs?
I'm one of the rendering software architects at a large VFX and animated feature studio with a proprietary renderer (not Pixar, though I was once the rendering software architect there as well, long, long ago).
Almost all high-quality rendering for film (at all the big studios, with all the major renderers) is CPU only. There are a bunch of reasons why this is the case. In no particular order, some of the really compelling ones to give you the flavor of the issues:
GPUs only go fast when everything is in memory. The biggest GPU cards have, what, 12GB or so, and it has to hold everything. Well, we routinely render scenes with 30GB of geometry and that reference 1TB or more of texture. Can't load that into GPU memory, it's literally two orders of magnitude too big. So GPUs are simply unable to deal with our biggest (or even average) scenes. (With CPU renderers, we can page stuff from disk whenever we need. GPUs aren't good at that.)
Don't believe the hype, ray tracing with GPUs is not an obvious win over CPU. GPUs are great at highly coherent work (doing the same things to lots of data at once). Ray tracing is very incoherent (each ray can go a different direction, intersect different objects, shade different materials, access different textures), and so this access pattern degrades GPU performance very severely. It's only very recently that GPU ray tracing could match the best CPU-based ray tracing code, and even though it has surpassed it, it's not by much, not enough to throw out all the old code and start fresh with buggy fragile code for GPUs. And the biggest, most expensive scenes are the ones where GPUs are only marginally faster. Being lots faster on the easy scenes is not really important to us.
If you have 50 or 100 man years of production-hardened code in your CPU-based renderer, you just don't throw it out and start over in order to get a 2x speedup. Software engineering effort, stability, and so on, is more important and a bigger cost factor.
Similarly, if your studio has an investment in a data center holding 20,000 CPU cores, all in the smallest, most power and heat-efficient form factor you can, that's also a sunk cost investment you don't just throw away. Replacing them with new machines containing top of the line GPUs vastly increases the cost of your render farm, and they are bigger and produce more heat, so it literally might not fit in your building.
Amdahl's Law: The actual "rendering" per se is only one stage in generating the scenes, and GPUs don't help with it. Let's say that it takes 1 hour to fully generate and export the scene to the renderer, and 9 hours to "render", and out of that 9 hours, an hour is reading texture, volumes, and other data from disk. So out of the total 10 hours of how the user experiences rendering (push button until final image is ready), 8 hours is potentially sped up with GPUs. So, even if GPU was 10x as fast as CPU for that part, you go from 10 hours to 1+1+0.8 = nearly 3 hours. So 10x GPU speedup only translates to 3x actual gain. If GPU was 1,000,000x faster than CPU for ray tracing, you still have 1+1+tiny, which is only a 5x speedup.
But what's different about games? Why are GPUs good for games but not film?
First of all, when you make a game, remember that it's got to render in real time -- that means your most important constraint is the 60Hz (or whatever) frame rate, and you sacrifice quality or features where necessary to achieve that. In contrast, with film, the unbreakable constraint is making the director and VFX supervisor happy with the quality and look he or she wants, and how long it takes you to get that is (to a degree) secondary.
Also, with a game, you render frame after frame after frame, live in front of every user. But with film, you effectively are rendering ONCE, and what's delivered to theaters is a movie file -- so moviegoers will never know or care if it took you 10 hours per frame, but they will notice if it doesn't look good. So again, there is less of a penalty placed on those renders taking a long time, as long as they look fabulous.
With a game, you don't really know what frames you are going to render, since the player may wander all around the world, view from just about anywhere. You can't and shouldn't try to make it all perfect, you just want it to be good enough all the time. But for a film, the shots are all hand-crafted! A tremendous amount of human time goes into composing, animating, lighting, and compositing every shot, and then you only need to render it once. Think about the economics -- once 10 days of calendar (and salary) has gone into lighting and compositing the shot just right, the advantage of rendering it in an hour (or even a minute) versus overnight, is pretty small, and not worth any sacrifice of quality or achievable complexity of the image.