In Prime95, why do small FFTs generate the most heat, despite CPU at 100% for all options?

I've just built a new Skylake PC, and I'm going to see about a bit of overclocking with Prime95 as a stress tester.

It works fine in normal use, but with Prime95 I'm noticing a bit of CPU throttling under certain loads.

If all 4 cores (8 threads) are jammed up at 100% regardless, why does the Small FFT setting in Prime95 get to a higher temperature than the 'Blend' option?


Solution 1:

Vectorized code, especially AVX, increases CPU heat output because the processor must operate at a higher voltage to execute these instructions. Small FFTs touch less memory than Blend mode, so the processor spends more time processing data and less time waiting for it.

  • The x86-64 architecture provides extensive vector processing capabilities, especially on the latest processors. Vector processing allows applications to perform mathematical operations on multiple data items at once, and is used by many newer computationally-intensive applications to increase processing throughput.

  • Vectorized code, especially the AVX instructions used by Prime95, requires the processor to operate at a higher voltage than normal. This results in power consumption and heat output greater than what is experienced under normal workloads. For this reason, Intel warns that AVX-heavy loads can cause the processor to throttle or not sustain full Turbo Boost clock rates (footnote 1):

    Intel® Advanced Vector Extensions (Intel® AVX) are designed to achieve higher throughput for certain integer and floating point operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies.

    Intel explains this in further detail in this white paper. In particular, it notes:

    Intel AVX is designed to achieve higher throughput for certain integer and floating-point operations. Using these instructions may cause processors to operate at less than the marked TDP frequency. These reductions in frequency occur because high-power Intel AVX instructions require additional voltage and electrical current.

    • My guess as to why boosting Vcore is necessary for AVX instructions is that the AVX execution units are more complex than the other parts of the processor, resulting in corresponding pipeline stages that take longer to complete (see this answer for more technical information on pipelines and other aspects of processor design). If a particular pipeline stage is slow, the entire processor's maximum clock rate is limited as every stage in the pipeline must finish within each clock cycle.

    • Higher voltages raise the maximum attainable frequency when overclocking because transistors switch faster at higher voltages. For the same reason, increasing the voltage helps ensure that the longer pipeline stages can finish on time.

  • The Small FFT mode uses only smaller data sets that fit in the CPU cache, unlike the Blend mode, which operates on both small and large data sets, some of which may not fit in cache. Because accessing main memory is slow relative to processing data that is already in cache, the processor spends more of its time stalled and less time actually computing in Blend mode, which reduces heat output. A core that is stalled waiting on memory is still reported as busy, which is why both modes show 100% CPU usage. Small FFTs do not entail anywhere near as many memory accesses, so the execution units do more actual work, thereby increasing power consumption and heat output.
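
To make the cache argument concrete, here is a rough sketch that estimates how much data an FFT of a given length touches and which cache level could hold it. Everything in it is an assumption for illustration: the cache sizes are typical figures for a quad-core Skylake desktop part, the 8 bytes per element assumes one double-precision value per FFT element and ignores any extra tables or scratch buffers, and the FFT lengths in the loop are arbitrary examples rather than the ranges Prime95 actually uses. The helper names are made up for this sketch.

```python
# Assumed cache sizes in bytes, roughly matching a quad-core Skylake
# desktop CPU (illustrative only; check your own CPU's specifications).
CACHE_LEVELS = [
    ("L1d", 32 * 1024),        # per core
    ("L2", 256 * 1024),        # per core
    ("L3", 8 * 1024 * 1024),   # shared
]

# Assumption: one double-precision value per FFT element, nothing else counted.
BYTES_PER_ELEMENT = 8


def fft_footprint_bytes(fft_len_k: int) -> int:
    """Approximate data size of an FFT whose length is given in K (1024) elements."""
    return fft_len_k * 1024 * BYTES_PER_ELEMENT


def smallest_fitting_level(fft_len_k: int) -> str:
    """Return the first cache level large enough for the FFT data, else 'RAM'."""
    size = fft_footprint_bytes(fft_len_k)
    for name, capacity in CACHE_LEVELS:
        if size <= capacity:
            return name
    return "RAM"


if __name__ == "__main__":
    # Arbitrary example lengths, not Prime95's actual test ranges.
    for k in (8, 32, 512, 4096):
        kib = fft_footprint_bytes(k) // 1024
        print(f"{k}K FFT: {kib} KiB -> {smallest_fitting_level(k)}")
```

Under these assumptions the small lengths stay within the per-core caches, while the largest one spills to main memory, which is where the stalls (and the lower heat output) in Blend mode would come from.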