How can a processor be made faster with a "software" update?

The new AMD Ryzen processors are out on the market, and unfortunately they seem to fall short in gaming. Intel processors are apparently still faster for gaming. People of course weren't happy about this, and AMD made a statement saying they are updating firmware or software to speed up Ryzen in games.

Correct me if I'm wrong, but doesn't a processor need fixed firmware to remain compatible with the x64 standard? If so, wouldn't AMD have to optimize the games (the software) for their processors, which is obviously impractical? How can they claim such things if it seems impossible given these constraints?

But I may be wrong (I hope so), so the question is:

Is it possible to achieve a speedup for a processor without changing the underlying hardware? If so, how is it done? Is there software besides the firmware for the processor?


Solution 1:

Modern processors are much more complicated than one might think: incredibly complicated, nearly beyond the comprehension of any single person. Expanding on a short remark by "horta", consider the following:

  1. To begin with, almost all internal hardware is configurable to a large extent. There are thousands upon thousands of configuration registers with countless individual bits that must be set for a CPU to operate. All the layers of CPU-cache interaction have configurable pipelines, and the various timing delays obviously affect overall performance.

  2. There are hundreds of advanced features put in by the architects that engineering didn't have time to validate, so they are shipped disabled or set to a fail-safe configuration with "chicken bits". They can be tuned up and enabled if found functional and useful. These enhancing features usually get validated in depth over time and can be gradually enabled over the life of a processor by various microcode patches.

  3. All recent CPUs have several internal units that are controlled by independent microprocessors embedded inside the x86 CPU chip. One that has publicly surfaced is the P-Unit. Modern processors can't function without aggressive power management, or they will melt. However, deep power management is at odds with an aggressive entry/exit clock and voltage policy, and changes in that policy strongly affect overall system performance. All the details are controlled by the P-Unit and can be optimized/tuned by loading another microcode patch, as answered here.

  4. Many other aspects of the internal interconnect are controlled by various additional embedded processors, which can be corrected by loading microcode patches into them, or by upgrading the BIOS if it has access to these configuration resources.

In short, while the CPU hardware is indeed hardwired, the configuration of that hardware largely defines its performance, and it can be tuned for better system performance by means of BIOS updates and embedded microcode patches.
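As a concrete illustration of that last point, the currently loaded microcode patch level is something you can inspect yourself. The sketch below is a minimal example assuming a Linux machine, where the kernel reports the revision in the `microcode` field of /proc/cpuinfo (on Windows the same information lives in the registry instead); it is only meant to show that the patch level is visible and changes with BIOS/microcode updates.

```c
/* Minimal sketch: print the microcode revision the kernel reports for each
 * logical CPU. Assumes Linux, which exposes it in /proc/cpuinfo; this is
 * only an illustration of "same silicon, different patch level". */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) {
        perror("fopen /proc/cpuinfo");
        return 1;
    }

    char line[256];
    int cpu = -1;
    unsigned rev;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "processor : %d", &cpu) == 1)
            continue;                      /* remember which CPU we're on */
        if (sscanf(line, "microcode : %x", &rev) == 1)
            printf("cpu %d: microcode revision 0x%x\n", cpu, rev);
    }

    fclose(f);
    return 0;
}
```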

Solution 2:

A variety of software issues at both the operating system and application levels cause suboptimal performance.

Eight-core Ryzen processors consist of two core complexes (CCXs) each with four cores and 8 MB of L3 cache. Accessing the L3 cache on a different CCX is slower due to the need to move data farther along the Infinity Fabric (AMD's proprietary cache-coherent interconnect), which means that moving threads from one CCX to another or communicating between cores on different CCXs results in reduced performance. This penalty is reminiscent of what would happen on a multi-socket server running an operating system that is not NUMA-aware.
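If you want to see this split for yourself, Windows exposes the cache topology through the GetLogicalProcessorInformationEx API. The following is a minimal sketch (not from any of the cited articles) that dumps the L3 cache relationships; on an eight-core Ryzen you would expect two 8 MB L3 entries, each shared by one CCX's logical processors.

```c
/* Sketch: list each L3 cache and which logical processors share it.
 * On an eight-core Ryzen this should show two 8 MB L3 slices, one per CCX. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    DWORD len = 0;
    GetLogicalProcessorInformationEx(RelationCache, NULL, &len);  /* get size */
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *buf = malloc(len);
    if (!buf || !GetLogicalProcessorInformationEx(RelationCache, buf, &len)) {
        fprintf(stderr, "GetLogicalProcessorInformationEx failed: %lu\n",
                GetLastError());
        return 1;
    }

    /* Records are variable-sized; walk them using each record's Size field. */
    for (char *p = (char *)buf; p < (char *)buf + len;
         p += ((SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *)p)->Size) {
        SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *info = (void *)p;
        if (info->Cache.Level == 3)
            printf("L3: %lu KB shared by logical CPUs with mask 0x%llx\n",
                   info->Cache.CacheSize / 1024,
                   (unsigned long long)info->Cache.GroupMask.Mask);
    }

    free(buf);
    return 0;
}
```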

It appears Windows has a tendency to simply move threads around between cores and does not recognize the inter-CCX communication penalty. This means that Windows may put threads on different CCXs even when doing so is not necessary, reducing performance.
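Until the scheduler and the games become CCX-aware, you can approximate the fix by hand and restrict a game's process to a single CCX. Below is a minimal sketch, assuming logical processors 0-7 (four cores plus their SMT siblings) belong to the first CCX; the PID is passed on the command line, and the mask is something you would adjust for your own topology.

```c
/* Sketch: restrict an existing process to the first CCX so its threads never
 * pay the cross-CCX penalty. Assumes logical processors 0-7 belong to CCX0,
 * which matches a typical 8-core Ryzen with SMT enabled. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    DWORD pid = (DWORD)strtoul(argv[1], NULL, 10);
    HANDLE proc = OpenProcess(PROCESS_SET_INFORMATION | PROCESS_QUERY_INFORMATION,
                              FALSE, pid);
    if (!proc) {
        fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }

    DWORD_PTR ccx0_mask = 0xFF;   /* logical CPUs 0-7 = first CCX (assumption) */
    if (!SetProcessAffinityMask(proc, ccx0_mask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        CloseHandle(proc);
        return 1;
    }

    printf("Process %lu pinned to CCX0 (mask 0x%llx)\n",
           pid, (unsigned long long)ccx0_mask);
    CloseHandle(proc);
    return 0;
}
```

The same effect can be had without code via `start /affinity FF game.exe` in a command prompt, or through Task Manager's "Set affinity" dialog.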

German reviewer PC Games Hardware tested a Ryzen 7 1800X with varying numbers of cores disabled and found that having two CCXs with two cores enabled on each produces slower performance than having one CCX enabled with all of its cores intact:

PC Games Hardware benchmark result: Battlefield 1

This is despite the fact that having only one CCX enabled means that only 8 MB of L3 cache is available, rather than the full 16 MB. From the PC Games Hardware article, via Google Translate (emphasis added):

[...] What is far more interesting is that the cross-CCX data transfers interfere in every case - sometimes more, sometimes less noticeably. The advantages of the larger L3 cache (the 2 + 2 configuration) are nowhere to be seen in this series of measurements.

This PC Perspective article shows that there is substantial communication latency between cores on different CCXs, but very little latency when communicating between cores on the same CCX. Note that this article suggests Windows is aware of the CCX design and avoids scheduling threads across CCXs, but community members have found conflicting results.
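A rough way to reproduce this kind of measurement yourself is a two-thread "ping-pong" test: pin one thread to each core, bounce a flag back and forth, and time the round trips. The sketch below is illustrative only; which logical processor numbers sit on which CCX is an assumption you would need to adjust for your own system (here 0 and 4 are assumed to be cores on different CCXs with SMT disabled).

```c
/* Sketch: crude core-to-core latency test. Two threads, each pinned to one
 * core, pass a flag back and forth; the average round-trip time approximates
 * the inter-core (and, across CCXs, inter-fabric) communication latency. */
#include <windows.h>
#include <stdio.h>

#define ROUND_TRIPS 1000000

static volatile LONG ball = 0;    /* 0 = ping's turn, 1 = pong's turn */

static DWORD WINAPI pong(LPVOID arg)
{
    SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)arg);
    for (long i = 0; i < ROUND_TRIPS; i++) {
        while (InterlockedCompareExchange(&ball, 0, 1) != 1)
            ;                     /* spin until ping hands us the ball */
    }
    return 0;
}

int main(void)
{
    /* Core numbers are assumptions: 0 on CCX0, 4 on CCX1 for an 8-core Ryzen
       with SMT disabled. Adjust the masks for your own topology. */
    DWORD_PTR ping_mask = (DWORD_PTR)1 << 0;
    DWORD_PTR pong_mask = (DWORD_PTR)1 << 4;

    SetThreadAffinityMask(GetCurrentThread(), ping_mask);
    HANDLE t = CreateThread(NULL, 0, pong, (LPVOID)pong_mask, 0, NULL);

    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    for (long i = 0; i < ROUND_TRIPS; i++) {
        InterlockedExchange(&ball, 1);              /* hand the ball to pong */
        while (InterlockedCompareExchange(&ball, 0, 0) != 0)
            ;                                       /* wait for it to come back */
    }

    QueryPerformanceCounter(&end);
    WaitForSingleObject(t, INFINITE);

    double ns = (double)(end.QuadPart - start.QuadPart) * 1e9
                / freq.QuadPart / ROUND_TRIPS / 2.0;  /* one-way latency */
    printf("approx. one-way core-to-core latency: %.0f ns\n", ns);
    return 0;
}
```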


Many (but not all) gaming benchmarks show increased performance when SMT is disabled. However, rumors claiming that Windows improperly schedules threads by treating every hardware thread as its own full core are incorrect. According to AMD, the SMT penalty is caused by the fact that many applications are optimized only for Intel processors (not unexpected, given that AMD has been absent from the high-end processor market for some five years). AMD says it is working with hundreds of game developers to improve performance on Ryzen processors. However, I suspect that a Windows update could still help by tuning the scheduler more finely for the characteristics of the Zen architecture.

The following example from Tom's Hardware demonstrates degraded performance with SMT enabled:

Tom's Hardware benchmark result: Ashes of the Singularity

TechSpot did an in-depth analysis of gaming performance with SMT enabled and disabled and came up with very similar results:

TechSpot benchmark result: Deus Ex: Mankind Divided


Power management issues are also implicated in these tests. The same AMD statement notes that Windows is not fully aware of new features in Ryzen such as core parking and fast clock frequency switching. This is consistent with the performance improvements Tom's Hardware reported when using the High Performance power plan (see the chart above, where HP = High Performance). An update to Windows could add support for these features and improve performance.
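Switching to the High Performance plan can be done from the Power Options control panel or with `powercfg`, but it can also be done programmatically. Below is a minimal sketch using the documented PowerSetActiveScheme call; GUID_MIN_POWER_SAVINGS is the predefined "High performance" scheme (the `#pragma comment` line assumes an MSVC build, otherwise link PowrProf yourself).

```c
/* Sketch: activate the built-in "High performance" power plan, which avoids
 * core parking and lets the CPU change clocks/voltage more aggressively. */
#include <initguid.h>     /* so the power-scheme GUIDs are defined in this file */
#include <windows.h>
#include <powrprof.h>
#include <stdio.h>

#pragma comment(lib, "PowrProf.lib")

int main(void)
{
    /* GUID_MIN_POWER_SAVINGS = the predefined "High performance" scheme. */
    DWORD rc = PowerSetActiveScheme(NULL, &GUID_MIN_POWER_SAVINGS);
    if (rc != ERROR_SUCCESS) {
        fprintf(stderr, "PowerSetActiveScheme failed: %lu\n", rc);
        return 1;
    }
    printf("High performance power plan activated\n");
    return 0;
}
```

From a shell, `powercfg /setactive SCHEME_MIN` does the same thing (SCHEME_MIN is powercfg's alias for the High performance plan).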

Solution 3:

http://wccftech.com/amd-ryzen-launch-aftermath-gaming-performance-amd-response/

1) Early motherboard BIOSes were certainly troubled: disabling unrelated features would turn off cores, setting memory overclocks on some motherboards would disable boost, and some BIOS revisions would simply produce universally suppressed performance.

2) Ryzen benefits from disabling the High Precision Event Timer (HPET). The timer resolution of HPET can cause an observer effect that subtracts performance. It can be disabled through a BIOS option or from the Windows command shell (a quick way to check which timer source Windows is actually using is sketched after this list).

3) Ryzen benefits from enabling the High Performance power profile. This overrides core parking. Eventually we will have a driver that allows people to stay on Balanced and disable core parking anyway. Gamers have been doing this for a while, too. I misspoke here, so I want to clarify the benefit: High Performance mode allows the CPU to update its voltage/clock speed in 1 ms, versus the 30 ms it takes in Balanced mode. This is what our driver will accomplish. Apologies for the confusion!
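Regarding point 2, you can sanity-check which timebase Windows is using from code. QueryPerformanceFrequency reports roughly 14.318 MHz (the HPET rate) when HPET is forced as the platform clock, and a different value (typically a TSC-derived 10 MHz on Windows 10) otherwise. A small sketch; the 14.3 MHz heuristic is an assumption about typical hardware, not a documented contract.

```c
/* Sketch: report the QueryPerformanceCounter frequency. A value around
 * 14,318,180 Hz suggests HPET is the QPC time source (the case AMD's
 * statement recommends avoiding); other values usually mean the TSC is used. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);   /* cannot fail on XP and later */

    printf("QPC frequency: %lld Hz\n", (long long)freq.QuadPart);
    if (freq.QuadPart > 14000000 && freq.QuadPart < 15000000)
        printf("Looks like HPET is the platform timer source.\n");
    else
        printf("Looks like a TSC-based (or other) timer source.\n");
    return 0;
}
```

The actual toggle is the BIOS option, or removing the forced setting with `bcdedit /deletevalue useplatformclock` followed by a reboot.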

So the real question isn't how a BIOS/firmware update can make a processor faster (it can, through microcode optimisation and the like); it's more a question of how badly a CPU is crippled by buggy BIOS/firmware.