Why doesn't "add more cores" face the same physical limitations as "make the CPU faster"?
In 2014, I hear a lot about programming languages being touted for their concurrency features. Concurrency is said to be crucial for performance gains.
In making this statement, many people point back to a 2005 article called The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. The basic argument is that it's getting harder to increase the clock speed of processors, but we can still put more cores on a chip, and that to get performance gains, software will need to be written to take advantage of multiple cores.
Some key quotes:
We're used to seeing 500MHz CPUs give way to 1GHz CPUs give way to 2GHz CPUs, and so on. Today we're in the 3GHz range on mainstream computers.
The key question is: When will it end? After all, Moore's Law predicts exponential growth, and clearly exponential growth can't continue forever before we reach hard physical limits; light isn't getting any faster. The growth must eventually slow down and even end.
... It has become harder and harder to exploit higher clock speeds due to not just one but several physical issues, notably heat (too much of it and too hard to dissipate), power consumption (too high), and current leakage problems.
... Intel's and most processor vendors' future lies elsewhere as chip companies aggressively pursue the same new multicore directions.
...Multicore is about running two or more actual CPUs on one chip.
This article's predictions seem to have held up, but I don't understand why. I have only very vague ideas about how hardware works.
My oversimplified view is "it's getting harder to pack more processing power into the same space" (because of issues with heat, power consumption, etc). I would expect the conclusion to be "therefore, we'll have to have bigger computers or run our programs on multiple computers." (And indeed, distributed cloud computing is a thing we're hearing more about.)
But part of the solution seems to be multi-core architectures. Unless computers grow in size (which they haven't), this just seems to be another way of saying "pack more processing power into the same space".
Why doesn't "add more cores" face the same physical limitations as "make the CPU faster"?
Please explain in the simplest terms you can. :)
Summary
Economics. It's cheaper and easier to design a CPU that has more cores than a higher clock speed, because:
Significant increase in power usage. CPU power consumption increases rapidly as you increase the clock speed - you can double the number of cores operating at a lower speed in the thermal space it takes to increase the clock speed by 25%, and quadruple the cores in the space it takes to increase the clock speed by 50%.
There are other ways to increase sequential processing speed, and CPU manufacturers make good use of those.
I'm going to be drawing heavily on the excellent answers at this question on one of our sister SE sites. So go upvote them!
Clock speed limitations
There are a few known physical limitations to clock speed:
Transmission time
The time it takes for an electrical signal to traverse a circuit is limited by the speed of light. This is a hard limit, and there is no known way around it¹. At gigahertz clock speeds, we are approaching this limit.
However, we are not there yet. 1 GHz means one nanosecond per clock tick. In that time, light can travel 30 cm. At 10 GHz, light can travel 3 cm. A single CPU core is about 5 mm wide, so we will run into these issues somewhere past 10 GHz.²
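To sanity-check those numbers, here's a quick sketch - it assumes signals travel at the vacuum speed of light, so these are optimistic upper bounds (see footnote 2):

```python
# How far light travels during one clock tick. Assumes the
# vacuum speed of light; real on-chip signals are slower, so
# these distances are optimistic upper bounds.

SPEED_OF_LIGHT_M_PER_S = 299_792_458

for clock_hz in (1e9, 3e9, 10e9, 100e9):
    tick_seconds = 1 / clock_hz
    distance_cm = SPEED_OF_LIGHT_M_PER_S * tick_seconds * 100
    print(f"{clock_hz / 1e9:>5.0f} GHz -> {distance_cm:.2f} cm per tick")
# 1 GHz -> ~30 cm, 10 GHz -> ~3 cm: within an order of magnitude
# of a ~5 mm core, the margin disappears quickly.
```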
Switching delay
It's not enough to merely consider the time it takes for a signal to travel from one end to another. We also need to consider the time it takes for a logic gate within the CPU to switch from one state to another! As we increase clock speed, this can become an issue.
Unfortunately, I'm not sure about the specifics, and cannot provide any numbers.
Apparently, pumping more power into it can speed up switching, but this leads to both power consumption and heat dissipation issues. Also, more power means you need bulkier conduits capable of handling it without damage.
Heat dissipation/power consumption
This is the big one. Quoting from fuzzyhair2's answer:
Recent processors are manufactured using CMOS technology. Every time there is a clock cycle, power is dissipated. Therefore, higher processor speeds mean more heat dissipation.
There are some lovely measurements at this AnandTech forum thread, and they even derived a formula for the power consumption (which goes hand in hand with heat generated):
[Formula image - credit to Idontcare]
We can visualise this in the following graph:
[Graph: power consumption vs. clock speed - credit to Idontcare]
As you can see, power consumption (and heat generated) rises extremely rapidly as the clock speed is increased past a certain point. This makes it impractical to boundlessly increase clock speed.
The reason for the rapid increase in power usage is probably related to the switching delay - it's not enough to simply increase power proportional to the clock rate; the voltage must also be increased to maintain stability at higher clocks. This may not be completely correct; feel free to point out corrections in a comment, or make an edit to this answer.
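The formula image above isn't reproduced here, but a standard first-order approximation (my addition, not the fitted formula from that thread) is that dynamic CMOS power is P ≈ C · V² · f: capacitance times voltage squared times frequency. If, as described above, voltage has to rise roughly in step with frequency, power grows roughly with the cube of the clock speed. A quick sketch of the arithmetic behind the summary's "double the cores vs. 25% more clock" claim, under those assumptions:

```python
# First-order model of dynamic CMOS power: P ~ C * V^2 * f.
# Assumption: voltage must scale roughly linearly with frequency
# to keep transistors switching reliably, so P ~ f^3. This is a
# textbook approximation, not the fitted formula from the
# AnandTech thread.

def relative_power(clock_multiplier: float, cores: int = 1) -> float:
    """Power relative to a single core at the baseline clock."""
    return cores * clock_multiplier ** 3

print(relative_power(1.25))          # one core, +25% clock   -> ~1.95x power
print(relative_power(1.0, cores=2))  # two cores, same clock  -> 2.0x power
print(relative_power(0.8, cores=2))  # two cores at 80% clock -> ~1.02x power
```

In other words, a 25% overclock costs about as much extra power as a whole second core, and two cores at 80% clock fit in roughly the same power budget as one core at full speed.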
More cores?
So why more cores? Well, I can't answer that definitively. You'd have to ask the folks at Intel and AMD. But you can see above that, with modern CPUs, at some point it becomes impractical to increase clock speed.
Yes, multicore also increases the power required and the heat dissipated. But it neatly sidesteps the transmission time and switching delay issues. And, as you can see from the graph, you can easily double the number of cores in a modern CPU with the same thermal overhead as a 25% increase in clock speed.
Some people have done it - the current overclocking world record is just shy of 9 GHz. But it is a significant engineering challenge to do so while keeping power consumption within acceptable bounds. The designers at some point decided that adding more cores to perform more work in parallel would provide a more effective boost to performance in most cases.
That's where the economics come in - it was likely cheaper (less design time, less complicated to manufacture) to go the multicore route. And it's easy to market - who doesn't love the brand new octa-core chip? (Of course, we know that multicore is pretty useless when the software doesn't make use of it...)
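To make that parenthetical concrete: extra cores only pay off when the software explicitly divides the work. Here's a minimal sketch using Python's standard multiprocessing module (the chunk size and worker count are arbitrary choices for illustration):

```python
# Extra cores only help if the software divides the work.
# Here the sum is explicitly split into independent chunks that
# worker processes can run on separate cores.
from multiprocessing import Pool

def partial_sum(bounds):
    start, stop = bounds
    return sum(range(start, stop))

if __name__ == "__main__":
    # Four independent chunks covering 0..100,000,000.
    chunks = [(i, i + 25_000_000) for i in range(0, 100_000_000, 25_000_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # same answer as sum(range(100_000_000)), spread over 4 cores
```

Run the same loop single-threaded and the other three cores sit idle - the hardware can't split a sequential program for you.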
There is a downside to multicore: you need more physical space to put the extra cores. However, CPU process sizes are constantly shrinking, so there's plenty of space to put two copies of a previous design - the real tradeoff is not being able to create larger, more complex single cores. Then again, increasing core complexity is a bad thing from a design standpoint - more complexity means more mistakes/bugs and manufacturing errors. We seem to have found a happy medium with efficient cores that are simple enough not to take too much space.
We've already hit a limit with the number of cores we can fit on a single die at current process sizes. We might hit a limit of how far we can shrink things soon. So, what's next? Do we need more? That's difficult to answer, unfortunately. Anyone here a clairvoyant?
Other ways to improve performance
So, we can't increase the clock speed. And more cores have an additional disadvantage - namely, they only help when the software running on them can make use of them.
So, what else can we do? How are modern CPUs so much faster than older ones at the same clock speed?
Clock speed is really only a very rough approximation of the internal workings of a CPU. Not all components of a CPU work at that speed - some might operate once every two ticks, and so on.
What's more significant is the number of instructions you can execute per unit of time. This is a far better measure of just how much a single CPU core can accomplish. Instructions vary in cost: some will take one clock cycle, some will take three. Division, for example, is considerably slower than addition.
So, we could make a CPU perform better by increasing the number of instructions it can execute per second. How? Well, you could make an instruction more efficient - maybe division now takes only two cycles. Then there's instruction pipelining. By breaking each instruction into multiple stages, it's possible to execute instructions "in parallel" - but each instruction still has a well-defined, sequential order relative to the instructions before and after it, so it doesn't require software support like multicore does.
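Here's a toy model of why pipelining raises throughput - my own illustration, not how any real CPU is organised. With a three-stage pipeline (say fetch, decode, execute), a new instruction can start each cycle instead of waiting for the previous one to finish:

```python
# Toy model of a 3-stage pipeline (fetch, decode, execute).
# Not a real CPU: it only illustrates that overlapping stages
# lets one instruction finish per cycle once the pipe is full.

STAGES = 3

def unpipelined_cycles(n_instructions: int) -> int:
    # Each instruction runs all stages before the next starts.
    return n_instructions * STAGES

def pipelined_cycles(n_instructions: int) -> int:
    # The first instruction takes STAGES cycles to drain through;
    # after that, one instruction completes every cycle.
    return STAGES + (n_instructions - 1)

for n in (1, 10, 1000):
    print(n, unpipelined_cycles(n), pipelined_cycles(n))
# 1000 instructions: 3000 cycles unpipelined vs. 1002 pipelined,
# with no change to the software being executed.
```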
There is another way: more specialised instructions. We've seen things like SSE, which provide instructions to process large amounts of data at one time. There are new instruction sets constantly being introduced with similar goals. These, again, require software support and increase complexity of the hardware, but they provide a nice performance boost. Recently, there was AES-NI, which provides hardware-accelerated AES encryption and decryption, far faster than a bunch of arithmetic implemented in software.
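As a loose software analogy (not the hardware instructions themselves): NumPy's vectorised operations hand whole arrays to compiled loops, which can use SIMD instructions such as SSE where the CPU supports them. A sketch, assuming NumPy is installed:

```python
# SIMD in spirit: one vectorised operation processes many
# elements, the way an SSE instruction operates on several
# floats at once. NumPy's compiled loops may use such
# instructions under the hood; this is an analogy, not a
# guarantee on any particular machine.
import numpy as np

a = np.arange(1_000, dtype=np.float32)
b = np.arange(1_000, dtype=np.float32)

# Scalar style: one element per step (the shape of pre-SSE code).
scalar = [x + y for x, y in zip(a, b)]

# Vectorised style: one call over the whole array.
vectorised = a + b

assert scalar[123] == vectorised[123]
```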
¹ Not without getting quite deep into theoretical quantum physics, anyway.
² It might actually be lower, since electrical field propagation isn't quite as fast as the speed of light in a vacuum. Also, that's just for straight-line distance - it's likely that there's at least one path that's considerably longer than a straight line.
Physics is physics. We can't keep packing more transistors into ever smaller spaces forever. At some point things get so small that you deal with weird quantum crap. At some point, we can no longer pack twice as many transistors in every couple of years (which is what Moore's law is about).
Raw clock speeds mean nothing. My old Pentium M was about half the clock speed of a contemporary desktop CPU (and yet in many respects faster) - and modern systems are barely exceeding the clock speeds of systems from 10 years ago (and are clearly faster). Basically, "just" bumping up the clock speed does not give real performance gains in many cases. It may help in some single-threaded operations, but you're better off spending the design budget on better efficiency in terms of everything else.
Multiple cores let you do two or more things at once, so you don't need to wait for one thing to finish before starting the next. In the shorter term, you can simply pop two existing cores into the same package (for example the Pentium D and its multi-chip module, a transitional design) and you have a system that's twice as fast - for parallel workloads, anyway. Most modern implementations do share things like a memory controller, of course.
You can also build smarter in different ways. ARM does big.LITTLE - having four "weak" low-power cores working alongside four more powerful cores, so you have the best of both worlds. Intel lets you throttle cores down (for better power efficiency) or overclock specific cores (for better single-threaded performance). I remember AMD does something with modules.
You can also move things onto the CPU die, like the memory controller (so you have lower latency) and I/O-related functions (the modern CPU has no north bridge), as well as video (which is more important with laptops and AIW designs). It makes more sense to do these things than to "just" keep ramping up the clock speed.
At some point, adding "more" cores may not work - though GPUs have hundreds of cores.
Multicore, alongside all these other improvements, lets computers work smarter.