Why multi-core processors?
Solution 1:
The trend towards multiple cores is an engineering approach that helps CPU designers avoid the power consumption problem that came with ever-increasing frequency scaling. As CPU speeds rose into the 3-4 GHz range, the amount of electrical power required to go faster started to become prohibitive. The technical reasons for this are complex, but factors like heat losses and leakage current (power that simply passes through the circuitry without doing anything useful) both increase disproportionately as frequencies rise. While it's certainly possible to build a 6 GHz general-purpose x86 CPU, it has not proven economical to do so. That's why the move to multi-core started, and it's why the trend will continue at least until the parallelization issues become insurmountable. In the server arena the trend towards virtualization has helped, as it lets us parallelize aggregate workloads efficiently, for the moment at any rate.
As a practical example, the E5640 Xeon (4 cores @ 2.66 GHz) has a power envelope of 95 watts while the L5630 (4 cores @ 2.13 GHz) requires only 40 watts. That's 137% more electrical power for 24% more CPU power, for CPUs that are for the most part feature-compatible. The X5677 pushes the speed up to 3.46 GHz with some more features, but that's only 60% more processing power for 225% more electrical power.
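As a rough sanity check on those ratios, here is the arithmetic spelled out, using the TDP and clock figures quoted above with clock speed as a crude stand-in for "CPU power" (the 130 W value for the X5677 is inferred from the ~225% figure rather than quoted directly):

```python
# Rough perf-per-watt arithmetic using the figures quoted above.
# Clock speed stands in for "CPU power"; real throughput depends on much more.
cpus = {
    "E5640": {"ghz": 2.66, "watts": 95},
    "X5677": {"ghz": 3.46, "watts": 130},  # TDP inferred from the ~225% figure
}
base = {"ghz": 2.13, "watts": 40}  # L5630

for name, c in cpus.items():
    extra_speed = (c["ghz"] / base["ghz"] - 1) * 100
    extra_power = (c["watts"] / base["watts"] - 1) * 100
    print(f"{name}: ~{extra_speed:.0f}% more clock for ~{extra_power:.0f}% more power")
```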
Now compare the X5560 (2.8 GHz, 4 cores, 95 watts) with the newer X5660 (2.8 GHz, 6 cores, 95 watts) and there's 50% extra computing power in the socket (potentially, assuming Amdahl's law is kind to us for now) without requiring any additional electrical power. AMD's 6100-series CPUs show similar gains in aggregate performance over the 2400/8400 series while keeping electrical power consumption flat.
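That "assuming Amdahl's law is kind to us" caveat is the crux: the extra cores only become extra throughput if most of the workload can run in parallel. A minimal sketch of the formula (the parallel fractions below are purely illustrative):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only part of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Illustrative parallel fractions; real workloads vary widely.
for p in (0.50, 0.95, 1.00):
    s4 = amdahl_speedup(p, 4)  # 4 cores, e.g. an X5560
    s6 = amdahl_speedup(p, 6)  # 6 cores, e.g. an X5660
    print(f"p={p:.2f}: 4 cores -> {s4:.2f}x, 6 cores -> {s6:.2f}x, "
          f"extra cores buy {100 * (s6 / s4 - 1):.0f}% more")
```

Only when the work is fully parallel do the two extra cores deliver the full 50%; at a 95% parallel fraction the gain is closer to 38%, and at 50% it is almost nothing.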
For single-threaded tasks this is a problem, but if your requirement is to deliver large amounts of aggregate CPU power to a distributed processing cluster or a virtualization cluster, then this is a reasonable approach. It means that for most server environments today, scaling out the number of cores in each CPU is a much better approach than trying to build faster/better single-core CPUs.
The trend will continue for a while, but there are challenges, and continually scaling out the number of cores is not easy (keeping memory bandwidth high enough and managing caches both get much harder as the number of cores grows). That means the current fairly explosive growth in the number of cores per socket will have to slow down in a couple of generations, and we will see some other approach.
Solution 2:
It was getting too hard to make them usefully faster.
The problem is that to go faster, the CPU needs to be working on a bunch of instructions at once. Current x86 CPUs have 80 or more instructions in flight at a time, and that seems to be the limit; it was hit with the P4, and heck, the Pentium Pro was already doing 40 back in 1995. Typical instruction streams are not predictable beyond that (you have to guess branches, memory accesses, etc.), so it's hard to execute more than a few instructions at once (the 486 did 5, the Pentium did 10, barely).
So while you can make them wider (more functional units to do each piece of the instruction) or longer (deeper pipelines to hide latency), it doesn't seem to do much good. We also seem to have hit a wall with clock speed, and we are still outrunning memory. So splitting the chip into many CPUs seems to be a win. Plus, they can share caches.
There is quite a bit more to this, but it boils down to this: conventional programs cannot be made to run significantly faster on any hardware we can imagine how to design and build.
Now where predictability isn't a problem (for example, many scientific problems and graphics, which often boil down to multiplying this set of numbers by that set of numbers), this isn't the case. Hence the popularity of Intel's IA64 (Itanium) and of GPUs, which just keep getting faster, but they will not help you run Word any better.
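A tiny sketch of what "multiply this set of numbers by that set of numbers" looks like, and why it parallelizes so well: every output element depends only on its own inputs, with no data-dependent branches to guess.

```python
# Elementwise multiply: each c[i] depends only on a[i] and b[i], so hardware
# (a GPU, or a wide in-order design like Itanium) can work on as many elements
# at once as it has units for, with a perfectly predictable access pattern.
a = [1.5, 2.0, 3.0, 4.5]
b = [2.0, 0.5, 3.0, 1.0]
c = [x * y for x, y in zip(a, b)]
print(c)  # [3.0, 1.0, 9.0, 4.5]
```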
Solution 3:
The computing power and clock frequency of a single processor reached their practical limit a few years ago; it just isn't easy to create more powerful and/or faster processors than the current ones, so the major CPU manufacturers (Intel, AMD) switched strategy and went multi-core. This of course requires a lot more work from application developers in order to harness the full power of the extra cores: a program running as a single thread simply doesn't benefit from a multi-core CPU (although the system gets an overall bonus because it doesn't lock up if a single process takes one core to 100% usage).
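As a concrete sketch of that extra work (the workload and chunking here are made up purely for illustration): a single-threaded loop uses one core no matter how many are present, while the parallel version only uses them all because the developer splits the work explicitly.

```python
# Minimal sketch: a CPU-bound task run serially (one core) versus split across
# one worker process per core. The burn() function is a stand-in workload.
from multiprocessing import Pool, cpu_count

def burn(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    chunks = [2_000_000] * 8

    # Single-threaded: runs on one core regardless of how many the CPU has.
    serial = [burn(n) for n in chunks]

    # Multi-core: the developer has to partition the work and farm it out.
    with Pool(processes=cpu_count()) as pool:
        parallel = pool.map(burn, chunks)

    assert serial == parallel
```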
About the physical architecture (multi-core processors instead of multiple single-core ones)... you should ask Intel. But I'm quite sure this has something to do with motherboards with a single CPU socket being a lot easier to design and manufacture than boards with multiple ones.
Solution 4:
In order to increase clock speeds, the silicon transistors on the chip need to be able to switch faster. These higher speeds require higher input voltages and semiconductor manufacturing processes that result in greater leakage, both of which increase power consumption and heat output. You eventually reach a point where you cannot increase clock rates any further without requiring excessive amounts of power or using exotic cooling solutions.
To illustrate this problem, I'll compare two modern AMD processors. The AMD FX-9590 is capable of attaining clock speeds of up to 5 GHz out of the box, but operates at core voltages up to 1.912 V, which is extremely high for a 32nm chip, and dissipates an insane 220 watts of heat. The FX-8350, which is based on the same die, runs at a maximum of 4.2 GHz but operates at a maximum of 1.4 V and dissipates 125 watts.
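Those two parts roughly follow the usual first-order model for switching power, P ≈ C·V²·f: power grows linearly with frequency but with the square of the core voltage, and the higher clock demands the higher voltage. A back-of-the-envelope check against the figures above (TDP is a thermal rating rather than measured power draw, and leakage isn't captured by this model, so the match is only approximate):

```python
# First-order dynamic power model: P ~ C * V^2 * f. The capacitance term C
# cancels when comparing two chips built on the same die.
fx9590 = {"volts": 1.912, "ghz": 5.0, "tdp_w": 220}
fx8350 = {"volts": 1.400, "ghz": 4.2, "tdp_w": 125}

predicted = (fx9590["volts"] ** 2 * fx9590["ghz"]) / (fx8350["volts"] ** 2 * fx8350["ghz"])
quoted = fx9590["tdp_w"] / fx8350["tdp_w"]

print(f"V^2 * f ratio:    {predicted:.2f}x")  # ~2.2x
print(f"quoted TDP ratio: {quoted:.2f}x")     # ~1.8x
```

Either way, a roughly 19% clock increase costs on the order of twice the power, which is exactly the wall described above.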
As a result, instead of trying to increase clocks further, engineers have sought to make chips do more work faster in other ways, including designing them to run multiple processes simultaneously—hence multi-core processors.