How do multi-core CPUs work?
Solution 1:
A transistor can't "work on a problem". It's a basic building block of CPUs, useless on its own but required to build logic gates (which can then compute simple operations like addition). There's a lot of hardware in a single core, far more than one transistor.
There's also much more in a CPU than just "doing stuff". There's a virtual memory manager, a hardware cache manager, various interfaces connecting the CPU to the motherboard and to system memory, and so on. Multicore CPUs often share a lot of this hardware inside the actual CPU package.
A "program" is a software concept - the CPU doesn't know what it is. All the CPU does is execute operations sent by the operating system. In this sense, a single-core CPU can only perform one logic operation at a time. But you still are able to do multiple stuff at the same time even on a single-core processor because the operating system switches the program which is currently running at a very fast rate. Multicore CPU's allow you to run more than one task at the same time, which can be exploited by the operating system by allowing you to run more programs at the same time comfortably, or having one program take advantage of multiple cores to run faster.
Technically, a "program" is a process divided into one or more threads, each of which are independent in their execution, they all have their own stack, CPU context (registers, etc..) and other stuff, though they can still communicate between each other within the process, obviously.
Solution 2:
The classical cycle that CPUs follow is:
Fetch Instruction -> Decode It -> Execute It
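As a toy illustration (the same loop in software, not how real hardware is built; the opcodes, the accumulator, and the little program are all invented for the example), here's a tiny interpreter that fetches an instruction, decodes it, and executes it, one at a time and strictly in order:

```c
#include <stdio.h>

/* A toy machine: one accumulator, a handful of opcodes. */
enum { HALT, LOAD, ADD, PRINT };

int main(void)
{
    /* Hypothetical program: load 2, add 3, print the result, halt. */
    int program[] = { LOAD, 2, ADD, 3, PRINT, HALT };
    int pc = 0;      /* program counter */
    int acc = 0;     /* accumulator */

    for (;;) {
        int op = program[pc++];          /* fetch */
        switch (op) {                    /* decode */
        case LOAD:  acc = program[pc++];  break;  /* execute */
        case ADD:   acc += program[pc++]; break;
        case PRINT: printf("%d\n", acc);  break;
        case HALT:  return 0;
        }
    }
}
```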
The reason is that most problems people have historically wanted to solve with computers involve following a number of steps, one at a time and in order, where the result of some steps affects later steps. With such problems, jumping around and attacking them from the middle using multiple "workers" doesn't work so well. So this model has served those types of programs well, and they are still very common. (That is, until 3D graphics rendering became common...)
The above model has been optimized and modified over the years, of course, and work has steadily gone into making sure less of the CPU sits idle. Even as early as the 68000 you had "pipelining", where multiple instructions are "in flight" in different parts of the CPU at once (this is why branch prediction was developed: if you have multiple instructions pipelined and then have to throw away the results because of a branch, you lose performance; the sketch after the list below demonstrates that cost). Today you have additional things that prevent the CPU from stalling or waiting, like:
- caches (often spare the CPU from having to wait on slow main memory)
- out-of-order execution (rearranges fetched instructions into an order that executes more efficiently)
- register renaming (lets out-of-order execution work better by giving instructions their own copies of registers while other instructions finish their work)
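Here's a rough way to observe the branch-prediction cost mentioned above, assuming a typical desktop CPU and a build where the compiler doesn't optimize the branch away (a low optimization level such as `gcc -O1` works on many machines; the array size and pass count are arbitrary). The same branch is unpredictable over random data and nearly free once the data is sorted:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      1000000
#define PASSES 100

static long sum_big(const unsigned char *data)
{
    long sum = 0;
    for (int i = 0; i < N; i++)
        if (data[i] >= 128)   /* the branch the predictor must guess */
            sum += data[i];
    return sum;
}

static double time_passes(const unsigned char *data)
{
    clock_t start = clock();
    long sum = 0;
    for (int p = 0; p < PASSES; p++)
        sum += sum_big(data);
    (void)sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

static int cmp(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

int main(void)
{
    static unsigned char data[N];
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    printf("unsorted: %.2fs\n", time_passes(data)); /* branch ~random */
    qsort(data, N, 1, cmp);
    printf("sorted:   %.2fs\n", time_passes(data)); /* branch predictable */
    return 0;
}
```

On many machines the sorted pass runs several times faster, even though it performs exactly the same amount of work.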
So each processor or core contains a number of subsystems that work together to interpret and execute an instruction stream. In that sense, parts of a modern CPU really are working on one thing while other parts work on something else, at the same time.
But while they can be made very efficient using the above techniques, all of those parts are ultimately working together on the same instruction stream, so they cannot be totally independent of each other. If you want to execute two instruction streams at once, you need two CPUs or two cores.
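You can see this on any multi-core machine with a sketch like the following (the amount of busy work is an arbitrary number picked so each stream takes a noticeable fraction of a second; build with `-pthread`). Two identical instruction streams run back to back on one core take roughly twice as long as the same two streams handed to two cores at once:

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* CPU-bound busy work: one "instruction stream". */
static void *burn(void *arg)
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 500000000UL; i++)
        x += i;
    (void)arg;
    return NULL;
}

static double elapsed(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec t0, t1;
    pthread_t a, b;

    /* The two streams, one after the other. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    burn(NULL);
    burn(NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("serial:   %.2fs\n", elapsed(t0, t1));

    /* The same two streams, given to two cores at once. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&a, NULL, burn, NULL);
    pthread_create(&b, NULL, burn, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("parallel: %.2fs\n", elapsed(t0, t1));
    return 0;
}
```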
A modern multitasking OS bounces between the various instruction streams (i.e. programs) stored in memory. The OS cuts a program off when it uses up its time slice (most CPUs designed for multitasking environments have a timer that raises an IRQ after a certain interval, or some similar mechanism), or switches to another task when the current process is waiting on some kind of I/O or input. A single CPU never physically executes two instructions at once.
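Just to illustrate the idea (a toy simulation, not real OS code: a real OS does this from a timer interrupt handler, and the task names and slice length here are invented), a round-robin scheduler hands each runnable task a fixed slice and then moves on to the next:

```c
#include <stdio.h>

struct task {
    const char *name;
    int work_left;   /* "instructions" still to execute */
};

#define SLICE 3      /* hypothetical time slice, in work units */

int main(void)
{
    struct task tasks[] = { {"editor", 5}, {"browser", 8}, {"music", 4} };
    int n = 3, alive = 3;

    while (alive > 0) {
        for (int i = 0; i < n; i++) {
            if (tasks[i].work_left <= 0)
                continue;                   /* task already finished */
            int run = tasks[i].work_left < SLICE ? tasks[i].work_left
                                                 : SLICE;
            tasks[i].work_left -= run;      /* "execute" for one slice */
            printf("ran %-8s for %d units, %d left\n",
                   tasks[i].name, run, tasks[i].work_left);
            if (tasks[i].work_left == 0)
                alive--;
        }
    }
    return 0;
}
```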
I think something like the idea you're describing was tried with the Itanium and its VLIW architecture. The Wikipedia article on VLIW explains it a bit better and in more depth than I can here.
Solution 3:
Actually "switching tasks" part is performed by an OS. Processor is a relatively "dumb" piece of hardware that just "crunches numbers". From a technical point of view, processor can't work on more than one task, because all tasks are written in assumption that they have full control of a processor at the time of execution. This is partially a legacy because of required backward compatibility.
With a multi-core processor, more than one "full processor" is available, so more than one program can have a "full processor" to itself at any given moment, and those programs can execute simultaneously.
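On POSIX-style systems you can ask the OS how many of these "full processors" it currently sees. A caveat: `_SC_NPROCESSORS_ONLN` is a common extension on Linux and macOS rather than a strict POSIX guarantee, and it counts logical processors, so with Hyper-Threading it reports hardware threads, not physical cores:

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Number of logical processors the OS currently has online. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    printf("%ld logical processors online\n", n);
    return 0;
}
```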
Solution 4:
The OS can only run ONE thread of a process on ONE core at any given instant. But once you know that the OS divides each process into one or more threads before they run on a core, and that a modern OS typically has a hundred or more processes around, you can imagine how fast the switching feels: a core clocked at 1 GHz executes on the order of a billion cycles per second, so even a time slice of a few milliseconds gives each thread millions of cycles before the core frees up "space" for the next one.
Recently, in Intel processors, you may have heard of Hyper-Threading technology, which makes ONE core ABLE to run TWO hardware threads at the exact SAME time by sharing the core's execution resources between them. In practice this improves how fully the core is utilized rather than doubling its performance.