Logical vs. Physical CPU performance
Solution 1:
The concept of cores is not that simple. The number of logical cores is the number of physical cores times the number of threads that can run on each core. This is known as Hyper-Threading (Intel's name for simultaneous multithreading, SMT). If my computer has a 4-core processor that runs two threads per core, then I have 8 logical processors. On Linux, you can see your computer's core layout by running the lscpu command.
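As a sketch of that arithmetic, the Python below parses lscpu-style text and multiplies sockets, cores per socket and threads per core. The sample text is a hypothetical excerpt matching the 4-core, 2-threads-per-core example. The field labels assume an English-locale lscpu from util-linux, and the helper names are made up for illustration.

```python
def parse_lscpu(text: str) -> dict[str, str]:
    """Map each 'Label: value' line of lscpu-style output to its value."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            # Split only on the first colon; values may contain more colons.
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def logical_cpus(fields: dict[str, str]) -> int:
    """Logical CPUs = sockets x cores per socket x threads per core."""
    sockets = int(fields["Socket(s)"])
    cores_per_socket = int(fields["Core(s) per socket"])
    threads_per_core = int(fields["Thread(s) per core"])
    return sockets * cores_per_socket * threads_per_core


# Hypothetical lscpu excerpt for a 4-core CPU running 2 threads per core.
SAMPLE = """\
CPU(s):                  8
Thread(s) per core:      2
Core(s) per socket:      4
Socket(s):               1
"""

fields = parse_lscpu(SAMPLE)
print(logical_cpus(fields), "computed;", fields["CPU(s)"], "listed by lscpu")
```

On a Linux machine you could pass real output instead, e.g. parse_lscpu(subprocess.run(["lscpu"], capture_output=True, text=True).stdout). Python's os.cpu_count() also returns the logical count directly.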
If a processor has 4 cores but can run 8 threads in parallel, it has only 4 physical cores (processing units); its hardware simply supports up to 8 threads at once. Those 8 threads still share the execution resources of the 4 cores, so two threads on the same core compete with each other. However, when the thread running on a core stalls on a memory or I/O operation, the other thread can use the core resources that would otherwise sit idle.
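The stall-hiding idea can be illustrated with a deliberately crude toy model in Python. The thread behavior (fixed compute and stall lengths) and all parameters are hypothetical, and the model lets only one thread use the core per cycle. Real SMT cores such as Intel's Hyper-Threading can issue instructions from both threads in the same cycle, which this sketch ignores.

```python
def simulate(n_threads: int, compute: int, stall: int, cycles: int) -> float:
    """Fraction of cycles in which the toy core did useful work.

    Each hypothetical thread repeats: `compute` cycles of work, then `stall`
    cycles waiting on memory/I/O. The core runs at most one thread per cycle;
    a stalled thread cannot run, but its wait keeps counting down.
    """
    if compute < 1:
        raise ValueError("compute must be >= 1")
    remaining_compute = [compute] * n_threads  # work left before the next stall
    remaining_stall = [0] * n_threads          # wait left before it can run again
    busy = 0
    for _ in range(cycles):
        # Simplification: pick the first thread that is not stalled.
        runner = next((t for t in range(n_threads) if remaining_stall[t] == 0), None)
        if runner is not None:
            busy += 1
        for t in range(n_threads):
            if t == runner:
                remaining_compute[t] -= 1
                if remaining_compute[t] == 0:
                    remaining_stall[t] = stall  # the wait starts next cycle
                    remaining_compute[t] = compute
            elif remaining_stall[t] > 0:
                remaining_stall[t] -= 1
    return busy / cycles


print(simulate(1, 4, 4, 800))  # one thread: 0.5
print(simulate(2, 4, 4, 800))  # two threads fill each other's stalls: 1.0
```

With one thread, the core is busy only during the compute phases, and a second thread fills the gaps. When stalls are long relative to compute (e.g. simulate(2, 2, 6, 800)), even two threads leave the core idle part of the time.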
So if your computer has 2 physical cores and can run 2 threads per core, you have 4 logical processors. Only 2 instances can each have a physical core to themselves. Running just those 2 instances keeps both physical cores busy, but only 2 of the 4 logical processors are in use, so the reported CPU utilization will be 50%. Whenever one of those threads goes idle, the core can run another thread in its place.
You can turn off Hyper-Threading in the BIOS (look for a setting named something like "Intel HT Technology") and compare the results with and without it. With Hyper-Threading off, the same 2 instances occupy both logical processors, so the reported utilization will be 100%.
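The 50% vs. 100% figures follow from simple arithmetic, sketched below under the assumption that the monitoring tool averages busy time over logical processors, as many overall CPU-usage figures do. The function name and the workload of 2 single-threaded instances are illustrative.

```python
def reported_utilization(busy_threads: int, physical_cores: int,
                         threads_per_core: int) -> float:
    """Share of logical processors kept busy (assumed to be what the monitor shows)."""
    logical = physical_cores * threads_per_core
    # More busy threads than logical processors cannot push the figure past 100%.
    return min(busy_threads, logical) / logical


# Hypothetical workload: 2 single-threaded instances on 2 physical cores.
print(reported_utilization(2, 2, 2))  # Hyper-Threading on: 0.5
print(reported_utilization(2, 2, 1))  # Hyper-Threading off: 1.0
```

Note that the physical cores are equally busy in both cases; only the denominator changes.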
Solution 2:
Even with many more cores than tasks, the tasks won’t scale perfectly. That’s because some state is almost always shared: not necessarily within the tasks themselves, but in the kernel, for example. Or the tasks may access the same resource, such as the network or a disk.
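One common way to quantify this, not used in the answer itself, is Amdahl's law: if only a fraction p of the work can run in parallel and the rest is serialized on shared state, the speedup on n workers is 1 / ((1 - p) + p / n). A minimal Python sketch, assuming a fixed serial fraction:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales.

    The remaining (1 - parallel_fraction) stands in for time serialized on
    shared state, e.g. a kernel lock or a single disk.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With p = 0.95, the speedup can never exceed 1 / 0.05 = 20, however many cores are added.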
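SMT (i.e. Hyper-Threading) relies on the fact that different tasks often use different CPU execution units. Superscalar CPUs can issue several independent instructions per cycle to different execution units, so-called “instruction-level parallelism”, and SMT lets a second thread fill units that the first thread leaves idle. Virtually every modern x86 processor is superscalar.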
Assuming you have two tasks that consist only of adding numbers, with no other CPU instructions, then yes, they will conflict when running on the same physical core, possibly leading to significant performance degradation.
However, most of the time this isn’t the case, and real workloads mix many kinds of instructions. As long as both instruction streams don’t need the same execution unit at (roughly) the same time, SMT can improve CPU execution unit utilization.
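The contention argument can be sketched with a toy SMT core in Python. The core layout (one adder, one multiplier), the one-instruction-per-thread-per-cycle rule and the instruction names are all hypothetical. Real x86 cores have several integer ALUs, so whether two add-heavy threads actually collide depends on how many ports can execute the operation.

```python
from collections import deque


def cycles_to_finish(streams: list[list[str]], units: dict[str, int]) -> int:
    """Cycles for a toy SMT core to drain all instruction streams.

    Each stream is one hardware thread's in-order list of instructions, named
    by the execution unit they need. Per cycle, each thread may issue its next
    instruction if a unit of that kind is still free; priority rotates so no
    thread is starved.
    """
    for stream in streams:
        for op in stream:
            if units.get(op, 0) < 1:
                raise ValueError(f"no execution unit for {op!r}")
    queues = [deque(s) for s in streams]
    first = 0
    cycles = 0
    while any(queues):
        free = dict(units)  # units still available in this cycle
        for k in range(len(queues)):
            q = queues[(first + k) % len(queues)]
            if q and free[q[0]] > 0:
                free[q[0]] -= 1
                q.popleft()
        first = (first + 1) % len(queues)
        cycles += 1
    return cycles


UNITS = {"add": 1, "mul": 1}  # hypothetical core: one adder, one multiplier
adds = ["add"] * 8
muls = ["mul"] * 8
mixed = ["add", "mul"] * 4

print(cycles_to_finish([adds], UNITS))        # one thread alone: 8
print(cycles_to_finish([adds, adds], UNITS))  # both need the adder: 16
print(cycles_to_finish([adds, muls], UNITS))  # different units: 8
```

Two add-only threads take as long as running them back to back, while threads that need different units overlap completely; mixed streams land in between.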