How can I have 1805 threads when I only have 4 virtual CPUs?

I was wondering if someone could explain how my Activity Monitor can say that I currently have 1805 threads:

[Screenshot of OS X Activity Monitor]

But I only have 4 virtual cores on my computer (which means that I should only be able to have 4 threads). Does the thread count mean all the threads that are being handled by the CPUs when they are deciding which thread to execute?

EDIT: The reason I think that there can only be 4 threads on my machine comes from this answer. I believe that my misunderstanding stems from the word 'thread' being used in a different context.


Scheduling

Your 1,805 threads do not run simultaneously. They trade off. One core runs a bit of the thread, then sets it aside to execute a bit of another thread. The other cores do the same. Round and round, threads execute a little bit at a time, not all at once.
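That trading off can be sketched as a toy round-robin scheduler. The simulation below is illustrative only, not how Darwin's scheduler actually works (real schedulers use priorities, preemption, and per-core run queues); the function name, the work units, and the time slice are all invented for the sketch.

```python
# Toy round-robin time-slicing: each "thread" is just a count of
# remaining work units, and one core runs one time slice at a time.
from collections import deque

def round_robin(work_units, time_slice=2):
    """Simulate time-slicing; returns the order in which threads ran."""
    ready = deque(enumerate(work_units))    # (thread_id, remaining work)
    history = []
    while ready:
        tid, remaining = ready.popleft()
        history.append(tid)                 # this thread gets the core now
        remaining -= time_slice
        if remaining > 0:
            ready.append((tid, remaining))  # not finished: back of the queue
    return history

# Three threads with unequal amounts of work share a single core.
print(round_robin([5, 3, 7]))  # → [0, 1, 2, 0, 1, 2, 0, 2, 2]
```

Each thread makes progress a little at a time, which is exactly the illusion of concurrency described above.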

A major responsibility of the operating system (Darwin and macOS) is the scheduling of which thread is to be executed on which core for how long.

Many threads have no work to do, and so are left dormant and unscheduled. Likewise, many threads may be waiting on some resource such as data to be retrieved from storage, or a network connection to be completed, or data to be loaded from a database. With almost nothing to do but check the status of the awaited resource, such threads are scheduled quite briefly if at all.

The app programmer can assist this scheduling operation by sleeping her thread for a certain amount of time when she knows the wait for the external resource will take a while. And when running a “tight”, CPU-intensive loop with no cause to wait on external resources, the programmer can insert a call that volunteers to be set aside briefly (a yield), so as not to hog the core and thereby allow other threads to execute.
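A minimal sketch of that pattern in Python, assuming a worker that polls for some external resource; `wait_for_resource`, `resource_ready`, and `POLL_INTERVAL` are invented names for the sketch. A blocking wait (or a `time.sleep`) parks the thread so the scheduler can give the core to other threads:

```python
# Sketch: a thread that waits for an external resource politely
# instead of spinning on a check and hogging a core.
import threading
import time

POLL_INTERVAL = 0.05  # seconds; arbitrary for this sketch

def wait_for_resource(resource_ready: threading.Event, results: list):
    # Event.wait(timeout) blocks in the kernel: the thread is
    # unscheduled, consuming no CPU, until woken or timed out.
    while not resource_ready.wait(timeout=POLL_INTERVAL):
        pass  # woke up, resource still not ready; wait again
    results.append("resource consumed")

ready = threading.Event()
out = []
worker = threading.Thread(target=wait_for_resource, args=(ready, out))
worker.start()
time.sleep(0.1)   # main thread stands in for the slow external world
ready.set()       # the awaited resource becomes available
worker.join()
print(out)        # prints ['resource consumed']
```

In a CPU-bound loop with nothing to wait on, an occasional `time.sleep(0)` serves as the "volunteer to be set aside" hint.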

For more detail, see the Wikipedia page for multithreading.

Simultaneous Multi-Threading

As for your linked Question, threads there are indeed the same as here.

One issue there is the overhead cost of switching between threads when scheduled by the OS. There is a significant cost in time to unload the current thread’s instructions and data from the core and then load the next scheduled thread’s instructions and data. Part of the operating system’s job is to try to be smart in scheduling the threads so as to optimize around this overhead cost.

Some CPU makers have developed technology to cut this time so as to make switching between a pair of threads much faster. Intel calls their technology Hyper-Threading; generically it is known as simultaneous multi-threading (SMT).

While the pair of threads does not truly double the core's throughput (the two threads share the core's execution resources), the interleaving is so smooth and fast that both threads appear to run simultaneously. This works so well that each core presents itself as a pair of virtual cores to the OS. So an SMT-enabled CPU with four physical cores, for example, will present itself to the OS as an eight-core CPU.
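You can see those virtual cores from a program: in Python, `os.cpu_count()` reports *logical* CPUs, which on an SMT machine is typically physical cores × threads per core (e.g. 4 × 2 = 8). A minimal check:

```python
# os.cpu_count() counts logical (virtual) CPUs, not physical cores.
# With SMT enabled, a 4-core CPU typically reports 8 here.
import os

logical = os.cpu_count() or 1  # may return None on exotic platforms
print(f"{logical} logical CPUs visible to the OS")
```

(Finding the *physical* core count portably needs a third-party library such as psutil, or platform-specific tools like `sysctl hw.physicalcpu` on macOS.)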

Despite this optimization, there is still some overhead to switching between such virtual cores. Too many CPU-intensive threads all clamoring for execution time to be scheduled on a core can make the system inefficient, with no one thread getting much work done. Like three balls on a playground being shared between nine kids, versus sharing between nine hundred kids where no one kid really gets any serious playtime with a ball.

So most CPU firmware offers a setting with which a sysadmin can disable SMT, if she decides that would benefit her users running an app that is unusually CPU-bound with very few opportunities for pausing.

In such a case we come back to your original Question: In this special situation you would indeed want to constrain the operations to have no more of these hyper-active threads than you have physical cores. But let me repeat: this is an extremely unusual situation that might occur in something like a specialized scientific data-crunching project but would almost never apply to common business/corporate/enterprise scenarios.


In the old days, memory wasn't virtualized or protected, and any code could write anywhere. In those days a one-thread-to-one-CPU design made sense. In the decades since, memory was first protected and then virtualized. Think of threads as virtual cores: a kind of promise that, at some point when your data and code are ready, that thread gets pushed (or scheduled, as the PhD engineers and mathematicians who research scheduling algorithms call it) onto an actual CPU to do actual work.

  • https://people.eecs.berkeley.edu/%7Ercs/research/interactive_latency.html


Now, because of the orders-of-magnitude differences in timing (the CPU and its cache operate extremely fast compared with fetching data from storage or the network), thousands of threads can come and go while a single thread waits for www.google.com to deliver a packet or two of data. That is why you see so many more threads than actual CPUs.

If you rescale time so that one CPU cycle (a fraction of a nanosecond) lasts one second, the things we care about look very different: a 100-microsecond disk I/O takes roughly 4 days, and a 200 ms internet round trip is about a 20-year delay. Like many powers-of-ten exercises, this shows that in almost all cases the CPU sits idle for "months" at a time, waiting for meaningful work from a very, very slow external world.
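To check that arithmetic, here is the rescaling done explicitly, assuming one CPU cycle ≈ 0.3 ns (a ~3 GHz core) maps to one scaled second; the latency figures are ballpark numbers in the spirit of the linked Berkeley page:

```python
# Rescale real latencies so one CPU cycle (~0.3 ns) = one second.
CYCLE = 0.3e-9        # seconds per cycle on a ~3.3 GHz core
SCALE = 1.0 / CYCLE   # one real cycle -> one scaled second

disk_io = 100e-6      # 100 microseconds of disk I/O
network_rtt = 200e-3  # 200 ms internet round trip

disk_scaled_days = disk_io * SCALE / 86_400
rtt_scaled_years = network_rtt * SCALE / (86_400 * 365)

print(f"disk I/O    ≈ {disk_scaled_days:.1f} scaled days")   # ≈ 3.9 days
print(f"round trip  ≈ {rtt_scaled_years:.1f} scaled years")  # ≈ 21.1 years
```

So "about 4 days" and "about 20 years" are the right orders of magnitude.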

Nothing seems amiss in the image you posted so perhaps we're misunderstanding what you are getting at by wondering about threads.

If you right-click (Control-click) the header row at the top of the window and add a column showing each app's state, you will see that most threads are likely idle or sleeping, not running, at any given moment.


You don't ask the arguably more fundamental question, "How can I have 290 processes when my CPU only has four cores?" This answer is a bit of history, which might help you understand the big picture, even though the specific question has already been answered. As such, I'm not going to give a TL;DR version.

Once upon a time (think, 1950s–'60s), computers could only do one thing at a time. They were very expensive, filled whole rooms, and we needed a way to make efficient use of them by sharing them between multiple people. The first way of doing this was batch processing, in which users would submit tasks to the computer and they'd be queued up, executed one after another and the results would be sent back to the user. That was OK but it did mean that, if you wanted to do a calculation that was going to take a couple of days, nobody else could use the computer during that time.

The next innovation (think, 1960s–'70s) was time-sharing. Now, instead of executing the whole of one task, then the whole of the next one, the computer would execute a bit of one task, then pause it and execute a bit of the next one, and so on. Thus, the computer would give the impression that it was executing multiple processes concurrently. The great advantage of this is that now you can run a calculation that will take a couple of days and, although it will now take even longer, because it keeps getting interrupted, other people can still use the machine during that time.

All of this was for huge mainframe-style computers. When personal computers started to become popular, they initially weren't very powerful and, hey, since they were personal it seemed OK for them to only be able to do one thing – run one application – at once (think, 1980s). But, as they became more powerful (think, 1990s to present), people wanted their personal computers to time-share, too.

So we ended up with personal computers that gave the illusion of running multiple processes concurrently by actually running them one at a time for brief periods and then pausing them. Threads are essentially the same thing: eventually, people wanted even individual processes to give the illusion of doing multiple things concurrently. At first, the application writer had to handle that themself: spend a little while updating the graphics, pause that, spend a little while calculating, pause that, spend a little while doing something else, ...

However, since the operating system was already good at managing multiple processes, it made sense to extend it to manage these sub-processes, which are called threads. So, now, we have a model where every process (or application) contains at least one thread, but some contain several or many. Each of these threads corresponds to a somewhat independent subtask.
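A sketch of that model in Python (the task names here are invented): one process spawns a few threads, each handling its own subtask, and the OS schedules them onto cores as it sees fit.

```python
# One process, several threads, each a somewhat independent subtask.
import threading

results = {}

def subtask(name: str):
    results[name] = f"{name} done"   # stand-in for real work

tasks = ["update_ui", "fetch_data", "recalculate"]
threads = [threading.Thread(target=subtask, args=(t,)) for t in tasks]
for t in threads:
    t.start()     # hand each subtask to the OS scheduler
for t in threads:
    t.join()      # wait for all subtasks to finish

print(sorted(results))  # → ['fetch_data', 'recalculate', 'update_ui']
```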

But, at the top level, the CPU is still only giving the illusion that these threads are all running at the same time. In reality, it's running one for a little bit, pausing it, choosing another to run for a little bit, and so on. Except that modern CPUs can run more than one thread at once. So, in the real reality, the operating system is playing this game of "run for a bit, pause, run something else for a bit, pause" on all the cores simultaneously. So, you can have as many threads as you (and your application designers) want but, at any moment in time, all but a few of them will actually be paused.
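To tie this back to the original question, here is a sketch of a single process holding far more threads than the machine has cores; the thread count is arbitrary, and almost all of the threads are parked in a blocking wait, consuming no CPU, just like most of the 1,805 threads in Activity Monitor.

```python
# Far more threads than cores: most of them are parked, not running.
import os
import threading

gate = threading.Event()

def parked():
    gate.wait()   # blocks in the kernel; uses no CPU while waiting

threads = [threading.Thread(target=parked) for _ in range(200)]
for t in threads:
    t.start()

peak = threading.active_count()   # includes the main thread
print(f"{peak} threads alive on {os.cpu_count()} logical CPUs")

gate.set()        # release every parked thread
for t in threads:
    t.join()
```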